My top n tips for Python coding in Optimisation

Always write code for readability first. Premature optimisation is the root of a lot of evil and often makes the code more difficult to improve later. See http://chrisarndt.de/talks/rupy/2008/output/slides.html for what is Pythonic. The Python language has really been written to make the clearest code the most efficient; see Ted's example, where
>>> for key in my_dict:

looks better and is more efficient than

>>> for key in my_dict.keys():

which is inefficient as it creates a new list instead of returning an iterator.
If your code is slow, use a profiler to find out why before you make changes. Either use the logging module to print timestamps or use the cProfile module. I can't tell you the number of times I have assumed the code was slow for one reason and then found it was another. Also, http://code.google.com/p/jrfonseca/wiki/Gprof2Dot is an excellent tool for making pretty graphs from the profiler output.
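As a minimal sketch of the cProfile approach (my_slow_function is a hypothetical stand-in for whatever code you suspect):

import cProfile

def my_slow_function():
    # hypothetical placeholder for the code under investigation
    return sum(j for j in xrange(1000000) if j % 2 == 0)

# print the hot spots, sorted by cumulative time
cProfile.run('my_slow_function()', sort='cumulative')

From the command line, python -m cProfile -s cumulative myscript.py does the same for a whole script.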
Rigorously validate any optimisation. If your changes don't speed up the code, revert them. This is a sliding scale with readability as above: if your change improves readability but does not change the execution time, leave it in; if your change makes the code impossible to read and understand, use lots of comments and only keep it if the speed increase is more than 30% or one minute.
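A minimal sketch of what that validation can look like, assuming old_version and new_version are hypothetical names for the two implementations being compared:

from timeit import timeit

# check the optimisation still gives the same answer before trusting any speed-up
assert old_version() == new_version()

print timeit(old_version, number=100)  # baseline
print timeit(new_version, number=100)  # candidate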
(actual Python hints start here)

Too much filtering. I have seen this a number of times: when using list comprehensions, the programmer filters the same list multiple times. For instance, in a machine scheduling problem (written in PuLP; product_mapping maps each product to the list of machines it can be made on, and allocate is a dictionary of variables allocating products to machines):
for m in machines:
    prob += lpSum(allocate[p, m] for p in products if m in product_mapping[p]) <= 10
If there is a large number of products, it is much better to iterate through product_mapping once and compile a mapping of machines to products:
machine_mapping = {}
for p, m_list in product_mapping.iteritems():
    for m in m_list:
        products = machine_mapping.setdefault(m, [])
        products.append(p)

for m in machines:
    # use .get so a machine with no mappable products gives an empty sum
    prob += lpSum(allocate[p, m] for p in machine_mapping.get(m, [])) <= 10
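To make the inversion concrete, here is a self-contained sketch with made-up data (the product and machine names are hypothetical):

product_mapping = {'p1': ['m1', 'm2'], 'p2': ['m1'], 'p3': ['m2', 'm3']}

machine_mapping = {}
for p, m_list in product_mapping.iteritems():
    for m in m_list:
        machine_mapping.setdefault(m, []).append(p)

print machine_mapping.get('m1', [])  # ['p1', 'p2'] (order may vary)

This turns a full products-by-machines scan into a single pass over product_mapping, after which each constraint only touches the products relevant to its machine.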
For large lists, try to use generator expressions instead of list comprehensions; in other words, return iterators instead of full lists. In Ted's example, dict.keys() returns a new list that, if the dictionary is big, can hurt performance, while dict.iterkeys() returns an iterator and does not have to build a new list each time. Note that
>>> for key in my_dict:

and

>>> for key in my_dict.iterkeys():

are equivalent.

Example 2 (summing the even numbers less than 1000):

Bad:
>>> total = 0
>>> for i in [j for j in range(1000) if j % 2 == 0]:
...     total += i
Better (generator expression):

>>> total = 0
>>> for i in (j for j in range(1000) if j % 2 == 0):
...     total += i
Even better (xrange instead of range):

>>> total = 0
>>> for i in (j for j in xrange(1000) if j % 2 == 0):
...     total += i
Best:

>>> total = sum(j for j in xrange(1000) if j % 2 == 0)
This only becomes an issue with large lists (n = 1000000 in the timings below):
>>> from timeit import timeit
>>> timeit('sum([j for j in range(1000000) if j % 2 == 0])', number=100)
27.90494394302368
>>> timeit('sum(j for j in range(1000000) if j % 2 == 0)', number=100)
13.040030002593994
>>> timeit('sum([j for j in xrange(1000000) if j % 2 == 0])', number=100)
15.114178895950317
>>> timeit('sum(j for j in xrange(1000000) if j % 2 == 0)', number=100)
10.272619009017944