Entries in python (3)

Friday
Aug052011

Finding python memory leaks with objgraph

I have had a rogue memory leak in one of my programs for a while, but I have now been able to track it down.

It wasn't an issue until recently, when I started trying to solve a large number of problems. After some googling I came across objgraph, a module that lets you graphically explore the objects that remain in Python's memory.

As Python is a garbage-collected language, memory leaks tend to be caused by one of the following:

  • Accidentally storing a reference to objects in something in the global scope, so they are never garbage collected
  • Circular references that contain an object with a custom __del__() method
  • Memory leakage in a C extension module
  • Some other reasons that I have not encountered :-)

To diagnose the first two, objgraph provides a nice interface.

To install objgraph for interactive use on Ubuntu:

$ sudo apt-get install python-pygraphviz
$ sudo pip install xdot
$ sudo pip install objgraph

Here is my example using the pulp library.

My application repeatedly created a pulp model and solved it using Gurobi.

import objgraph
for i in range(10):
    objgraph.show_growth(limit=3)
    create_and_solve_model()
objgraph.show_growth()
import pdb;pdb.set_trace()

If all was working well, the model would have gone out of scope and disappeared by the second call of objgraph.show_growth().

However, I get the following:

dict                3951      +301
list                2091      +170
LpVariable          1200      +120
Constr               960       +96
LpConstraint         960       +96
Var                  920       +92
tuple                968       +24
defaultdict          111       +11
> /home/stuart/example.py(52)<module>()
-> import pdb;pdb.set_trace()
(Pdb)

As you can see, something has gone wrong and objects are staying in memory.
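To get a feel for what show_growth() reports, here is a rough standard-library approximation: count the live objects by type name and compare two snapshots. This is only a sketch and not objgraph's actual implementation; `type_counts`, `growth`, `Leaky` and `kept_alive` are hypothetical names.

```python
import gc
from collections import Counter


def type_counts():
    """Count the objects tracked by the garbage collector, by type name."""
    gc.collect()  # drop anything that is already unreachable
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=10):
    """Return (type name, current count, increase) for the types that grew most."""
    grown = [(name, count, count - before[name])
             for name, count in after.items()
             if count > before[name]]
    grown.sort(key=lambda row: row[2], reverse=True)
    return grown[:limit]


class Leaky:
    """Hypothetical object that should have been freed."""


kept_alive = []  # simulates the stray reference

before = type_counts()
kept_alive.extend(Leaky() for _ in range(5))
after = type_counts()

# Same column layout as the output above: name, count, increase.
for name, count, delta in growth(before, after):
    print("%-20s %8d %+8d" % (name, count, delta))
```

Note that gc.get_objects() only sees objects tracked by the collector, so counts for simple types such as int or str will not appear.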

If I then pick a class (LpVariable) and graphically trace back the references to it, like so:

(Pdb) import random
(Pdb) import inspect
(Pdb) objgraph.show_chain(
          objgraph.find_backref_chain(
          random.choice(objgraph.by_type('LpVariable')),
          inspect.ismodule))

I get the following graph displayed in a window with xdot.

[Image: objgraph graph showing a circular reference]

From this graph it is quite clear that there is a circular reference involving the Gurobi model and my MasterLpProblem object. As the Gurobi model defines a __del__ method, the garbage collector does not delete it but instead stores it in the gc.garbage list.

After looking at my code, I see that when I create a Gurobi model I add a reference to the pulp.LpProblem that created it.

def buildSolverModel(self, lp):
    """
    Takes the pulp lp model and translates it into a gurobi
    model
    """
    log.debug("create the gurobi model")
    self.solverModel = lp.solverModel = gurobipy.Model(lp.name)
    ...
    lp.solverModel._pulpModel = lp
    ...

I thought this was a good idea at the time, but I have never used it. So I deleted the 'lp.solverModel._pulpModel = lp' line, and my memory leak disappeared.

When I rerun the previous code I now get:

> /home/stuart/example.py(52)<module>()
-> import pdb;pdb.set_trace()
(Pdb)

This indicates that there is no growth in memory usage.

Hurray
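Had the back-reference actually been needed, one alternative to deleting it would be to store it as a weak reference, so that no cycle is formed. The sketch below uses hypothetical stand-in classes (`SolverModel`, `Problem`) rather than gurobipy or pulp, and I have not checked how a real gurobipy.Model behaves with this approach.

```python
import gc
import weakref


class SolverModel:
    """Hypothetical stand-in for gurobipy.Model."""

    def __init__(self, name):
        self.name = name
        self._pulpModel = None


class Problem:
    """Hypothetical stand-in for pulp.LpProblem."""

    def __init__(self, name):
        self.name = name
        self.solverModel = None


def build_solver_model(lp):
    lp.solverModel = SolverModel(lp.name)
    # Weak back-reference: the solver model can find its problem,
    # but does not keep it alive, so there is no reference cycle.
    lp.solverModel._pulpModel = weakref.ref(lp)
    return lp.solverModel


lp = Problem("demo")
model = build_solver_model(lp)
print(model._pulpModel() is lp)  # calling the weak reference returns the problem

probe = weakref.ref(lp)
del lp        # drop the only strong reference to the problem
gc.collect()
print(probe() is None)  # the problem has been freed
```

The cost is that every use of the back-reference has to call it and handle the case where it returns None.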

Thursday
Jan132011

random.seed() and python module imports

Interesting factoid on the random library. 

I use random.seed() in my tests to get reproducible numbers (to test graphing and stats functions) and I have found the following unexpected behavior. 
>>> import random 

imports the random library as a singleton; therefore, if you write:
>>> random.seed(0) 

it will actually set the seed for all code (including library code) that uses the random library. Worse than that, if your test code calls a library that uses random, the library will draw a number from that sequence, and the random numbers in your tests will no longer be in the same sequence as they would have been without the library call. (This is the problem that caused me to think about this.)

Therefore my blanket recommendation for all code that uses random.seed() is that the import line be changed to the following: 
>>> from random import Random 
>>> random = Random() 

This will give you a new instance of Random that is only used within your module's scope, and all existing calls to random.seed() etc. will continue to work.
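A small sketch of both behaviours, using only the standard library. `library_call` is a hypothetical stand-in for any third-party function that draws from the shared module-level generator.

```python
import random
from random import Random


def library_call():
    """Hypothetical library function that uses the shared generator."""
    return random.random()


# Shared generator: a library call in between shifts the test's sequence.
random.seed(0)
expected = [random.random() for _ in range(2)]

random.seed(0)
library_call()
shifted = [random.random() for _ in range(2)]
print("shared generator reproducible:", shifted == expected)

# Private generator: the library call no longer interferes.
rng = Random(0)
private_expected = [rng.random() for _ in range(2)]

rng = Random(0)
library_call()
private_actual = [rng.random() for _ in range(2)]
print("private generator reproducible:", private_actual == private_expected)
```

The sketch names the private instance `rng` so that it can be compared with the shared module; in real code, rebinding the name `random` as shown above keeps the rest of the module unchanged.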

Thursday
Oct212010

nosetests function names

Be careful with function names in tests when using nosetests and Python.

Nosetests automatically runs all functions in discovered files that have the word test in their names. This can bite you if, for instance, you put a function build_test_smelter, used to define some test data for a subclass of TestCase, in an __init__.py file.

As nosetests will run this function as well (without the setUp and tearDown methods that clean your database), you will end up with two instances of your smelter instead of one. This will not show up if you run the tests in a single file, but will show up when you use

$ bin/nosetests 

from the command line, as it will then discover that function in your __init__.py file.
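The sketch below shows why the name is picked up, assuming nose's default match pattern as given in its documentation for the --match option (your configuration may override it). It also shows one way to opt a helper out: nose skips functions whose `__test__` attribute is False. `looks_like_test` is a hypothetical helper, and the body of `build_test_smelter` is a placeholder.

```python
import re

# Default test-name pattern from the nose documentation (--match / testMatch).
# Assumed here, not read from an installed nose.
TEST_MATCH = re.compile(r"(?:^|[\b_\./-])[Tt]est")


def looks_like_test(name):
    """Return True if the name matches the assumed default nose pattern."""
    return TEST_MATCH.search(name) is not None


def build_test_smelter():
    """Placeholder for the fixture helper described above."""
    return {"name": "smelter"}


# Tell nose that this helper is not a test, despite its name.
build_test_smelter.__test__ = False

for name in ("test_smelter", "build_test_smelter", "build_smelter_fixture"):
    print(name, looks_like_test(name))
```

Renaming the helper so that it does not contain "_test" (for example build_smelter_fixture) avoids the problem without needing the attribute.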