Tuesday, Oct 18 2011

Notes on building C extensions with Python

Previously, when I have needed to access C libraries from Python, I have used the [ctypes](http://docs.python.org/library/ctypes.html) library.

However, while working on the [dippy module](https://projects.coin-or.org/CoinBazaar/wiki/Projects/Dippy) I have needed to link against some fairly complicated C++ code (the [DIP library](https://projects.coin-or.org/Dip)).

Dippy is a Python extension module that directly uses the Python C API. As Qi-Shan Lim and [Michael O'Sullivan](http://www.des.auckland.ac.nz/uoa/michael-osullivan) wrote dippy, I will not go into the details of its implementation, but will instead discuss a toy example using [Cython](http://docs.cython.org).

So, taken from the basic [Cython tutorial](http://docs.cython.org/src/userguide/tutorial.html) (altered to use setuptools), we start with a hello world example.

Preliminaries
=============

1. Install Cython (on Ubuntu: $ sudo apt-get install cython)
2. Set up a virtual environment to play with

$ mkdir cython-example

$ cd cython-example

$ virtualenv . # note: you can't use --no-site-packages, as you need cython

$ source bin/activate

(cython-example)$


The Hello World example
-----------------------

Create these files:

helloworld.pyx

print "Hello World"

and setup.py (this is far too complicated, but I wanted to use setuptools instead of distutils)

#!/usr/bin/env python

from setuptools import setup
from distutils.extension import Extension

# setuptools DWIM monkey-patch madness
# http://mail.python.org/pipermail/distutils-sig/2007-September/thread.html#8204
import sys
if 'setuptools.extension' in sys.modules:
    m = sys.modules['setuptools.extension']
    m.Extension.__dict__ = m._Extension.__dict__

setup(
    setup_requires=['setuptools_cython'],
    ext_modules=[Extension("helloworld", ["helloworld.pyx"],
                           language="c++")],
)


Then do the following:

(cython-example)$ python setup.py build_ext -i
(cython-example)$ python
>>> import helloworld
Hello World

You will also see a helloworld.c generated by Cython.

Adding an External Dependency
-----------------------------

This is the real brain twister I had to figure out. Let's import a constant from a large C++ project.

helloworld.pyx

cdef extern from "Decomp.h":
    double DecompBigNum

print "Hello World %s" % DecompBigNum

Now we build it:

(cython-example)$ python setup.py build_ext -I DIP-trunk/include/coin -i
(cython-example)$ python
>>> import helloworld
Hello World 1e+21

Linking shared libraries madness
--------------------------------

If you start using functions from the big project, you also need to link against its libraries:

(cython-example)$ python setup.py build_ext \
-I DIP-trunk/include/coin -L DIP-trunk/lib -i

Now, a nasty problem occurs when your big dependency is itself built as shared libraries. Your new .so file will then depend on a lot of libraries that may not be on the user's system.

(cython-example)$ ldd _dippy.so
linux-vdso.so.1 => (0x00007fff1e08f000)
libDecomp.so.0 => not found
libAlps.so.0 => not found
libCbcSolver.so.0 => /usr/lib/libCbcSolver.so.0 (0x00007ffccade2000)
libCgl.so.0 => /usr/lib/libCgl.so.0 (0x00007ffccaafd000)
libCbc.so.0 => /usr/lib/libCbc.so.0 (0x00007ffcca804000)
libOsiClp.so.0 => /usr/lib/libOsiClp.so.0 (0x00007ffcca5be000)
libOsi.so.0 => /usr/lib/libOsi.so.0 (0x00007ffcca366000)
libOsiCbc.so.0 => not found
libClp.so.0 => /usr/lib/libClp.so.0 (0x00007ffcc9fd4000)
libCoinUtils.so.0 => /usr/lib/libCoinUtils.so.0 (0x00007ffcc9c8a000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffcc9984000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffcc96fe000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffcc94e8000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffcc92ca000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffcc8f35000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ffcc8d2d000)
libVol.so.0 => /usr/lib/libVol.so.0 (0x00007ffcc8b26000)
liblapack.so.3gf => /usr/lib/liblapack.so.3gf (0x00007ffcc7f30000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffcc7d18000)
libbz2.so.1.0 => /lib/libbz2.so.1.0 (0x00007ffcc7b08000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffccb302000)
libblas.so.3gf => /usr/lib/libblas.so.3gf (0x00007ffcc7292000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ffcc6fae000)

In fact, this library will not even work on _your_ system, as some of the dependencies are listed as 'not found'. To get it to work on your system, you would use the RPATH directive:

(cython-example)$ python setup.py build_ext \
-I DIP-trunk/include/coin -L DIP-trunk/lib \
-R DIP-trunk/lib -i

(cython-example)$ ldd _dippy.so
linux-vdso.so.1 => (0x00007fff399ff000)
libDecomp.so.0 => DIP-trunk/lib/libDecomp.so.0 (0x00007fc1c42a3000)
libAlps.so.0 => DIP-trunk/lib/libAlps.so.0 (0x00007fc1c4071000)
libCbcSolver.so.0 => DIP-trunk/lib/libCbcSolver.so.0 (0x00007fc1c3da3000)
libCgl.so.0 => DIP-trunk/lib/libCgl.so.0 (0x00007fc1c3aac000)
libCbc.so.0 => DIP-trunk/lib/libCbc.so.0 (0x00007fc1c37b9000)
libOsiClp.so.0 => DIP-trunk/lib/libOsiClp.so.0 (0x00007fc1c3574000)
libOsi.so.0 => DIP-trunk/lib/libOsi.so.0 (0x00007fc1c3324000)
libOsiCbc.so.0 => DIP-trunk/lib/libOsiCbc.so.0 (0x00007fc1c3114000)
libClp.so.0 => DIP-trunk/lib/libClp.so.0 (0x00007fc1c2d94000)
libCoinUtils.so.0 => DIP-trunk/lib/libCoinUtils.so.0 (0x00007fc1c2a68000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc1c2741000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1c24bc000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc1c22a6000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1c2087000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1c1cf3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc1c4785000)

See those nasty hard-coded paths? This library would obviously never work on anyone else's system.

Static Linking
--------------

The way (I've found) to fix this for a project that uses autoconf is to build it as follows:

$ ./configure --disable-shared --with-pic # PIC needed for 64-bit
$ make

Then all the dependencies are statically linked, and finally we get it all in one library:

(cython-example)$ python setup.py build_ext \
-I DIP-trunk/include/coin -L DIP-trunk/lib -i
(cython-example)$ ldd _dippy.so
linux-vdso.so.1 => (0x00007fffd75ff000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f377f7d2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f377f54d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f377f336000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f377f118000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f377ed84000)
/lib64/ld-linux-x86-64.so.2 (0x00007f37802d7000)

Friday, Aug 5 2011

Finding Python memory leaks with objgraph

I have had a rogue memory leak in one of my programs for a while, but I have now been able to track it down.

It hadn't been an issue until recently, when I started trying to solve a large number of problems. I did some googling and came across objgraph, a module that lets you graphically explore the objects that remain in Python's memory.

As Python is a garbage-collected language, memory leaks tend to be caused by one of these reasons:

  • Accidentally adding a reference to objects to something in the global scope so they are never garbage collected
  • Circular references that contain an object with a custom __del__() method
  • Memory leakage in a C extension module
  • Some other reasons that I have not encountered :-)
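The first of these reasons is easy to reproduce. Here is a minimal sketch (the _cache list and Model class are hypothetical stand-ins, not anything from my program) showing how a stray module-level reference keeps objects alive forever:

```python
import gc

_cache = []  # hypothetical module-level list: the accidental global reference


class Model:
    """Hypothetical stand-in for an expensive model object."""
    pass


def create_and_solve():
    m = Model()
    _cache.append(m)  # oops: every model is now reachable from module scope


for _ in range(10):
    create_and_solve()

gc.collect()
# All ten models are still alive, pinned in memory by the global _cache.
live = [o for o in gc.get_objects() if isinstance(o, Model)]
print(len(live))  # 10
```

No amount of garbage collection helps here: the objects are genuinely still reachable, so the fix is to remove (or bound) the global reference.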

To diagnose the first two, objgraph provides a nice interface.

To install objgraph for interactive use on Ubuntu:

$ sudo apt-get install python-pygraphviz
$ sudo pip install xdot
$ sudo pip install objgraph

Here is my example using the pulp library.

My application repeatedly created a pulp model and solved it using Gurobi:

import objgraph
for i in range(10):
    objgraph.show_growth(limit=3)
    create_and_solve_model()
objgraph.show_growth()
import pdb; pdb.set_trace()

If all was working well, the model would have gone out of scope and disappeared by the second call to objgraph.show_growth().

However, I get the following:

dict                3951      +301
list                2091      +170
LpVariable          1200      +120
Constr               960       +96
LpConstraint         960       +96
Var                  920       +92
tuple                968       +24
defaultdict          111       +11
> /home/stuart/example.py(52)<module>()
-> import pdb;pdb.set_trace()
(Pdb)

As you can see, something has gone wrong and objects are staying in memory.

If I then pick a class ('LpVariable') and graphically trace back the references to it, like so:

(Pdb) import random
(Pdb) import inspect
(Pdb) objgraph.show_chain(
          objgraph.find_backref_chain(
          random.choice(objgraph.by_type('LpVariable')),
          inspect.ismodule))

I get the following graph displayed in a window with xdot.

[objgraph image showing a circular reference]

From this graph it is quite clear that there is a circular reference involving the Gurobi model and my MasterLpProblem object. As the Gurobi model has a defined __del__ method, the garbage collector does not delete it but rather stores it in the gc.garbage list.
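The cycle itself is easy to reproduce with two hypothetical stand-in classes. (A caveat on versions: on Python 2, where I am working, a __del__ method inside a cycle sends the objects to gc.garbage; since Python 3.4 / PEP 442 the cycle does get collected, but still only by the cyclic collector, never by plain reference counting.)

```python
import gc


class Model:
    """Stand-in for the Gurobi model, which defines __del__."""
    def __del__(self):
        pass


class Problem:
    """Stand-in for my MasterLpProblem."""
    pass


p = Problem()
p.solverModel = Model()
p.solverModel._pulpModel = p  # the circular back-reference

del p             # reference counting alone cannot free the cycle
n = gc.collect()  # only a run of the cyclic collector reclaims it
print(n >= 2)     # True: at least the two instances were found unreachable
```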

After looking at my code, I see that when I create a Gurobi model I add a reference to the pulp.LpProblem that created it:

def buildSolverModel(self, lp):
    """
    Takes the pulp lp model and translates it into a gurobi
    model
    """
    log.debug("create the gurobi model")
    self.solverModel = lp.solverModel = gurobipy.Model(lp.name)            
    ...
    lp.solverModel._pulpModel = lp
    ...

I thought this was a good idea at the time, but I have never used it. So I deleted the 'lp.solverModel._pulpModel = lp' line, and my memory leak disappeared.
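For what it's worth, if the back-reference had actually been needed, a weak reference would have kept it without creating the cycle. A sketch, again with hypothetical stand-in classes rather than the real pulp/gurobipy ones:

```python
import weakref


class LpProblem:
    """Stand-in for pulp.LpProblem."""
    pass


class GurobiModel:
    """Stand-in for gurobipy.Model, which defines __del__."""
    def __del__(self):
        pass


lp = LpProblem()
lp.solverModel = GurobiModel()
lp.solverModel._pulpModel = weakref.ref(lp)  # weak back-reference: no strong cycle

probe = weakref.ref(lp)
del lp                  # with no cycle, reference counting frees it immediately
print(probe() is None)  # True
```

Code that needs the problem back would call `model._pulpModel()` and handle the None case, which is a small price for not leaking.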

When I rerun the previous code I now get:

> /home/stuart/example.py(52)<module>()
-> import pdb;pdb.set_trace()
(Pdb)

Indicating that there is no growth in memory usage.

Hurray!

Wednesday, Apr 20 2011

skeleton: a template for Python projects

I presented a talk on Sphinx at the Auckland Python users group. Instead of trying to present all the various interactions between Sphinx and setup.py, I used skeleton, a tool that can be used to create Python projects.

Skeleton uses templates to create a ready-made project, similar to PasteScript.

I made a fork of this project on GitHub and added a template where Sphinx is integrated into the project. This fork is currently available as skeleton_stu on PyPI until the changes are merged into the original project.

To create a basic Sphinx package:

$ pip install skeleton_stu

$ skeleton_package_sphinx [your_directory]

Answer a few questions and then you are done :-)

Saturday, Apr 16 2011

Squarespace website

Well, I'm converting my website to use Squarespace.

It seems okay, but I would like WYSIWYG table construction.

Thursday, Jan 13 2011

random.seed() and Python module imports

An interesting factoid about the random library.

I use random.seed() in my tests to get reproducible numbers (to test graphing and stats functions), and I have found the following unexpected behavior.
>>> import random 

imports the random module as a singleton. Therefore, if you write:
>>> random.seed(0) 

it will actually set the seed for all code (including library code) that uses the random module. Worse than that, if your test code calls a library that uses random, that library will consume numbers from the sequence, and the random numbers in your tests will no longer match the sequence they would have followed without the library call (this is the problem that caused me to think about this).

Therefore, my blanket recommendation for all code that uses random.seed() is to change the import line to the following:
>>> from random import Random 
>>> random = Random() 

This will give you a new instance of Random that is only used within your module's scope, and all previous calls to random.seed() etc. will continue to work.
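A quick sketch makes the difference concrete. Here the second random.random() call stands in for a hypothetical library call that also consumes numbers from the shared stream:

```python
import random
from random import Random

# Using the module-level singleton: a library call disturbs your sequence.
random.seed(0)
a = random.random()   # first number after seeding

random.seed(0)
random.random()       # a library call quietly consumes one number
b = random.random()   # so this is now the *second* number in the sequence
print(a == b)         # False

# Using a private Random instance: the library call cannot interfere.
rng = Random()
rng.seed(0)
c = rng.random()

rng.seed(0)
random.random()       # the library still draws from the global singleton
d = rng.random()      # your private stream is untouched
print(c == d)         # True
```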