Dear Salonnières,
thanks for coming to our compu-salon, episode 3. Here's a brief summary of what we covered.
The next salon will be this Friday, Feb 10, at 4pm (which, given the results of the Doodle poll, will be our default time going forward). We'll talk about object-oriented programming (OOP), and especially the very friendly but slightly quirky Python notion of OOP. As an example, we'll build Markov Chain Monte Carlo integrators for simple problems in statistical physics (no Bayesian inference for you—not yet).
I'll send out a reminder on Friday morning. And remember, Errors Should Never Pass Silently!
Michele
Also at www.vallis.org/salon.
In our third meeting, we talked about:
Numpy is the de facto standard Python matrix library. You can install it in your virtualenv using pip install numpy
(OS X comes preinstalled with an old version that you probably don't want). numpy provides N-dimensional arrays of homogeneous numerical types, such as float64
(the same as Python's float
), complex128
, int32
, and so on, and it defines general mathematical operations among them:
>>> import numpy as N
>>> r = N.random.random((3,3))
>>> r
array([[ 0.35951377, 0.6479556 , 0.10747995],
[ 0.42836043, 0.42579924, 0.02557081],
[ 0.08088397, 0.69398481, 0.73781365]])
>>> r + r
array([[ 0.71902754, 1.29591119, 0.2149599 ],
[ 0.85672087, 0.85159848, 0.05114161],
[ 0.16176793, 1.38796961, 1.47562729]])
>>> N.sin(r)
array([[ 0.35181913, 0.60355763, 0.10727314],
[ 0.41537993, 0.41304879, 0.02556802],
[ 0.0807958 , 0.63960539, 0.67267175]])
You do need numpy's special version of sin
:
>>> import math
>>> math.sin(r)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: only length-1 arrays can be converted to Python scalars
The numpy module also gives you smart slicing and indexing (especially Boolean indexing) of arrays:
>>> r[0,:]
array([ 0.35951377, 0.6479556 , 0.10747995])
>>> r[:,1]
array([ 0.6479556 , 0.42579924, 0.69398481])
>>> r[0:2,0:2]
array([[ 0.35951377, 0.6479556 ],
[ 0.42836043, 0.42579924]])
>>> r > 0.5
array([[False, True, False],
[False, False, False],
[False, True, True]], dtype=bool)
>>> r[r > 0.5] = 1
>>> r
array([[ 0.35951377, 1. , 0.10747995],
[ 0.42836043, 0.42579924, 0.02557081],
[ 0.08088397, 1. , 1. ]])
...as well as some linear algebra operations, and the FFT:
>>> N.linalg.inv(r)
array([[-1.66715988, 3.71781084, 0.08411884],
[ 1.77572889, -1.46134953, -0.15348737],
[-1.64088238, 1.16063824, 1.1466835 ]])
>>> N.linalg.eig(r)
(array([-0.24606504, 0.77903735, 1.25234071]),
array([[ 0.78050423, -0.21696928, 0.33931746],
[-0.51131524, -0.19385374, 0.20426076],
[ 0.35968023, 0.95673667, 0.91822721]]))
>>> N.fft.fft(r[:,0])
array([ 0.86875817+0.j, 0.10489157-0.30092345j, 0.10489157+0.30092345j])
By contrast, Python's lists are slower, and may consist of arbitrary, heterogeneous types. See the tentative numpy tutorial for a decent, if not final, introduction.
Mathematical operations on numpy arrays are often sufficiently fast for practical use, especially if you can work with whole-matrix, or sliced-matrix, expressions. Boolean indexing can help to avoid explicit loops and if-then constructs, as we see below in our Mandelbrot-set example mandelplot()
. But when we need more speed, we can exploit the fact that the memory layout of numpy arrays is compatible with C or Fortran, write efficient C/Fortran code that does the dirty work, and call it from Python.
For the beautiful mathematics of the Mandelbrot set, see wikipedia.
Indeed, this was the main topic of our meeting. Let's start with the big picture:
In this order: libraries (i.e., using the well-documented, -maintained, and -debugged code of others, which happens to be C/Fortran, in your beautiful, highly readable Python script); legacy (i.e., using the less well-documented, -etc., code that you or your advisor unfortunately wrote in C/Fortran a few years back); performance (i.e., when you really need the speed that only tight hand-coded loops can give you, although you keep reminding yourself that time-to-article = coding + debugging + running).
The reason not to do C/Fortran alone while tackling one of these use cases is that Python allows you an extremely expedient working style where you can call library/legacy/performant code interactively from the Python shell, or very naturally from readable Python scripts, rather than from verbose, compiled main() functions.
The general principle is that you need to compile your C/Fortran code, together with more C/Fortran code that provides glue between the languages, into a Python extension module. The latter is a shared library that you can import in Python, gaining access to the C/Fortran functions and data. This glue/wrapper code is exceedingly dense and boring to write: it looks like the internals of the Python interpreter, not a place you'd want to tour; but luckily there are several utilities that can help us do so.
So we've talked about the three utilities that IMHO provide the easiest experience:
Cython is really a Python compiler that can make your code run faster under some conditions, if you declare the C types of your variables in advance. That goes against the Python spirit, but it's also exactly the information that we need needed to interface with C. Thus, Cython is also a good wrapper generator. For instance, we can wrap the C function cmandel(int *esc,int ny,int nx,int maxi)
, defined in cmandel.c
, with the very short Cython function
def pyxmandel(N.ndarray[int,ndim=2] esc,maxi):
cmandel(<int*>esc.data,esc.shape[0],esc.shape[1],maxi)
provided that we've shown Cython the prototype for cmandel
:
cdef extern:
void cmandel(int *col,int ny,int nx,int maxi)
We can then gain access to the wrapped function with
>>> import pyxmandel
>>> pyxmandel.pymandel
<function pyxmandel.pyxmandel>
To compile the Cython code (pyxmandel.pyx
, see below) into the C wrapper, we use the command-line program cython
(install it with pip install cython
); to compile and link the wrapper and the C code into a Python extension, we use gcc, but we need to give it some special flags and include locations (for Python and for numpy). It's easiest to whip up a quick Makefile for that, so see below.
BTW, if you're wondering why the wrapper is pyxmandel.pyx
, it's because Cython used to be called pyrex, and the .pyx
suffix stuck with the project.
SWIG is a full-blown interface generator from C/C++ to many scripting languages, including Python. The way the wrapping works is that you write an interface file (swigmandel.i
for us), which includes some boilerplate and, crucially, the desired name of the extension module and the name of one or more C/C++ headers:
%module swigmandel
%include "cmandel.h"
All the functions defined within the headers will then be made available to Python. Think how quickly you could wrap entire libraries! However, again the numpy array requires a little magic to make it understandable to C, so before the %include
we add a numpy typemap.
%apply (int* INPLACE_ARRAY2, int DIM1, int DIM2) {(int *esc,int ny,int nx)};
which maps a single numpy-array argument into a pointer and two integers giving the array size, using the very argument names declared in cmandel.h
. We then gain access to the wrapped function with
>>> import swigmandel
>>> swigmandel.cmandel
<function _swigmandel.cmandel>
To create the SWIG C/Python wrappers we use the command-line program swig
(included in OS X, but installed with brew install swig
), which results in swigmandel_wrap.c
and swigmandel.py
. Again we use gcc to compile and link the extension module (see the Makefile).
Last, f2py is an interface generator for F77/F95. It's very transparent, and deals with numpy objects natively. So a Fortran function declaration
subroutine fmand(esc,ny,nx,maxi)
integer, intent(in) :: ny,nx,maxi
integer*4, dimension(0:ny-1,0:nx-1), intent(inout) :: esc
would automatically take a single numpy array as argument, and unpack it into esc
, ny
, and nx
. That's very cool, and it's possible because Fortran allows such precise, unambiguous subroutine declarations. However, there is one hitch: the memory ordering of arrays is different in C (row-first) and Fortran (column-first). Luckily we can tell numpy to allocate a Fortran-style array by giving the array constructor numpy.array
the option order='F'
.
We can create the Python extension with a single line of code, f2py -c -m fmandel fmandel.f90
(also in the Makefile).
I should also say that writing a setup.py
python script is a good alternative to a standard Makefile. But I feel that you've had enough details for today, so we'll talk about that in the future.
I've promised to give you the (OS X) installation recipe for the very cool Python qtconsole, featuring, among other things, inline plotting. Here's the recipe, assuming that you have installed homebrew, and are working in your virtualenv:
>> brew install qt
>> brew install pyqt # then add /usr/local to $PYTHONPATH
>> brew install zmq
>> easy_install pyzmq
>> pip install pygments
Then you can run qtconsole with
ipython qtconsole --pylab=inline
Curt also asked about GUIs for Python programming. There are many listed on the Python wiki. I hear a lot about (but I have never used) PyDev.
Use numpy and matplotlib to plot the bifurcation diagram for the logistic map. Or use SWIG to wrap the entire Numerical Recipes, and share your Makefile with us! (On Tapir's newman, the NR source code is in /usr/local/src
.)
It's a tar of several files:
lecture3.py
-- the main Python script, including the Mandelbrot loop in Pythoncmandel.c
-- the Mandelbrot loop in Ccmandel.h
-- the header file for cmandel.c
(included in swigmandel.i
)pymandel.pyx
-- the Cython wrapper fileswigmandel.i
-- the SWIG interface filenumpy.i
-- a SWIG include that should be distributed with numpy, but isn'tfmandel.f90
-- the Mandelbrot loop in FortranMakefile
-- to Make all three extension modulesI've refactored all code, compared to what I showed in class, to emphasize the section of Python that gets replaced by the compiled modules.