Here are the topics we touched, with some weblinks to tutorials or discussions:

- Containers (lists, tuples, dictionaries)
- Tuples vs lists
- Comprehensions
- Variables vs names
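
As a quick refresher on those topics, a minimal sketch (the values are arbitrary examples):

```python
# lists are mutable, tuples are not
xs = [1, 2, 3]
xs[0] = 10             # fine
pt = (1.0, 2.0)        # pt[0] = ... would raise TypeError

# comprehensions build containers from loops in a single expression
squares = [x**2 for x in range(5)]    # [0, 1, 4, 9, 16]
table = {x: x**2 for x in range(5)}   # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```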

Possible considerations:

- Use effort commensurate to need
- Start with googling
- Watch "external" signs of reliability:
  - Decent website
  - Documentation
  - Funding if any
  - Support resources
  - Updates/releases
  - Version number >= 1.0?
  - Reasonable dependencies

- Download/install, try tutorial
- What are the alternatives?
- When everything fails, write your own

Python lists (and nested lists) are not usually suitable for mathematical work: they do not understand arithmetic natively, they are not efficient (because they require levels of indirection), and they require looping in Python, which is again not efficient.

By contrast, working with mathematical vectors and arrays requires fast operations on large memory blocks that contain homogeneous types (all ints, doubles, etc.). This is also the memory layout that's expected by C/Fortran programs, so interacting with external legacy or performance codes requires this same "buffer" paradigm.

This is what's provided by numpy, which is a building block and a basic infrastructure for most scientific applications in Python.
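
A small illustration of the difference (a toy example, not from the lecture):

```python
import numpy as np

xs = [1, 2, 3]
# arithmetic on lists doesn't do what a mathematician expects:
# "*" repeats and "+" concatenates
doubled_list = xs * 2          # [1, 2, 3, 1, 2, 3]

# a numpy array stores homogeneous doubles in one contiguous buffer,
# and arithmetic operates elementwise, in compiled code
v = np.array([1.0, 2.0, 3.0])
doubled_array = v * 2          # array([2., 4., 6.])
```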

See Johansson's numpy tutorial; see also his tutorials on scipy (the open-source "ecosystem" of scientific software for Python) and on matplotlib.

In [15]:

```
import numpy as np
import matplotlib.pyplot as pp
%matplotlib inline
```

In [5]:

```
import scipy
```

`numpy.random.rand` yields reals in [0,1), so we round them to 0 (dead) or 1 (alive).

In [6]:

```
nsize = 32
```

In [7]:

```
init = np.floor(np.random.rand(nsize,nsize) + 0.5)
```

Or we could convert a boolean array (which will be True/False with 50% probability) to integers...

In [8]:

```
(np.random.rand(nsize,nsize) > 0.5) * 1
```

Out[8]:

By the way, here's how you can count true elements in a boolean array:

In [9]:

```
np.sum(np.random.rand(nsize,nsize) > 0.5)
```

Out[9]:

Now we'd like to plot the array. `imshow` from `matplotlib.pyplot` seems to work well.

In [11]:

```
def plotgrid(x):
    # turn off interpolation to get hard cell edges; put (0,0) at bottom left;
    # use a gray colormap that will yield black/white
    pp.imshow(x, interpolation='none', origin='lower', cmap=pp.cm.gray)
```

In [12]:

```
plotgrid(init)
```

In [19]:

```
def lifestep_brute(x):
    newx = x.copy()
    m, n = x.shape
    for i in range(m):
        for j in range(n):
            neighbors = (x[(i+1) % m, (j-1) % n] + x[(i+1) % m, j] + x[(i+1) % m, (j+1) % n] +
                         x[i, (j-1) % n]                           + x[i, (j+1) % n] +
                         x[(i-1) % m, (j-1) % n] + x[(i-1) % m, j] + x[(i-1) % m, (j+1) % n])
            if x[i,j] == 1 and (neighbors < 2 or neighbors > 3):
                newx[i,j] = 0
            elif x[i,j] == 0 and neighbors == 3:
                newx[i,j] = 1
    return newx
```

This works OK, although it's not especially fast...

In [94]:

```
# let's make a function to run some steps with a choice of "stepper"
def runiter(stepper, iters=100, nsize=64):
    x = np.floor(np.random.rand(nsize,nsize) + 0.5)
    for i in range(iters):
        x = stepper(x)
```

In [23]:

```
%timeit runiter(lifestep_brute,100)
```

We can plot a few frames using `matplotlib.pyplot.subplot`, which works just like MATLAB's.

In [24]:

```
pp.figure(figsize=(12,12))
x = np.floor(np.random.rand(64,64) + 0.5)
for i in range(16):
    pp.subplot(4,4,i+1)
    plotgrid(x)
    x = lifestep_brute(x)
```

In `numpy`, it's always more efficient to work with full arrays; however, in this case this means that we need to do something special on the boundaries to get the wraparound effect. That's a lot of special cases to write in our code (the four edges, plus the four corners, all require different indexing).

We could make an extended copy of the array by pasting the last column to the left of the first, the first column to the right of the last, etc. Or, we look in the `numpy` manual and find the function `roll`, which yields a rolled-with-wraparound copy of a matrix. (`numpy.lookfor`, in this case `numpy.lookfor('shift')`, is a good way to find these things!)
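
A minimal illustration of `roll`'s wraparound behavior (a toy example):

```python
import numpy as np

x = np.arange(5)
rolled = np.roll(x, 1)      # array([4, 0, 1, 2, 3]): the last element wraps to the front

# on a 2D array, axis selects whether rows or columns are rolled
m = np.arange(9).reshape(3, 3)
rows_down = np.roll(m, 1, axis=0)   # the last row becomes the first
```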

In [36]:

```
def lifestep_roll(x):
    newx = x.copy()
    neighbors = 0
    for di in [(1,1),(1,0),(1,-1),(0,1),(0,-1),(-1,1),(-1,0),(-1,-1)]:
        neighbors += np.roll(np.roll(x,di[0],axis=0),di[1],axis=1)
    # notice the numpy *fancy indexing* to work on selected items of an array
    # based on some conditions
    newx[(x == 1) & (neighbors < 2)] = 0
    newx[(x == 1) & (neighbors > 3)] = 0
    newx[(x == 0) & (neighbors == 3)] = 1
    return newx
```
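
Boolean fancy indexing in isolation, as a quick sketch (the array here is arbitrary):

```python
import numpy as np

a = np.arange(6)               # [0, 1, 2, 3, 4, 5]
mask = (a % 2 == 0) & (a > 0)  # boolean array, True at 2 and 4
a[mask] = -1                   # assign only where the mask is True
# a is now [0, 1, -1, 3, -1, 5]
```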

It's 60 times faster!

In [37]:

```
%timeit runiter(lifestep_roll,100)
```

In [41]:

```
import scipy.signal
kernel = np.array([[1,1,1],[1,0,1],[1,1,1]])
def lifestep_conv(x):
    newx = x.copy()
    neighbors = scipy.signal.convolve2d(x,kernel,boundary='wrap',mode='same')
    newx[(x == 1) & (neighbors < 2)] = 0
    newx[(x == 1) & (neighbors > 3)] = 0
    newx[(x == 0) & (neighbors == 3)] = 1
    return newx
```
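
To see why a convolution with this kernel counts live neighbors, try it on a tiny grid containing a horizontal "blinker" (a toy example, not from the lecture):

```python
import numpy as np
import scipy.signal

kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
x = np.zeros((5, 5))
x[2, 1:4] = 1    # three live cells in a row

# each output entry is the number of live cells among the 8 neighbors
n = scipy.signal.convolve2d(x, kernel, boundary='wrap', mode='same')
# n[2, 2] == 2: the center cell sees its left and right neighbors
# n[1, 2] == 3: the cell above the center sees all three live cells
```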

For this size of matrix, the timing is actually comparable to the rolling version.

In [43]:

```
%timeit runiter(lifestep_conv,100)
```

The *very* mathematically inclined may realize that the convolution can be performed efficiently with FFTs. It takes some trial and error to figure out the necessary adjustments.

In [88]:

```
import numpy.fft as npf
def fft_convolve2d(x, y):
    """2D convolution using FFT, adapted from Tristan Hearn's example."""
    fr = npf.fft2(x)
    fr2 = npf.fft2(y, s=fr.shape)
    convolve = np.real(npf.ifft2(fr * fr2))
    # the result needs to be shifted around
    m, n = y.shape
    convolve = np.roll(convolve, -int((m-1)/2), axis=0)
    convolve = np.roll(convolve, -int((n-1)/2), axis=1)
    return convolve.round()
```
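
The routine relies on the convolution theorem: a circular convolution equals the inverse FFT of the product of FFTs. A quick 1D sanity check (a toy example):

```python
import numpy as np

n = 8
x, y = np.random.rand(n), np.random.rand(n)

# circular convolution computed directly from the definition...
direct = np.array([sum(x[j] * y[(i - j) % n] for j in range(n))
                   for i in range(n)])

# ...equals the inverse FFT of the product of FFTs
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))
assert np.allclose(direct, via_fft)
```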

In [89]:

```
def lifestep_fftconv(x):
    newx = x.copy()
    neighbors = fft_convolve2d(x,kernel)
    newx[(x == 1) & (neighbors < 2)] = 0
    newx[(x == 1) & (neighbors > 3)] = 0
    newx[(x == 0) & (neighbors == 3)] = 1
    return newx
```

This is slower!

In [93]:

```
%timeit runiter(lifestep_fftconv,100)
```

How about animating the evolution? `matplotlib` offers the tools to do so, but the interface is not the best: it requires that we make a figure, then provide a function to modify it for each frame. See for instance this blog post.

In [107]:

```
frames = []
x = np.floor(np.random.rand(256,256) + 0.5)
# we collect the frames in a list
for i in range(250):
    frames.append(x)
    x = lifestep_conv(x)
```

This will take a few seconds...

In [108]:

```
import matplotlib.animation as ma
# default resolution is 72 dpi
figure = pp.figure(figsize=(5,5))
image = pp.imshow(frames[0], interpolation='none', cmap=pp.cm.gray)
# the animate function needs to change the elements of the figure,
# and return an iterable object containing the parts of the figure that have changed
def animate(i):
    image.set_array(frames[i])
    return [image]
# the API is not the best though...
anim = ma.FuncAnimation(figure, animate, frames=len(frames),
                        interval=20, blit=True)
anim.save('life.mp4', fps=30);
```

In [111]:

```
!open life.mp4
```

*SmoothLife* is Stephan Rafler's floating-point generalization of the Game of Life. See his paper.

Most scientific Python packages, including those we use below, can be installed with `conda` or `pip`.

Our next topic is HDF5, a "data model, library, and file format", "portable and extensible", that "supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data". h5py is *the* Python library to read/write HDF5; it can be used to get efficient numpy views on very large datafiles.

- http://www.hdfgroup.org
- http://www.h5py.org
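
Before diving into the LIGO data, a minimal h5py round trip may help fix ideas (the file and dataset names here are arbitrary):

```python
import numpy as np
import h5py

# write a small file: groups nest like directories, datasets hold arrays,
# and both can carry attributes
with h5py.File('demo.h5', 'w') as f:
    d = f.create_dataset('group/values', data=np.arange(10.0))
    d.attrs['units'] = 'seconds'

# read it back; slicing a dataset yields a numpy array
with h5py.File('demo.h5', 'r') as f:
    vals = f['group/values'][:]
    units = f['group/values'].attrs['units']
```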

LIGO has recently released its flagship "S5" data: https://losc.ligo.org

In [5]:

```
!wget https://losc.ligo.org/archive/data/S5/843055104/H-H1_LOSC_4_V1-843894784-4096.hdf5
```

In [7]:

```
!pip install h5py
```

In [1]:

```
import h5py
```

In [2]:

```
data = h5py.File('H-H1_LOSC_4_V1-843894784-4096.hdf5','r')
```

In [4]:

```
data
```

Out[4]:

HDF5 file objects have a tree-like structure, and in the Python API tree nodes at every level behave as Python dictionaries. Remember the Python `dict` methods, which return iterators (technically, *dictionary views*) that you can loop over:

- `keys()`, yielding the keys indexing the dictionary;
- `values()`, yielding the values in the same order;
- `items()`, yielding (key, value) pairs.

If you need a proper list of any of these, you can always feed the iterators to the `list()` constructor.
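
For instance (a toy dictionary, not the LIGO file):

```python
d = {'H1': 4096, 'L1': 4096}   # hypothetical detector -> sample rate

ks = list(d.keys())            # ['H1', 'L1']
vs = list(d.values())          # [4096, 4096]
ps = list(d.items())           # [('H1', 4096), ('L1', 4096)]
```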

In [5]:

```
list(data.items())
```

Out[5]:

h5py implements the somewhat peculiar `visit` method that walks down the HDF tree in full.

In [6]:

```
def printname(name):
    print("%-35s %s" % (name, data[name]))
```

In [7]:

```
data.visit(printname)
```

There's useful information in "`meta`":

In [8]:

```
data['meta/GPSstart'].value, data['meta/Duration'].value
```

Out[8]:

"`strain`

" seems to be the main structure containing data...

In [9]:

```
strain = data['strain/Strain']
```

...which looks like a numpy array...

In [10]:

```
type(strain.value), strain.value
```

Out[10]:

But it has also attributes:

In [12]:

```
list(strain.attrs.items())
```

Out[12]:

We use these to make an array of timestamps.

In [16]:

```
tstart = strain.attrs['Xstart']
dt = strain.attrs['Xspacing']
tend = tstart + strain.attrs['Npoints'] * dt
# according to C convention, "tend" is not actually included in this array
ts = np.arange(tstart,tend,dt)
```

In [17]:

```
len(ts), len(strain.value)
```

Out[17]:

Let's plot the very beginning of the data.

In [18]:

```
pp.plot(ts[:10000],
strain.value[:10000])
```

Out[18]:

There is also "`quality`", which (we learn from https://losc.ligo.org/archive/dataset/S5) describes the quality of the data with a 1-Hz bitmask array.

In [19]:

```
quality = data['quality/simple']
```

In [23]:

```
list(quality.items())
```

Out[23]:

These are the meanings of the bits:

In [28]:

```
for bit, (name, desc) in enumerate(zip(quality['DQShortnames'].value, quality['DQDescriptions'].value)):
    print('%2d %-16s %s' % (bit, name.decode(), desc.decode()))
```

Note the calls to `decode`, which are needed in Python 3 to avoid displaying strings as `b'...'`. The issue is that in Python 3 native strings are unicode, but the LOSC files contain simple ASCII strings (where each character is 8 bits), which in Python 3 correspond to `bytes` objects, not quite strings. The `bytes` method `decode()`, with the implied argument `'UTF-8'`, uses a unicode encoding to make a unicode string out of a sequence of bytes. Its inverse is `encode()`.
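
A quick round trip (the bytes value here is just an illustration, not an actual LOSC entry):

```python
b = b'DATA'             # a bytes object, as stored in an ASCII HDF5 string
s = b.decode()          # 'DATA': a proper unicode string ('UTF-8' implied)
back = s.encode()       # encode() is the inverse, yielding bytes again
```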

Back to LIGO. How much data at "CBCHIGH" categories 1 through 4? CBCHIGH denotes the search for high-mass binary inspirals. As for the categories,

- CAT1 = data that passes basic quality checks
- CAT2 = data that does not have obvious corruption or noisy features with clear correlations with instrument events and failure modes
- CAT3 = data that also does not have less obvious noisy features with unclear instrumental associations
- CAT4 = data that has been cleaned of glitchy features that result in excess triggers

A LIGO search would use CAT2 data; an upper-limit analysis would use CAT3.

In [31]:

```
dqarray = quality['DQmask'].value
for cbchigh_cat in [1,2,3,4]:
    bitmask = 1 << cbchigh_cat
    print("%s seconds at CAT %d" % (sum(dqarray & bitmask > 0), cbchigh_cat))
```

We can visualize the latter with `matplotlib.pyplot.matshow`. We want a discrete rather than continuous colormap (i.e., 1 = red, 2 = yellow, 3 = green, 4 = blue). Some quick googling shows that this can be achieved with `ListedColormap` and `BoundaryNorm`.

In [32]:

```
dqplot = np.zeros(dqarray.shape)
for cbchigh_cat in [1,2,3,4]:
    bitmask = 1 << cbchigh_cat
    dqplot[dqarray & bitmask > 0] = cbchigh_cat
```

In [33]:

```
import matplotlib.colors as mpc
cmap = mpc.ListedColormap(['red','yellow','green','blue'])
norm = mpc.BoundaryNorm([0,1,2,3],cmap.N)
pp.matshow(dqplot.reshape((16,256)),cmap=cmap,norm=norm,aspect=1.5);
```

We can use numpy *masked arrays* to obtain the segments of CAT2 data.

- http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html

In [35]:

```
dq2 = (dqarray & (1 << 2)) > 0
masked = np.ma.masked_array(np.zeros(4096),~dq2)
segs = np.ma.flatnotmasked_contiguous(masked)
print(segs)
```

This will get us the strain data for segment `n`.

In [36]:

```
def segdata(n):
    start, end = segs[n].start * 4096, segs[n].stop * 4096
    return ts[start:end], strain.value[start:end]
```

Let's look at the PSD for the first few segments, using `psd` from `matplotlib.mlab`.

In [37]:

```
import matplotlib.mlab as mp
```

In [38]:

```
def plotpsd(i):
    segt, segd = segdata(i)
    psd, freqs = mp.psd(segd, NFFT=2048, Fs=1.0/dt)
    pp.loglog(freqs[1:], np.sqrt(psd[1:]))
    pp.xlabel(r'$f/\mathrm{Hz}$'); pp.ylabel(r'$S(f)/\mathrm{Hz}^{1/2}$')
    pp.axis([10,1000,1e-24,1e-14]);
    pp.title('t = [%s,%s]' % (segt[0], segt[-1]))
```

In [40]:

```
print(len(segs))
```

In [41]:

```
pp.figure(figsize=(12,16))
for s in range(8):
    pp.subplot(4,2,s+1)
    plotpsd(s)
pp.tight_layout()
```

We can make spectrograms with `matplotlib.mlab.specgram`. We examine segments of 4096 points = 1 sec.

In [42]:

```
def plotspec(i):
    segt, segd = segdata(i)
    pxx, freqs, t = mp.specgram(segd, NFFT=4096, Fs=1.0/dt)
    # normalize each frequency row by its median (effectively to whiten data)
    pxx = np.array([row / np.median(row) for row in pxx])
    pp.pcolormesh(t, freqs, np.log10(pxx), cmap=pp.get_cmap('Oranges'))
    pp.xlabel(r'$t/\mathrm{s}$'); pp.ylabel(r'$f/\mathrm{Hz}$')
    pp.axis([0, t[-1], freqs[0], freqs[-1]])
    pp.title('t = [%s,%s]' % (segt[0], segt[-1]))
```

In [43]:

```
pp.figure(figsize=(12,16))
for s in range(16,24):
    pp.subplot(4,2,s-16+1)
    plotspec(s)
pp.tight_layout()
```