**Michele Vallisneri**, Jet Propulsion Laboratory, California Institute of Technology, Research Scientist, vallis@vallis.org**Sarah Burke-Spolaor**, Jet Propulsion Laboratory, California Institute of Technology, Postdoctoral Associate, Sarah.Burke-Spolaor@jpl.nasa.gov**Joseph Lazio**, Jet Propulsion Laboratory, California Institute of Technology, Principal Research Scientist, Joseph.Lazio@jpl.nasa.gov**Curt Cutler**, Jet Propulsion Laboratory, California Institute of Technology, Senior Research Scientist, curt.j.cutler@jpl.nasa.gov

Our model included a power-law GW-background of unknown amplitude and exponent, as well as individual red-noise parameters for each pulsar (although the GW background and the red noise are both characterized by power laws, the corresponding amplitudes and exponents are defined with two different conventions, consistently with those used in the documentation of the open challenge datasets). We assumed that the tim-file TOA errors were accurate, and were the only source of white timing noise (i.e., we assumed no EFAC or EQUAD). No GW signal of any other kind was included in our model, so none was detected.

This submission can also be seen, with figures, at http://www.vallis.org/ipta/submission.html.

Population statistics for GW parameters and individual red-noise parameters for all pulsars (at the top we show the mean A_red/alpha_red and their variances):

```
[1000 emcee walkers, statistics from step 400/400]
A_gw : 1.59e-14 ± 2.34e-15 alpha_gw : -0.419 ± 0.103
A_red[mean] : 2.84e-21 ± 6e-22 alpha_red[mean]: 1.57 ± 0.0781
A_red[1] : 2.05e-21 ± 1.06e-21 alpha_red[1] : 1.54 ± 0.168
A_red[2] : 2.8e-21 ± 1.08e-21 alpha_red[2] : 1.61 ± 0.145
A_red[3] : 3.4e-21 ± 1.48e-21 alpha_red[3] : 1.63 ± 0.131
A_red[4] : 2.69e-21 ± 1.01e-21 alpha_red[4] : 1.54 ± 0.178
A_red[5] : 2.21e-21 ± 9.45e-22 alpha_red[5] : 1.64 ± 0.141
A_red[6] : 2.76e-21 ± 1.13e-21 alpha_red[6] : 1.62 ± 0.129
A_red[7] : 2.18e-21 ± 1.07e-21 alpha_red[7] : 1.6 ± 0.159
A_red[8] : 2.99e-21 ± 1.02e-21 alpha_red[8] : 1.64 ± 0.114
A_red[9] : 3.47e-21 ± 1.08e-21 alpha_red[9] : 1.45 ± 0.191
A_red[10] : 2.87e-21 ± 1.23e-21 alpha_red[10] : 1.49 ± 0.196
A_red[11] : 2.28e-21 ± 1.03e-21 alpha_red[11] : 1.67 ± 0.126
A_red[12] : 2.65e-21 ± 1.13e-21 alpha_red[12] : 1.58 ± 0.177
A_red[13] : 3e-21 ± 1.23e-21 alpha_red[13] : 1.64 ± 0.138
A_red[14] : 3.35e-21 ± 9.72e-22 alpha_red[14] : 1.61 ± 0.147
A_red[15] : 3.6e-21 ± 1.06e-21 alpha_red[15] : 1.48 ± 0.174
A_red[16] : 3.53e-21 ± 1.15e-21 alpha_red[16] : 1.45 ± 0.178
A_red[17] : 2.17e-21 ± 1.06e-21 alpha_red[17] : 1.57 ± 0.172
A_red[18] : 3.35e-21 ± 9.51e-22 alpha_red[18] : 1.55 ± 0.16
A_red[19] : 2.03e-21 ± 9.96e-22 alpha_red[19] : 1.76 ± 0.108
A_red[20] : 2.74e-21 ± 1.1e-21 alpha_red[20] : 1.57 ± 0.148
A_red[21] : 1.82e-21 ± 1.1e-21 alpha_red[21] : 1.67 ± 0.134
A_red[22] : 2.4e-21 ± 1.04e-21 alpha_red[22] : 1.65 ± 0.149
A_red[23] : 4.51e-21 ± 1.25e-21 alpha_red[23] : 1.63 ± 0.12
A_red[24] : 3.04e-21 ± 1.05e-21 alpha_red[24] : 1.57 ± 0.156
A_red[25] : 3.36e-21 ± 1.05e-21 alpha_red[25] : 1.57 ± 0.161
A_red[26] : 1.99e-21 ± 8.96e-22 alpha_red[26] : 1.38 ± 0.183
A_red[27] : 2.84e-21 ± 9.39e-22 alpha_red[27] : 1.6 ± 0.139
A_red[28] : 2.59e-21 ± 1.05e-21 alpha_red[28] : 1.5 ± 0.176
A_red[29] : 1.89e-21 ± 1.03e-21 alpha_red[29] : 1.4 ± 0.185
A_red[30] : 2.71e-21 ± 1.27e-21 alpha_red[30] : 1.56 ± 0.147
A_red[31] : 3.04e-21 ± 1.06e-21 alpha_red[31] : 1.6 ± 0.147
A_red[32] : 3.4e-21 ± 1.14e-21 alpha_red[32] : 1.59 ± 0.138
A_red[33] : 3.89e-21 ± 1.17e-21 alpha_red[33] : 1.64 ± 0.133
A_red[34] : 2.65e-21 ± 1.08e-21 alpha_red[34] : 1.61 ± 0.145
A_red[35] : 3.17e-21 ± 1.37e-21 alpha_red[35] : 1.55 ± 0.174
A_red[36] : 2.68e-21 ± 1.01e-21 alpha_red[36] : 1.51 ± 0.159
```

Figures: histograms, chains, estimator convergence, red-noise parameters, red-noise parameter convergence

For dataset 1 alone, we ran also without any red noise, for comparison: indeed, the resulting GW parameters are incompatible:

```
A_gw : 1.13e-14 ± 9.72e-16 alpha_gw : -0.651 ± 0.077
[1000 emcee walkers, statistics from step 100/100]
```

Figures: histograms, chains, estimator convergence

Population statistics for GW parameters and individual red-noise parameters for all pulsars:

```
[1000 emcee walkers, statistics from step 100/100]
A_gw : 5.3e-14 ± 5.47e-15 alpha_gw : -0.597 ± 0.114
A_red[mean] : 2.89e-21 ± 1.16e-22 alpha_red[mean]: 1.74 ± 0.0813
A_red[1] : 2.81e-21 ± 7.69e-22 alpha_red[1] : 1.73 ± 0.164
A_red[2] : 2.67e-21 ± 7.82e-22 alpha_red[2] : 1.72 ± 0.194
A_red[3] : 3.15e-21 ± 7.49e-22 alpha_red[3] : 2.18 ± 0.17
A_red[4] : 3.06e-21 ± 7.36e-22 alpha_red[4] : 1.73 ± 0.17
A_red[5] : 2.89e-21 ± 7.92e-22 alpha_red[5] : 1.75 ± 0.185
A_red[6] : 3.06e-21 ± 8.18e-22 alpha_red[6] : 1.78 ± 0.185
A_red[7] : 3.09e-21 ± 8.07e-22 alpha_red[7] : 1.74 ± 0.168
A_red[8] : 2.75e-21 ± 7.89e-22 alpha_red[8] : 1.74 ± 0.177
A_red[9] : 2.87e-21 ± 7.89e-22 alpha_red[9] : 1.69 ± 0.171
A_red[10] : 2.91e-21 ± 7.55e-22 alpha_red[10] : 1.74 ± 0.163
A_red[11] : 2.82e-21 ± 7.79e-22 alpha_red[11] : 1.71 ± 0.166
A_red[12] : 2.84e-21 ± 7.79e-22 alpha_red[12] : 1.77 ± 0.174
A_red[13] : 3.02e-21 ± 7.62e-22 alpha_red[13] : 1.7 ± 0.177
A_red[14] : 2.98e-21 ± 7.41e-22 alpha_red[14] : 1.72 ± 0.17
A_red[15] : 3.11e-21 ± 7.61e-22 alpha_red[15] : 1.79 ± 0.175
A_red[16] : 2.87e-21 ± 7.82e-22 alpha_red[16] : 1.72 ± 0.147
A_red[17] : 2.87e-21 ± 7.84e-22 alpha_red[17] : 1.76 ± 0.175
A_red[18] : 3.01e-21 ± 7.22e-22 alpha_red[18] : 1.67 ± 0.12
A_red[19] : 2.72e-21 ± 7.18e-22 alpha_red[19] : 1.74 ± 0.172
A_red[20] : 2.67e-21 ± 7.52e-22 alpha_red[20] : 1.74 ± 0.18
A_red[21] : 2.78e-21 ± 7.3e-22 alpha_red[21] : 1.74 ± 0.156
A_red[22] : 2.92e-21 ± 7.51e-22 alpha_red[22] : 1.74 ± 0.154
A_red[23] : 2.88e-21 ± 7.51e-22 alpha_red[23] : 1.71 ± 0.144
A_red[24] : 2.7e-21 ± 6.96e-22 alpha_red[24] : 1.76 ± 0.167
A_red[25] : 2.92e-21 ± 7.97e-22 alpha_red[25] : 1.73 ± 0.146
A_red[26] : 2.8e-21 ± 7.68e-22 alpha_red[26] : 1.73 ± 0.146
A_red[27] : 2.92e-21 ± 7.14e-22 alpha_red[27] : 1.66 ± 0.121
A_red[28] : 2.9e-21 ± 7.45e-22 alpha_red[28] : 1.74 ± 0.146
A_red[29] : 2.85e-21 ± 8e-22 alpha_red[29] : 1.76 ± 0.173
A_red[30] : 2.88e-21 ± 7.05e-22 alpha_red[30] : 1.63 ± 0.115
A_red[31] : 2.86e-21 ± 7.51e-22 alpha_red[31] : 1.67 ± 0.163
A_red[32] : 2.94e-21 ± 8.01e-22 alpha_red[32] : 1.75 ± 0.165
A_red[33] : 2.86e-21 ± 7.96e-22 alpha_red[33] : 1.77 ± 0.173
A_red[34] : 2.81e-21 ± 7.55e-22 alpha_red[34] : 1.71 ± 0.176
A_red[35] : 2.87e-21 ± 6.97e-22 alpha_red[35] : 1.7 ± 0.176
A_red[36] : 2.91e-21 ± 7.4e-22 alpha_red[36] : 1.7 ± 0.163
```

Figures: histograms, chains, estimator convergence, red-noise parameters, red-noise parameter convergence

Population statistics for GW parameters and individual red-noise parameters for all pulsars:

```
[1000 emcee walkers, statistics step 1000/1000]
A_gw : 1.21e-14 ± 2.8e-15 alpha_gw : -0.283 ± 0.138
A_red[mean] : 2.94e-21 ± 3.67e-22 alpha_red[mean]: 1.62 ± 0.0729
A_red[1] : 2.65e-21 ± 1.43e-21 alpha_red[1] : 1.63 ± 0.201
A_red[2] : 3.27e-21 ± 1.36e-21 alpha_red[2] : 1.79 ± 0.27
A_red[3] : 2.47e-21 ± 1.6e-21 alpha_red[3] : 1.52 ± 0.137
A_red[4] : 2.76e-21 ± 1.49e-21 alpha_red[4] : 1.58 ± 0.225
A_red[5] : 3.59e-21 ± 1.66e-21 alpha_red[5] : 1.64 ± 0.271
A_red[6] : 3.29e-21 ± 1.37e-21 alpha_red[6] : 1.69 ± 0.221
A_red[7] : 2.56e-21 ± 1.37e-21 alpha_red[7] : 1.67 ± 0.229
A_red[8] : 2.58e-21 ± 1.48e-21 alpha_red[8] : 1.71 ± 0.227
A_red[9] : 3.02e-21 ± 1.52e-21 alpha_red[9] : 1.59 ± 0.206
A_red[10] : 2.08e-21 ± 1.31e-21 alpha_red[10] : 1.67 ± 0.173
A_red[11] : 3.36e-21 ± 1.57e-21 alpha_red[11] : 1.66 ± 0.161
A_red[12] : 3.11e-21 ± 1.59e-21 alpha_red[12] : 1.68 ± 0.274
A_red[13] : 3.2e-21 ± 1.51e-21 alpha_red[13] : 1.68 ± 0.216
A_red[14] : 2.99e-21 ± 1.43e-21 alpha_red[14] : 1.51 ± 0.204
A_red[15] : 3.17e-21 ± 1.47e-21 alpha_red[15] : 1.62 ± 0.245
A_red[16] : 2.09e-21 ± 1.19e-21 alpha_red[16] : 1.58 ± 0.211
A_red[17] : 3.2e-21 ± 1.78e-21 alpha_red[17] : 1.63 ± 0.198
A_red[18] : 2.76e-21 ± 1.58e-21 alpha_red[18] : 1.52 ± 0.163
A_red[19] : 3.24e-21 ± 1.39e-21 alpha_red[19] : 1.55 ± 0.243
A_red[20] : 2.76e-21 ± 1.48e-21 alpha_red[20] : 1.63 ± 0.266
A_red[21] : 3.23e-21 ± 1.5e-21 alpha_red[21] : 1.47 ± 0.209
A_red[22] : 2.73e-21 ± 1.45e-21 alpha_red[22] : 1.62 ± 0.188
A_red[23] : 3.2e-21 ± 1.54e-21 alpha_red[23] : 1.56 ± 0.172
A_red[24] : 2.82e-21 ± 1.36e-21 alpha_red[24] : 1.7 ± 0.226
A_red[25] : 3.36e-21 ± 1.7e-21 alpha_red[25] : 1.61 ± 0.162
A_red[26] : 2.27e-21 ± 1.25e-21 alpha_red[26] : 1.59 ± 0.209
A_red[27] : 3.19e-21 ± 1.4e-21 alpha_red[27] : 1.54 ± 0.133
A_red[28] : 3.24e-21 ± 1.44e-21 alpha_red[28] : 1.54 ± 0.201
A_red[29] : 3.04e-21 ± 1.39e-21 alpha_red[29] : 1.68 ± 0.231
A_red[30] : 3.21e-21 ± 1.5e-21 alpha_red[30] : 1.55 ± 0.132
A_red[31] : 2.83e-21 ± 1.41e-21 alpha_red[31] : 1.44 ± 0.223
A_red[32] : 2.83e-21 ± 1.43e-21 alpha_red[32] : 1.67 ± 0.207
A_red[33] : 3.02e-21 ± 1.52e-21 alpha_red[33] : 1.67 ± 0.243
A_red[34] : 3.42e-21 ± 1.43e-21 alpha_red[34] : 1.65 ± 0.222
A_red[35] : 2.95e-21 ± 1.58e-21 alpha_red[35] : 1.66 ± 0.204
A_red[36] : 2.46e-21 ± 1.39e-21 alpha_red[36] : 1.64 ± 0.172
```

Figures: histograms, chains, estimator convergence, red-noise parameters, red-noise parameter convergence

We have also attempted a run that included EFAC parameters (i.e., TOA-error multiplier) for all pulsars, with uniform prior range [-1,1] for each log_10(EFAC_i). Although we did not push MCMC integration to full convergence, the results are consistent with EFAC = 1 for all pulsars.

We estimated GWB and pulsar parameters using Markov-Chain Monte Carlo integration with time-domain covariance-matrix evaluation of likelihoods, following van Haasteren et al. [2009, 2009MNRAS.395.1005V], automatically marginalizing over timing-solution uncertainties. The integration was run with Goodman and Weare [2010, 10.2140/camcos.2010.5.65] affine-invariant ensemble sampler, as implemented in Foreman-Mackey et al.'s parallel `emcee`

[2012, 2012arXiv1202.3665F]. `tempo2`

(version `cvs-2012-4`

) was used to generate residuals and to dump the design matrix for the timing-solution fit, using the `DESIGNMATRIX`

plugin. Covariance-matrix and likelihood routines were coded in Python, using `numpy`

and `scipy`

vector and linear-algebra libraries, and `multiprocessing`

for multi-core parallelization. The ensemble sampler evolves a population of walkers (typically 1000), and we used the last-step population to approximate the posterior parameter distributions. Chain convergence was assessed by plotting chains, integrated parameters, and by evaluating autocorrelation times.

On Oct 19 we will release the full code used to generate these results.

The simulations ran on ten 2.93-GHz Intel Xeon cores (on OS X 10.6). One step for 1000 walkers took approximately 20 minutes (12 s-core/likelihood), so our longest run lasted approximately 1100 CPU hours. The code could be easily ported to a bigger cluster and run much faster there.

The per-likelihood cost is dominated by the Cholesky factorization of the projected covariance matrix, which scales as O(M^3 N^3), with M the number of pulsars and N the (roughly, average) number of TOAs/pulsar. Formally, the number of likelihood evaluations needed in MCMC integration is independent of the number of model parameters, but additional parameters (such as those that describe continuous GWs superimposed on the GW background) *may* slow down the burn-in and convergence of the chains, but probably not by much.

The greatest difficulty lies in defining the inference problem, since whatever is included in the model will be present in the solution. For instance, not all pulsars show red timing noise, so it would be appropriate to include it only for those that do (and that judgement should probably be made "outside" the problem); Bayesian model comparison may be appropriate in some cases. Furthermore, this approach depends on the strong assumption that TOA error estimates are correct, and that the data are not contamined by any systematic effects not included in the model. Non-stochastic-background GW signals may also have affected results.

The analysis also generated the following concerns about applying the code to real pulsar data:

*Pulsar red noise*. As seen in Dataset 1, including the pulsar red noise terms in the parameter fits resulted in a detection with larger error on the GW terms than in the non-GW case. More importantly, a significant difference was detected in the GWB spectrum in the two cases---enough to make the difference between "consistent with the expected spectrum of SMBH binaries" and "not consistent with any predicted sources". As previously stated, it would be pertinent to pre-judge whether red noise is actually present in a pulsar, and include it then. While this method benefits from not requiring pre-whitened data to achieve a valid solution, as other methods do, having an independent characterization of intrinsic pulsar red noise would result in greater confidence in the detection, as with all methods.*Interpretation of GWB detection and confidence in detected parameters*. Similarly to the previous point, because we search for a power-law signal within expected parameter boundaries, we either will or won't find*a power-law signal*. If a power-law signal is detected, it would be desirable to rigorously support the claim that this model is favored statistically compared to, say, a broken power-law or some more complicated generic spectral dependence (which may still trigger a detection in the power-law search space). For instance, in Datasets 1 and 3, we detected an alpha_gw that is flatter than expected for all signal models. In real data, one might question whether the GWB spectrum is in fact as detected, or whether the presence of deterministic signals (bursts, continuous waves) or other correlated signals (clock error, SS ephemerides) have induced the detection of a flatter spectrum. These differences may vastly change the interpretation of the signal, and we therefore require further work to develop statistical or data-based methods to substantiate assumptions about the signal model.

We thought the instructions were rather succinct in defining the content of the data files. More detail on the correct usage of `tempo2`

would also have been welcome.

An open-code challenge (*à la* Mock LISA Data Challenges), where the dataset-generating code is made available from the start, would also be interesting, since it minimizes problems with conventions, code bugs, and other technical aspects, and it allows participants to concentrate on the algorithms.

A closed-code challenge such as this one emphasizes the problem of formulating a search, but it may have been useful to define the search space at least coarsely (e.g., "the data may contain a background of astrophysical or fundamental origin; zero, one, or more continuous signals from individual foreground MBH binaries; and zero, one, or more bursts from cosmic-string cusps.)

© M. Vallisneri 2014 — last modified on 2012/10/17