Opening the box

This post is a work in progress, but I’ve decided to post it in its unfinished state. Comments and questions welcome!

This week I went to a seminar on the new results from the MiniBooNE experiment given here at Imperial by Morgan Wascko.

The MiniBooNE results have been discussed in depth elsewhere. Like MINOS last year, MiniBooNE was looking at the masses of neutrinos. Specifically, it was looking for the oscillation between electron neutrinos and mu neutrinos. A decade ago, the LSND experiment saw events indicating that mu antineutrinos could oscillate into electron antineutrinos, which gave evidence of a mass difference between the two “flavors”. Unfortunately, this evidence was at odds with the results of two other neutrino oscillation experiments, at least in the standard model with three (but only three) different flavors. MiniBooNE set out to test these results. The results so far seem to contradict those from LSND (at about “98% confidence”).

But here I want to talk about a specific aspect of the statistical methods that MiniBooNE (and many other modern particle physics experiments) use. How did they come up with that 98% number? Over the last couple of decades, the particle physics community has arrived at what it considers a pretty rigorous set of methods. It relies on two chief tenets. First, make sure you can simulate every single aspect of your experiment, varying all of the things that you don’t know for sure (aspects of nuclear physics, the properties of the detectors, and, of course, the unknown physics), and compare these simulations to your data in order to “tune” those numbers to their (assumedly) actual values. Second, delay looking at the part of your data that contains the actual signal until the very end of the process. Putting these together means that you can do a “blind analysis”, only “opening the box” at that final stage.
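To make the recipe concrete, here is a minimal toy sketch in Python. Nothing in it comes from the actual MiniBooNE analysis: the ten energy bins, the background and signal shapes, the split into a control region and a “box”, and the function names are all assumptions invented for illustration. It tunes a single unknown (an overall background normalization) using only the control bins, and looks at the signal bins once, at the very end.

```python
import numpy as np

# Hypothetical toy setup: 10 energy bins. The first 6 form a control region
# used for tuning; the last 4 are the blinded signal region (the "box").
N_BINS = 10
CONTROL = slice(0, 6)
BOX = slice(6, 10)

def simulate(bkg_norm, signal_strength):
    """Expected counts per bin. Both shapes are invented for illustration."""
    bkg_shape = np.linspace(200.0, 50.0, N_BINS)
    sig_shape = np.zeros(N_BINS)
    sig_shape[BOX] = [5.0, 15.0, 15.0, 5.0]
    return bkg_norm * bkg_shape + signal_strength * sig_shape

def tune_background(data):
    """Fit the background normalization using control bins only.

    For Poisson counts with a single overall normalization, the
    maximum-likelihood estimate is sum(observed) / sum(expected at norm 1).
    """
    expected = simulate(1.0, 0.0)[CONTROL]
    return data[CONTROL].sum() / expected.sum()

def open_box(data, bkg_norm):
    """Only now look at the signal bins: observed vs. background-only prediction."""
    predicted = simulate(bkg_norm, 0.0)[BOX].sum()
    observed = data[BOX].sum()
    return observed, predicted

rng = np.random.default_rng(0)
true_norm = 1.1  # the "unknown" that the untuned simulation gets slightly wrong
data = rng.poisson(simulate(true_norm, 0.0))  # fake data containing no signal

bkg_hat = tune_background(data)  # tuning never touches data[BOX]
observed, predicted = open_box(data, bkg_hat)
print(f"tuned normalization: {bkg_hat:.3f}")
print(f"box: observed {observed}, predicted {predicted:.1f}")
```

The point of the structure is that `tune_background` cannot see the box, so no amount of fiddling with the tuning can be steered, consciously or not, by the answer. A real analysis tunes many parameters against many control samples, but the separation is the same.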

Why do they go through all of this trouble? Basically, to avoid what is technically known as “bias”: the scary truth that we scientists can’t be trusted to be completely rational. If you look at your data while you’re trying to understand everything about your experiment, you’re likely to stop adjusting all the parameters when you get an answer that seems right, that matches your underlying prejudices. (Something like this is a well-known problem in the medical world: “publication bias”, in which only successful studies for a given treatment ever see the light of day.)

Even with the rigorous controls of a blind analysis, the MiniBooNE experimenters have still had to intervene in the process more than they would have liked: they adjusted the lower limit of the particle energies that they analyzed in order to remove an anomalous discrepancy with the expectations from their simulations. To be fair, the analysis was still blind, but the adjustment had the effect of removing an excess of events at the now-discarded low energies. This excess doesn’t look anything like the signal for which they were searching, and it occurs in a regime where you might have less confidence in the experimental results, but it does need to be understood. (Indeed, unlike the official talks, which attempt to play down this anomaly, the front-page NY Times article on the experiment highlights it.)

Particle physicists can do this because they are in the lucky (and expensive) position of building absolutely everything about their experimental setup: the accelerators that create their particles, the targets that they aim them at, and the detectors that track the detritus of the collisions, all lovingly and carefully crafted out of the finest of materials. We astrophysicists don’t have the same luxury: we may build the telescope, but everything else is out there in the heavens, out of our control. Moreover, particle experiments enjoy a surfeit of data — billions of events that don’t give information about the desired physical signal, but do let us calibrate the experiment itself. In astrophysics, we often have to use the very same data to calibrate as we use to measure the physics we’re interested in. (Cosmic Microwave Background experiments are a great example of this: it’s very difficult to get lots of good data on our detectors’ performance except in situ.)

It also happens that the dominant statistical ideology in the particle physics community is “frequentist”, in contrast to the Bayesian methods that I never shut up about. Part of the reason for the difference is purely practical: frequentist methods make sense when you can perform the needed “Monte Carlo simulations” of your entire experiment, varying all of the unknowns, and tune your methods against the experimental results. In astrophysics, and especially in cosmology, this is more difficult: there is only one Universe (at least only one that we can measure). But there would be nothing to stop us from doing a blind analysis, simultaneously measuring — and, in the parlance of the trade, marginalizing over — the parameters that describe our experiment that are “tuned” in the Monte Carlo analysis. Indeed, the particle physics community, were it to see the Bayesian light and truth, could in principle do this, too. The problem is simply that this would be a much more computationally difficult task.
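For readers who haven’t met “marginalizing” before, here is a minimal sketch of what it means in the simplest case I can think of. Everything specific is an assumption for illustration: a single counting measurement with an unknown signal `s` and a background `b`, a Gaussian calibration constraint on `b`, a flat prior on `s`, and made-up numbers. The nuisance parameter `b` is not tuned to one best value; instead the posterior is summed over all the values it might take.

```python
import numpy as np

def marginal_posterior(n_obs, s_grid, b_grid, b_mean, b_sigma):
    """Posterior density for the signal s, marginalized over the background b.

    Toy model (an assumption, not any experiment's real likelihood):
    n_obs ~ Poisson(s + b), b has a Gaussian prior, s has a flat prior.
    """
    S, B = np.meshgrid(s_grid, b_grid, indexing="ij")
    mu = S + B
    log_like = n_obs * np.log(mu) - mu  # Poisson log-likelihood, up to a constant
    log_prior_b = -0.5 * ((B - b_mean) / b_sigma) ** 2
    log_post = log_like + log_prior_b
    post = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    marginal = post.sum(axis=1)  # sum over the b axis: this is the marginalization
    ds = s_grid[1] - s_grid[0]
    return marginal / (marginal.sum() * ds)  # normalize to unit area in s

s_grid = np.linspace(0.0, 80.0, 161)
b_grid = np.linspace(70.0, 130.0, 121)
post_s = marginal_posterior(120, s_grid, b_grid, b_mean=100.0, b_sigma=5.0)
s_mode = s_grid[np.argmax(post_s)]
print(f"most probable signal: {s_mode:.1f} events")
```

With one nuisance parameter a grid like this is trivial. The computational difficulty comes from realistic experiments, where the sum runs over dozens or hundreds of such parameters at once and a brute-force grid is hopeless.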