I spent a couple of days last week at The Controversy about Hypothesis Testing meeting in Madrid.

The topic of the meeting was indeed the question of “hypothesis testing”, which I addressed in a post a few months ago: how do you choose between conflicting interpretations of data? The canonical version of this question was the test of Einstein’s theory of relativity in the early 20th Century — did the observations of the advance of the perihelion of Mercury (and eventually of the gravitational lensing of starlight by the sun) match the predictions of Einstein’s theory better than Newton’s? And of course there are cases in which even more than a scientific theory is riding on the outcome: is a given treatment effective? I won’t rehash here my opinions on the subject, except to say that I think there really is a controversy: the purported Bayesian solution runs into problems in realistic cases of hypotheses about which we would like to claim some sort of “ignorance” (always a dangerous word in Bayesian circles), while the orthodox frequentist way of looking at the problem is certainly ad hoc and possibly incoherent, but nonetheless seems to work in many cases.

Sometimes, the technical worries don’t apply, and the Bayesian formalism provides the ideal solution. For example, my colleague Daniel Mortlock has applied the model-comparison formalism to deciding whether objects in his UKIDSS survey data are more likely to be distant quasars or nearby and less interesting objects. (He discussed his method here a few months ago.)
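The post doesn't spell out the formalism, but the core of this sort of model comparison is just posterior odds: the likelihood of the data under each class, weighted by the prior abundance of that class. Here is a minimal toy sketch of that calculation; all the numbers (priors, colour distributions, the observed colour) are made up for illustration and are not Mortlock's actual model.

```python
import math

def gauss(x, mu, sigma):
    """Normal density, standing in for the full likelihood of each class."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical numbers: quasars are rare (prior 1e-4) but have a distinctive colour.
prior_q, prior_star = 1e-4, 1 - 1e-4
colour = 1.8                          # observed colour of a candidate (made-up value)
like_q = gauss(colour, 2.0, 0.3)      # assumed quasar colour distribution
like_star = gauss(colour, 0.5, 0.4)   # assumed contaminant distribution

# Posterior probability that the object is a quasar (Bayes' theorem):
post_q = like_q * prior_q / (like_q * prior_q + like_star * prior_star)
```

Note how the tiny prior matters: even an object whose colour fits the quasar distribution far better than the contaminant one can still end up with a modest posterior probability, simply because quasars are so rare.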

In between thoughts about hypothesis testing, I experienced the cultural differences between the statistics community and us astrophysicists and cosmologists, of whom I was the only representative at the meeting: a typical statistics talk just presents pages of text and equations with the occasional poorly labeled graph thrown in. My talks tend to be a bit heavier on the presentation side, perhaps inevitably so given the sometimes beautiful pictures that package our data.

On the other hand, it was clear that the statisticians take their Q&A sessions very seriously, prodded in this case by the word “controversy” in the conference’s title. In his opening keynote, José Bernardo, up from Valencia for the meeting, discussed his work as a so-called “Objective Bayesian”, prompting a question from the mathematically-oriented philosopher Deborah Mayo. Mayo is an arch-frequentist (and blogger) who prefers to describe her particular version as “Error Statistics”, concerned (if I understand correctly after our wine-fuelled discussion at the conference dinner) with the use of probability and statistics to criticise the errors we make in our methods, in contrast with the Bayesian view of probability as a description of our possible knowledge of the world. These two points of view are sufficiently far apart that Bernardo countered one of the questions with the almost-rude but definitely entertaining riposte “You are bloody inconsistent — you are not mathematicians.” That was probably the most explicit almost-personal attack of the meeting, but there were similar exchanges. Not mine, though: my talk was a little more didactic than most, as I knew that I had to justify the science as well as the statistics that lurks behind any analysis of data.

So I spent much of my talk discussing the basics of modern cosmology, and applying my preferred Bayesian techniques in at least one big-picture case where the method works: choosing amongst the simple set of models that seem to describe the Universe, at least from those that obey General Relativity and the Cosmological Principle, in which we do not occupy a privileged position and which, given our observations, are therefore homogeneous and isotropic on the largest scales. Given those constraints, all we need to specify (or measure) are the amounts of the various constituents of the universe: the total amount of matter and of dark energy. The sum of these, in turn, determines the overall geometry of the universe. In the appropriate units, if the total is one, the universe is flat; if it’s larger, the universe is closed, shaped like a three-dimensional sphere; if smaller, it’s open, shaped like a three-dimensional hyperboloid or saddle. What we find when we make the measurement is that the amount of matter is about 0.282±0.02, and of dark energy about 0.723±0.02. Of course, these add up to just greater than one; model selection (or hypothesis testing in other forms) allows us to say that the data nonetheless give us reason to prefer the flat Universe despite the small discrepancy.
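To make the “prefer the flat Universe” step concrete, here is a back-of-the-envelope sketch using the Savage–Dickey density ratio, which gives the Bayes factor for a nested model (flatness, total = 1) as the posterior density at that point divided by the prior density there. The measured values come from the post; treating the posterior as Gaussian, adding the errors in quadrature, and taking a prior width of 1.0 for the curved alternative are all simplifying assumptions of mine.

```python
import math

# Numbers from the post: matter 0.282±0.02, dark energy 0.723±0.02.
omega_total = 0.282 + 0.723            # 1.005, just over the flat value of 1
sigma_total = math.hypot(0.02, 0.02)   # errors combined in quadrature (a simplification)

# Savage–Dickey: Bayes factor for the flat model = posterior density at
# Omega_total = 1 divided by the prior density there. The prior width of 1.0
# for the curved model is an assumed value, for illustration only.
prior_width = 1.0
post_at_flat = math.exp(-0.5 * ((1.0 - omega_total) / sigma_total) ** 2) \
               / (sigma_total * math.sqrt(2 * math.pi))
bayes_factor = post_at_flat / (1.0 / prior_width)
```

With these numbers the Bayes factor comes out around 14 in favour of flatness: even though the measured total is slightly greater than one, the flat model is rewarded for predicting the data without any extra freedom. The answer scales with the assumed prior width, which is exactly the kind of prior sensitivity that makes model selection contentious.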

After the meeting, I had a couple of hours free, so I went across Madrid to the Reina Sofia, to stand amongst the Picassos and Serras. And I was lucky enough to have my hotel room above a different museum:

## Mayo

Glad to hear of your blog. I think one of the big problems in capturing the experimental testing of GTR Bayesianly is the simple fact that scientists did not and do not have an exhaustive set of alternative gravity theories, much less do they find it profitable to try to assign a degree of belief in a "catchall hypothesis". But they do split things up piecemeal to delineate parameters and probe them precisely along the lines of frequentist confidence intervals and significance tests. This lets them understand the phenomenon, set upper bounds to how much a viable theory can differ from GTR in given respects, and design better tests.

I don't know in what way you are claiming we error statisticians are incoherent, by the way.

As for Bernardo, you are correct in saying his attack (on me) was personal and vehement (I actually didn't hear the words you wrote about not being a bloody mathematician or whatever he said). He failed to answer my question, and perhaps he did not realize that the audience's snickering at his overlong harangue was not a sign that they were taking him seriously. Too bad!