
(Almost) The end of Planck

This week, we released (most of) the final set of papers from the Planck collaboration — the long-awaited Planck 2018 results (which were originally meant to be the “Planck 2016 results”, but everything takes longer than you hope…), available on the ESA website as well as the arXiv. More importantly for many astrophysicists and cosmologists, the final public release of Planck data is also available.

Anyway, we aren’t quite finished: those of you up on your Roman numerals will notice that there are only nine papers but the last one is “XII”; the rest of the papers will come out over the coming months. So it’s not the end, but at least it’s the beginning of the end.

And it’s been a long time coming. I attended my first Planck-related meeting in 2000 or so (and plenty of people had been working on the projects that would become Planck for a half-decade by that point). For the last year or more, the number of people working on Planck has dwindled as grant money has dried up (most of the scientists now analysing the data are doing so without direct funding for the work).

(I won’t rehash the scientific and technical background to the Planck Satellite and the cosmic microwave background (CMB), which I’ve been writing about for most of the lifetime of this blog.)

Planck 2018: the science

So, in the language of the title of the first paper in the series, what is the legacy of Planck? The state of our science is strong. For the first time, we present full results from both the temperature of the CMB and its polarization. Unfortunately, we don’t actually use all the data available to us: on the largest angular scales, Planck’s results remain contaminated by astrophysical foregrounds and unknown “systematic” errors. This is especially true of our measurements of the polarization of the CMB, which is probably Planck’s most significant limitation.

The remaining data are an excellent match for what is becoming the standard model of cosmology: ΛCDM, or “Lambda-Cold Dark Matter”, which is dominated, first, by a component which makes the Universe accelerate in its expansion (Λ, Greek Lambda), usually thought to be Einstein’s cosmological constant; and secondarily by an invisible component that seems to interact only by gravity (CDM, or “cold dark matter”). We have tested for more exotic versions of both of these components, but the simplest model seems to fit the data without needing any such extensions. We also see the atoms and light that make up the more prosaic kinds of matter we encounter in our day-to-day lives, which make up only a few percent of the Universe.

All together, the sum of the densities of these components is just enough to make the curvature of the Universe exactly flat through Einstein’s General Relativity and its famous relationship between the amount of stuff (mass) and the geometry of space-time. Furthermore, we can measure the way the matter in the Universe is distributed as a function of the length scale of the structures involved. All of these are consistent with the predictions of the famous or infamous theory of cosmic inflation, which expanded the Universe when it was much less than one second old by factors of more than 10²⁰. This made the Universe appear flat (think of zooming into a curved surface) and expanded the tiny random fluctuations of quantum mechanics so quickly and so much that they eventually became the galaxies and clusters of galaxies we observe today. (Unfortunately, we still haven’t observed the long-awaited primordial B-mode polarization that would be a somewhat direct signature of inflation, although the combination of data from Planck and BICEP2/Keck gives the strongest constraint to date.)

Most of these results are encoded in a function called the CMB power spectrum, something I’ve shown here on the blog a few times before, but I never tire of the beautiful agreement between theory and experiment, so I’ll do it again:

[Figure: Planck 2018 CMB power spectra]

(The figure is from the Planck “legacy” paper; more details are in others in the 2018 series, especially the Planck “cosmological parameters” paper.) The top panel gives the power spectrum for the Planck temperature data, the second panel the cross-correlation between temperature and the so-called E-mode polarization, the bottom-left panel the polarization-only spectrum, and the bottom-right panel the spectrum from the gravitational lensing of CMB photons due to matter along the line of sight. (There are also spectra for the B mode of polarization, but Planck cannot distinguish these from zero.) The points show the data with “one sigma” error bars, and the blue curve gives the best-fit model.

As an important aside, these spectra per se are not used to determine the cosmological parameters; rather, we use a Bayesian procedure to calculate the likelihood of the parameters directly from the data. On small scales (corresponding to 𝓁 > 30, since the multipole 𝓁 is roughly inversely proportional to angular scale), estimates of spectra from individual detectors are used as an approximation to the proper Bayesian formula; on large scales (𝓁 < 30) we use a more complicated likelihood function, calculated somewhat differently for data from Planck’s High- and Low-frequency instruments, which captures more of the details of the full Bayesian procedure (although, as noted above, we don’t use all possible combinations of polarization and temperature data to avoid contamination by foregrounds and unaccounted-for sources of noise).
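To get a feel for where that 𝓁 = 30 boundary sits on the sky, a common rule of thumb (an approximation, not the exact harmonic relation) is that multipole 𝓁 probes angular scales of about 180°/𝓁. A minimal Python sketch; the function name is hypothetical:

```python
def multipole_to_degrees(ell):
    """Rule-of-thumb angular scale (degrees) probed by multipole ell: theta ~ 180 deg / ell."""
    return 180.0 / ell

# The large/small-scale split at ell = 30 corresponds to roughly 6 degrees.
for ell in (2, 30, 2500):
    print(ell, round(multipole_to_degrees(ell), 3))
```

So the “large scales” handled by the more complicated likelihood are patches several degrees across or bigger, while Planck’s smallest resolved scales are a few arcminutes.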

Of course, not all cosmological data, from Planck and elsewhere, seem to agree completely with the theory. Perhaps most famously, local measurements of how fast the Universe is expanding today (the Hubble constant) give a value of H₀ = (73.52 ± 1.62) km/s/Mpc, whereas Planck (which infers the value within a constrained model) gives H₀ = (67.27 ± 0.60) km/s/Mpc. (The units give how much faster something is moving away from us, in km/s, for every megaparsec (Mpc) further away it is.) This is a pretty significant discrepancy and, unfortunately, it seems difficult to find an interesting cosmological effect that could be responsible for these differences. Rather, we are forced to suspect that it is due to one or more of the experiments having some unaccounted-for source of error.
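To put a number on “pretty significant”: if the two measurements are treated as independent and Gaussian (a simplification; the real comparison is subtler), the difference divided by the combined error gives the size of the tension in “sigma”. A minimal Python sketch using the values quoted above; the helper name is hypothetical:

```python
import math

def tension_sigma(x1, err1, x2, err2):
    """Difference of two independent Gaussian measurements in units of their combined error."""
    return abs(x1 - x2) / math.hypot(err1, err2)

local = (73.52, 1.62)    # km/s/Mpc, local measurement quoted above
planck = (67.27, 0.60)   # km/s/Mpc, Planck value quoted above
t = tension_sigma(*local, *planck)
print(f"{t:.1f} sigma")  # prints "3.6 sigma"
```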

The term of art for these discrepancies is “tension” and indeed there are a few other “tensions” between Planck and other datasets, as well as within the Planck data itself: weak gravitational lensing measurements of the distortion of light rays due to the clustering of matter in the relatively nearby Universe show evidence for slightly weaker clustering than that inferred from Planck data. There are tensions even within Planck, when we measure the same quantities by different means (including things related to similar gravitational lensing effects). But, just as “half of all three-sigma results are wrong”, we expect that we’ve mis- or under-estimated (or to quote the no-longer-in-the-running-for-the-worst president ever, “misunderestimated”) our errors much or all of the time and should really learn to expect this sort of thing. Some may turn out to be real, but many will be statistical flukes or systematic experimental errors.

(If you were looking for a briefer but more technical fly-through of the Planck results, from someone not on the Planck team, check out Renee Hlozek’s tweetstorm.)

Planck 2018: lessons learned

So, Planck has more or less lived up to its advance billing as providing definitive measurements of the cosmological parameters, while still leaving enough “tensions” and other open questions to keep us cosmologists working for decades to come (we are already planning the next generation of ground-based telescopes and satellites for measuring the CMB).

But did we do things in the best possible way? Almost certainly not. My colleague (and former grad student!) Joe Zuntz has pointed out that we don’t use any explicit “blinding” in our statistical analysis. The point is to avoid our own biases when doing an analysis: you don’t want to stop looking for sources of error when you agree with the model you thought would be true. This works really well when you can enumerate all of your sources of error and then simulate them. In practice, most collaborations (such as the Polarbear team with whom I also work) choose to un-blind some results exactly to be able to find such sources of error, and indeed this is the motivation behind the scores of “null tests” that we run on different combinations of Planck data. We discuss this a little in an appendix of the “legacy” paper — null tests are important, but we have often found that a fully blind procedure isn’t powerful enough to find all sources of error, and in many cases (including some motivated by external scientists looking at Planck data) it was exactly low-level discrepancies within the processed results that have led us to new systematic effects. A more fully-blind procedure would be preferable, of course, but I hope this is a case of the great being the enemy of the good (or good enough). I suspect that those next-generation CMB experiments will incorporate blinding from the beginning.
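The “null tests” mentioned above can be sketched with a toy example: split the data into two halves that see the same sky, difference them so that the signal cancels, and check that what is left is consistent with the expected noise. This Python/NumPy sketch assumes white Gaussian noise of known amplitude and uses a made-up constant offset as the “systematic”; it illustrates the idea and is not Planck’s actual procedure.

```python
import numpy as np

def null_test_z(half1, half2, sigma):
    """Chi-squared null test on the difference of two data halves.

    With a shared signal and independent Gaussian noise of std sigma in each
    half, the difference is pure noise of variance 2*sigma**2, so chi2 should
    be close to the number of pixels N. Returns (chi2 - N) / sqrt(2N), a
    normal approximation to the chi-squared distribution valid for large N.
    """
    diff = (half1 - half2) / (np.sqrt(2.0) * sigma)
    chi2 = np.sum(diff**2)
    n = diff.size
    return (chi2 - n) / np.sqrt(2.0 * n)

rng = np.random.default_rng(42)
n_pix, sigma = 100_000, 1.0
sky = rng.normal(0.0, 10.0, n_pix)            # common "signal", cancels in the difference
half1 = sky + rng.normal(0.0, sigma, n_pix)
half2 = sky + rng.normal(0.0, sigma, n_pix)

z_clean = null_test_z(half1, half2, sigma)                 # should be consistent with zero
z_bad = null_test_z(half1, half2 + 0.3 * sigma, sigma)     # hypothetical systematic offset
print(f"clean: z = {z_clean:.1f}, with offset: z = {z_bad:.1f}")
```

A small offset, well below the per-pixel noise, still fails the test clearly because it is coherent across many pixels; conversely, a systematic that looks the same in both halves would pass, which is one reason null tests alone cannot find everything.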

Further, although we have released a lot of software and data to the community, it would be very difficult to reproduce all of our results. Nowadays, experiments are moving toward a fully open-source model, where all the software is publicly available (in Planck, not all of our analysis software was available to other members of the collaboration, much less to the community at large). This does impose an extra burden on the scientists, but it is probably worth the effort, and again, needs to be built into the collaboration’s policies from the start.

That’s the science and methodology. But Planck is also important as having been one of the first of what is now pretty standard in astrophysics: a collaboration of many hundreds of scientists (and many hundreds more of engineers, administrators, and others without whom Planck would not have been possible). In the end, we persisted, and persevered, and did some great science. But I learned that scientists need to get better at communicating, both from the top of the organisation down, and from the “bottom” (I hesitate to use that word, since that is where much of the real work is done) up, especially when those lines of hoped-for communication are usually between different labs or universities, very often in different countries. Physicists, I have learned, can be pretty bad at managing, and at being managed. This isn’t a great combination, and I say this as a middle-manager in the Planck organisation, very much guilty on both fronts.

WMAP Breaks Through

It was announced this morning that the WMAP team has won the $3 million Breakthrough Prize. Unlike the Nobel Prize, which infamously is awarded to at most three people in each field each year, the Breakthrough Prize was awarded to the whole 27-member WMAP team, led by Chuck Bennett, Gary Hinshaw, Norm Jarosik, Lyman Page, and David Spergel, but including everyone through postdocs and grad students who worked on the project. This is great, and I am happy to send my hearty congratulations to all of them (many of whom I know well and am lucky to count as friends).

I actually knew about the prize last week as I was interviewed by Nature for an article about it. Luckily I didn’t have to keep the secret for long. Although I admit to a little envy, it’s hard to argue that the prize wasn’t deserved. WMAP was ideally placed to solidify the current standard model of cosmology, a Universe dominated by dark matter and dark energy, with strong indications that there was a period of cosmological inflation at very early times, which had several important observational consequences. First, it made the geometry of the Universe — as described by Einstein’s theory of general relativity, which links the contents of the Universe with its shape — flat. Second, it generated the tiny initial seeds which eventually grew into the galaxies that we observe in the Universe today (and the stars and planets within them, of course).

By the time WMAP released its first results in 2003, a series of earlier experiments (including MAXIMA and BOOMERanG, which I had the privilege of being part of) had gone much of the way toward this standard model. Indeed, about ten years ago one of my Imperial colleagues, Carlo Contaldi, and I wanted to make that comparison explicit, so we used what were then considered fancy Bayesian sampling techniques to combine the data from balloons and ground-based telescopes (which are collectively known as “sub-orbital” experiments) and compare the results to WMAP. We got a plot like the following (which we never published), showing the main quantity that these CMB experiments measure, called the power spectrum (which I’ve discussed in a little more detail here). The horizontal axis corresponds to the size of structures in the map (actually, its inverse, so smaller is to the right) and the vertical axis to how large the signal is on those scales.

[Figure: Grand unified spectrum]

As you can see, the suborbital experiments, en masse, had data at least as good as WMAP on most scales except the very largest (leftmost; this is because you really do need a satellite to see the entire sky) and indeed were able to probe smaller scales than WMAP (to the right). Since then, I’ve had the further privilege of being part of the Planck Satellite team, whose work has superseded all of these, giving much more precise measurements over all of these scales:

[Figure: Planck CMB power spectrum]

Am I jealous? Ok, a little bit.

But it’s also true, perhaps for entirely sociological reasons, that the community is more apt to trust results from a single, monolithic, very expensive satellite than an ensemble of results from a heterogeneous set of balloons and telescopes, run on (comparative!) shoestrings. On the other hand, the overall agreement amongst those experiments, and between them and WMAP, is remarkable.

And that agreement remains remarkable, even if much of the effort of the cosmology community is devoted to understanding the small but significant differences that remain, especially between one monolithic and expensive satellite (WMAP) and another (Planck). Indeed, those “real and serious” (to quote myself) differences would be hard to see even if I plotted them on the same graph. But since both are ostensibly measuring exactly the same thing (the CMB sky), any differences, even those much smaller than the error bars, must be accounted for, and almost certainly boil down to differences in the analyses or misunderstandings of each team’s own data. Somewhat more interesting are differences between CMB results and measurements of cosmology from other, very different, methods, but that’s a story for another day.

The first direct detection of gravitational waves was announced in February of 2016 by the LIGO team, after decades of planning, building and refining their beautiful experiment. Since that time, the US-based LIGO has been joined by the European Virgo gravitational wave telescope (and more are planned around the globe).

The first four events that the teams announced were from the spiralling in and eventual mergers of pairs of black holes, with masses ranging from about seven to about forty times the mass of the sun. These masses are perhaps a bit higher than we expect to be typical, which might raise intriguing questions about how such black holes were formed and evolved, although even comparing the results to the predictions is a hard problem depending on the details of the statistical properties of the detectors and the astrophysical models for the evolution of black holes and the stars from which (we think) they formed.

Last week, the teams announced the detection of a very different kind of event, the collision of two neutron stars, each about 1.4 times the mass of the sun. Neutron stars are one possible end state of the evolution of a star, when its atoms are no longer able to withstand the gravity trying to force them together. This was first understood in the early 1930s by S Chandrasekhar, who realised that there was a limit to the mass of a star held up simply by the quantum-mechanical repulsion of the electrons at the outskirts of the atoms making up the star. When you surpass this mass, known, appropriately enough, as the Chandrasekhar mass, the star will collapse in upon itself, combining the electrons and protons into neutrons and likely releasing a vast amount of energy in the form of a supernova explosion. After the explosion, the remnant is likely to be a dense ball of neutrons, whose properties are actually determined fairly precisely by similar physics to that of the Chandrasekhar limit (discussed for this case by Oppenheimer, Volkoff and Tolman), giving us the magic 1.4 solar mass number.
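For the electron-supported case Chandrasekhar studied, the limiting mass follows from a textbook formula, M_Ch = (ω₃⁰ √(3π) / 2) (ħc/G)^{3/2} / (μ_e m_H)², where ω₃⁰ ≈ 2.018 is a constant from the Lane–Emden equation for a polytrope of index 3 and μ_e is the number of nucleons per electron (about 2 for helium, carbon or oxygen). The Python sketch below evaluates it with standard values of the constants; it assumes an ideal, zero-temperature, fully relativistic electron gas, and does not attempt the neutron-star calculation of Oppenheimer, Volkoff and Tolman, which needs general relativity and a nuclear equation of state.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
M_H = 1.6735575e-27      # mass of a hydrogen atom, kg
M_SUN = 1.98847e30       # solar mass, kg
OMEGA3 = 2.01824         # Lane-Emden constant for polytropic index n = 3

def chandrasekhar_mass(mu_e=2.0):
    """Limiting mass, in solar masses, of a star held up by relativistic electron degeneracy pressure."""
    mass_kg = OMEGA3 * math.sqrt(3.0 * math.pi) / 2.0 * (HBAR * C / G) ** 1.5 / (mu_e * M_H) ** 2
    return mass_kg / M_SUN

print(f"M_Ch ≈ {chandrasekhar_mass():.2f} solar masses")  # close to 1.4 for mu_e = 2
```

The only “astrophysical” input is μ_e; everything else is fundamental constants, which is why the limit is so robust.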

(Last week also coincidentally would have seen Chandrasekhar’s 107th birthday, and Google chose to illustrate their home page with an animation in his honour for the occasion. I was a graduate student at the University of Chicago, where Chandra, as he was known, spent most of his career. Most of us students were far too intimidated to interact with him, although it was always seen as an auspicious occasion when you spotted him around the halls of the Astronomy and Astrophysics Center.)

This process can therefore make a single 1.4 solar-mass neutron star, and we can imagine that in some rare cases we can end up with two neutron stars orbiting one another. Indeed, the fact that LIGO saw one, but only one, such event during its year-and-a-half run allows the teams to constrain how often that happens, albeit with very large error bars, between 320 and 4740 events per cubic gigaparsec per year; a cubic gigaparsec is about 3 billion light-years on each side, so these are rare events indeed. These results and many other scientific inferences from this single amazing observation are reported in the teams’ overview paper.
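To get a feel for how rare that is, multiply the quoted rate range by the volume of a sphere. The Python sketch below uses a radius of 40 Mpc (roughly the distance of this event’s host galaxy, discussed below) purely for illustration, with Euclidean geometry and no allowance for detector sensitivity; the helper name is hypothetical.

```python
import math

def expected_mergers_per_year(radius_mpc, rate_per_gpc3_per_yr):
    """Expected mergers per year inside a sphere of the given radius.

    Uses a simple Euclidean volume and ignores detector sensitivity,
    redshift effects and the orientation of the binaries.
    """
    volume_gpc3 = 4.0 / 3.0 * math.pi * (radius_mpc / 1000.0) ** 3
    return rate_per_gpc3_per_yr * volume_gpc3

# Rate range quoted above, in mergers per cubic gigaparsec per year.
for rate in (320.0, 4740.0):
    print(f"{rate:.0f}/Gpc^3/yr -> {expected_mergers_per_year(40.0, rate):.2f} per year within 40 Mpc")
```

Even at the optimistic end, that is only about one such merger per year within this distance.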

A series of other papers discuss those results in more detail, covering topics from the physics of neutron stars to limits on departures from Einstein’s theory of gravity (for more on some of these other topics, see this blog, or this story from the NY Times). As a cosmologist, I found the most exciting result to be the use of the event as a “standard siren”, an object whose gravitational wave properties are well-enough understood that we can deduce the distance to the object from the LIGO results alone. Although the idea came from Bernard Schutz in 1986, the term “standard siren” was coined somewhat later (by Sean Carroll) in analogy to the (heretofore?) more common cosmological standard candles and standard rulers: objects whose intrinsic brightness or size is known, so that their distances can be measured from their apparent brightness or size, just as you can roughly deduce how far away a light bulb is by how bright it appears, or how far away a familiar object or person is by how big it looks.
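The two older ideas are simple inversions: a standard candle uses the inverse-square law F = L / (4πd²), and a standard ruler uses the small-angle relation θ ≈ D/d. A minimal Python sketch for a hypothetical light bulb and person; it ignores the cosmological subtleties (at high redshift the “luminosity distance” and “angular-diameter distance” differ), and the function names are illustrative.

```python
import math

def candle_distance(luminosity, flux):
    """Standard candle: invert the inverse-square law F = L / (4 pi d^2)."""
    return math.sqrt(luminosity / (4.0 * math.pi * flux))

def ruler_distance(true_size, angular_size_rad):
    """Standard ruler: small-angle approximation, angle = size / distance."""
    return true_size / angular_size_rad

# Hypothetical 100 W bulb whose light arrives with a flux of 0.001 W/m^2.
print(f"bulb: about {candle_distance(100.0, 1e-3):.0f} m away")
# Hypothetical 1.8 m tall person who subtends 0.01 radians.
print(f"person: about {ruler_distance(1.8, 0.01):.0f} m away")
```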

Gravitational wave events are standard sirens because our understanding of relativity is good enough that an observation of the shape of the gravitational wave pattern as a function of time can tell us the properties of its source. Knowing that, we also then know the amplitude of that pattern when it was released. Over the time since then, as the gravitational waves have travelled across the Universe toward us, the amplitude has gone down (just as further objects look dimmer, they sound quieter); the expansion of the Universe also causes the frequency of the waves to decrease, which is the cosmological redshift that we observe in the spectra of distant objects’ light.
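At leading (Newtonian) order, the “shape” and “amplitude” arguments can be written down explicitly: the rate at which the frequency sweeps upward fixes the chirp mass M_c through ḟ = (96/5) π^{8/3} (G M_c/c³)^{5/3} f^{11/3}, and the plus-polarisation amplitude is h = (4/d) (G M_c/c²)^{5/3} (πf/c)^{2/3} (1 + cos²ι)/2, where ι is the inclination of the orbit. The Python sketch below uses made-up inputs and hypothetical helper names; real analyses use full waveform models, both polarisations and detector noise. It recovers a distance from those two measurements, and shows that the answer depends on the assumed inclination.

```python
import math

G = 6.67430e-11        # m^3 kg^-1 s^-2
C = 2.99792458e8       # m/s
M_SUN = 1.98847e30     # kg
MPC = 3.0857e22        # metres per megaparsec

def chirp_mass(f, fdot):
    """Leading-order chirp mass (kg) from GW frequency f (Hz) and its rate of change fdot (Hz/s)."""
    return (C**3 / G) * ((5.0 / 96.0) * math.pi ** (-8.0 / 3.0) * f ** (-11.0 / 3.0) * fdot) ** 0.6

def fdot_of(mc, f):
    """Forward relation: how fast the frequency sweeps up for chirp mass mc at frequency f."""
    return (96.0 / 5.0) * math.pi ** (8.0 / 3.0) * (G * mc / C**3) ** (5.0 / 3.0) * f ** (11.0 / 3.0)

def strain_amplitude(mc, f, distance, cos_iota=1.0):
    """Leading-order plus-polarisation amplitude; falls off as 1/distance."""
    inclination = (1.0 + cos_iota**2) / 2.0
    return 4.0 / distance * (G * mc / C**2) ** (5.0 / 3.0) * (math.pi * f / C) ** (2.0 / 3.0) * inclination

def siren_distance(mc, f, h, cos_iota=1.0):
    """Invert the amplitude for distance, given the chirp mass and an assumed inclination."""
    return strain_amplitude(mc, f, 1.0, cos_iota) / h

# Made-up source: chirp mass 1.2 solar masses, face-on, at 40 Mpc, observed at 100 Hz.
mc_true, f, d_true = 1.2 * M_SUN, 100.0, 40.0 * MPC
h_obs = strain_amplitude(mc_true, f, d_true)
mc_est = chirp_mass(f, fdot_of(mc_true, f))       # "shape" of the signal gives the mass
d_face_on = siren_distance(mc_est, f, h_obs)       # amplitude then gives the distance
d_edge_on = siren_distance(mc_est, f, h_obs, cos_iota=0.0)
print(f"M_c = {mc_est / M_SUN:.3f} M_sun, d = {d_face_on / MPC:.1f} Mpc (face-on), "
      f"{d_edge_on / MPC:.1f} Mpc (if assumed edge-on)")
```

The same measured amplitude is consistent with a face-on source far away or an inclined source closer by, which is the degeneracy the teams had to marginalise over.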

Unlike LIGO’s previous detections of binary-black-hole mergers, this new observation of a binary-neutron-star merger was also seen in photons: first as a gamma-ray burst, and then as a “nova”: a new dot of light in the sky. Indeed, the observation of the afterglow of the merger by teams of literally thousands of astronomers in gamma rays and X-rays, optical and infrared light, and in the radio, is one of the more amazing pieces of academic teamwork I have seen.

And these observations allowed the teams to identify the host galaxy of the original neutron stars, and to measure the redshift of its light (the lengthening of the light’s wavelength due to the movement of the galaxy away from us). It is most likely a previously unexceptional galaxy called NGC 4993, with a redshift z=0.009, putting it about 40 megaparsecs away, relatively close on cosmological scales.

But this means that we can measure all of the factors in one of the most celebrated equations in cosmology, Hubble’s law: cz = H₀d, where c is the speed of light, z is the redshift just mentioned, and d is the distance measured from the gravitational wave burst itself. This just leaves H₀, the famous Hubble Constant, giving the current rate of expansion of the Universe, usually measured in kilometres per second per megaparsec. The old-fashioned way to measure this quantity is via the so-called cosmic distance ladder, bootstrapping up from nearby objects of known distance to more distant ones whose properties can only be calibrated by comparison with those more nearby. But errors accumulate in this process and we can be susceptible to the weakest rung on the ladder (see recent work by some of my colleagues trying to formalise this process). Alternately, we can use data from cosmic microwave background (CMB) experiments like the Planck Satellite (see here for lots of discussion on this blog); the typical size of the CMB pattern on the sky is something very like a standard ruler. Unfortunately, it, too, needs to be calibrated, implicitly by other aspects of the CMB pattern itself, and so ends up being a somewhat indirect measurement. Currently, the best cosmic-distance-ladder measurement gives something like 73.24 ± 1.74 km/sec/Mpc whereas Planck gives 67.81 ± 0.92 km/sec/Mpc; these numbers disagree by “a few sigma”, enough that it is hard to explain as simply a statistical fluctuation.
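Plugging the round numbers quoted above (z = 0.009, d ≈ 40 Mpc) into Hubble’s law gives a quick sanity check. This Python sketch is only illustrative: it uses the low-redshift form cz = H₀d, and the optional peculiar-velocity argument is a hypothetical knob standing in for the careful modelling the teams actually did.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def hubble_constant(z, distance_mpc, v_pec_km_s=0.0):
    """H0 in km/s/Mpc from c z = H0 d (valid only for z << 1).

    v_pec_km_s is an assumed line-of-sight peculiar velocity, subtracted
    from the observed recession velocity before dividing by the distance.
    """
    return (C_KM_S * z - v_pec_km_s) / distance_mpc

h0 = hubble_constant(0.009, 40.0)
print(f"H0 ≈ {h0:.1f} km/s/Mpc")  # prints "H0 ≈ 67.5 km/s/Mpc"
```

With a recession velocity of only about 2700 km/s, a peculiar velocity of a few hundred km/s shifts the answer by roughly ten percent, which is why it matters so much for a nearby host.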

Unfortunately, the new LIGO results do not solve the problem. Because we cannot observe the inclination of the neutron-star binary (i.e., the orientation of its orbit), this blows up the error on the distance to the object, due to the Bayesian marginalisation over this unknown parameter (just as the Planck measurement requires marginalization over all of the other cosmological parameters to fully calibrate the results). Because the host galaxy is relatively nearby, the teams must also account for the fact that the redshift includes the effect not only of the cosmological expansion but also the movement of galaxies with respect to one another due to the pull of gravity on relatively large scales; this so-called peculiar velocity has to be modelled which adds further to the errors.
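A toy version of that marginalisation, in Python with NumPy, shows why an unknown inclination inflates the distance error. The model is an assumption made for illustration, not the LIGO/Virgo likelihood: the observed amplitude is k·g(ι)/d plus Gaussian noise, with g(ι) = (1 + cos²ι)/2 (the plus-polarisation factor alone), a prior uniform in cos ι, and a flat prior on distance.

```python
import numpy as np

def distance_posterior_stats(a_obs=1.0, sigma=0.1, k=40.0, cos_known=1.0):
    """Toy posterior for distance d given amplitude a = k * g(iota) / d + Gaussian noise.

    Returns ((mean, std) with the inclination known, (mean, std) with cos(iota)
    marginalised over a uniform prior). Flat prior on d over the grid.
    """
    d = np.linspace(5.0, 120.0, 4000)          # distance grid (Mpc)
    cos_i = np.linspace(-1.0, 1.0, 801)        # uniform prior in cos(iota)

    def like(g):
        return np.exp(-0.5 * ((a_obs - k * g / d) / sigma) ** 2)

    def mean_std(p):
        w = p / p.sum()                        # uniform grid, so a plain sum normalises
        mean = np.sum(w * d)
        return mean, np.sqrt(np.sum(w * (d - mean) ** 2))

    known = like(0.5 * (1.0 + cos_known**2))
    g_grid = 0.5 * (1.0 + cos_i[:, None] ** 2)  # shape (n_cos, 1), broadcasts against d
    marginal = like(g_grid).mean(axis=0)        # integrate out cos(iota)
    return mean_std(known), mean_std(marginal)

(mean_k, std_k), (mean_m, std_m) = distance_posterior_stats()
print(f"inclination known:        d = {mean_k:.1f} ± {std_k:.1f} Mpc")
print(f"inclination marginalised: d = {mean_m:.1f} ± {std_m:.1f} Mpc")
```

The marginalised posterior is wider and sits at smaller distances, since a given amplitude could come from a face-on source far away or an inclined one nearby.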

This procedure gives a final measurement of H₀ = 70.0 (+12.0, −8.0) km/s/Mpc, with the full shape of the probability curve shown in the Figure, taken directly from the paper. Both the Planck and distance-ladder results are consistent with these rather large error bars. But this is calculated from a single object; as more of these events are seen these error bars will go down, typically in proportion to one over the square root of the number of events, so it might not be too long before this is the best way to measure the Hubble Constant.
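If the error really does fall like 1/√N, a back-of-the-envelope Python estimate shows how many similar events would be needed to match the quoted distance-ladder error of 1.74 km/s/Mpc. The symmetrised ±10 km/s/Mpc starting error and the assumption that future events are just like this one are simplifications: real events will sit at different distances and orientations, and systematic errors do not average down.

```python
import math

def events_needed(sigma_now, sigma_target, n_now=1):
    """Events needed to shrink the error from sigma_now to sigma_target if sigma scales as 1/sqrt(N)."""
    return math.ceil(n_now * (sigma_now / sigma_target) ** 2)

sigma_one_event = (12.0 + 8.0) / 2.0            # crude symmetrisation of the +12/-8 km/s/Mpc errors
print(events_needed(sigma_one_event, 1.74))     # prints 34
```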


[Apologies: too long, too technical, and written late at night while trying to get my wonderful not-quite-three-week-old daughter to sleep through the night.]

Knightian Uncertainty

[Update: I have fixed some broken links, and modified the discussion of QBism and the recent paper by Chris Fuchs— thanks to Chris himself for taking the time to read and find my mistakes!]

For some reason, I’ve come across an idea called “Knightian Uncertainty” quite a bit lately. Frank Knight was an economist of the free-market conservative “Chicago School”, who considered various concepts related to probability in a book called Risk, Uncertainty, and Profit. He distinguished between “risk”, which he defined as applying to events to which we can assign a numerical probability, and “uncertainty”, to those events about which we know so little that we don’t even have a probability to assign, or indeed those events whose possibility we didn’t even contemplate until they occurred. In Rumsfeldian language, “risk” applies to “known unknowns”, and “uncertainty” to “unknown unknowns”. Or, as Nassim Nicholas Taleb put it, “risk” is about “white swans”, while “uncertainty” is about those unexpected “black swans”.

(As a linguistic aside, to me, “uncertainty” seems a milder term than “risk”, and so the naming of the concepts is backwards.)

Actually, there are a couple of slightly different concepts at play here. The black swans or unknown-unknowns are events that one wouldn’t have known enough about to even include in the probabilities being assigned. This is much more severe than those events that one knows about, but for which one doesn’t have a good probability to assign.

And the important word here is “assign”. Probabilities are not something out there in nature, but in our heads. So what should a Bayesian make of these sorts of uncertainty? By definition, they can’t be used in Bayes’ theorem, which requires specifying a probability distribution. Bayesian theory is all about making models of the world: we posit a mechanism and possible outcomes, and assign probabilities to the parts of the model that we don’t know about.

So I think the two different types of Knightian uncertainty have quite a different role here. In the case where we know that some event is possible, but we don’t really know what probabilities to assign to it, we at least have a starting point. If our model is broad enough, then enough data will allow us to measure the parameters that describe it. For example, in recent years people have started to realise that the frequencies of rare, catastrophic events (financial crashes, earthquakes, etc.) are very often well described by so-called power-law distributions. These assign much greater probabilities to such events than more typical Gaussian (bell-shaped curve) distributions; the shorthand for this is that power-law distributions have much heavier tails than Gaussians. As long as our model includes the possibility of these heavy tails, we should be able to make predictions based on data, although very often those predictions won’t be very precise.
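To make “heavier tails” concrete, here is a small Python comparison of the chance of exceeding a threshold under a standard Gaussian and under a power-law (Pareto) distribution, together with the standard maximum-likelihood estimate of the power-law index from simulated data. The exponent α = 2 and the sample size are arbitrary illustrative choices, and the two distributions are not calibrated to any real dataset.

```python
import math
import random

def gaussian_tail(x):
    """P(X > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pareto_tail(x, alpha, x_min=1.0):
    """P(X > x) for a power law with P(X > x) = (x_min / x)**alpha, x >= x_min."""
    return (x_min / x) ** alpha

def fit_pareto_alpha(samples, x_min=1.0):
    """Maximum-likelihood estimate of the power-law index, given a known x_min."""
    return len(samples) / sum(math.log(s / x_min) for s in samples)

for x in (3.0, 5.0, 10.0):
    print(f"x = {x:>4}: Gaussian {gaussian_tail(x):.1e}, power law {pareto_tail(x, 2.0):.1e}")

# "Enough data will allow us to measure the parameters": recover alpha from samples.
rng = random.Random(1)
data = [rng.paretovariate(2.0) for _ in range(10_000)]
alpha_hat = fit_pareto_alpha(data)
print(f"estimated alpha = {alpha_hat:.2f}")   # close to the true value of 2
```

Far out in the tail the power law assigns probabilities many orders of magnitude larger than the Gaussian, which is exactly the difference between a model that allows for rare catastrophes and one that effectively rules them out.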

But the “black swan” problem is much worse: these are possibilities that we don’t even know enough about to consider in our model. Almost by definition, one can’t say anything at all about this sort of uncertainty. But what one must do is be open-minded enough to adjust our models in the face of new data: we can’t predict the black swan, but we should expand the model after we’ve seen the first one (and perhaps revise our model for other waterfowl to allow more varieties!). In more traditional scientific settings, involving measurements with errors, this is even more difficult: a seemingly anomalous result, not allowed in the model, may be due to some mistake in the experimental setup or in our characterisation of the probabilities of those inevitable errors (perhaps they should be described by heavy-tailed power laws, rather than Gaussian distributions as above).

I first came across the concept as an oblique reference in a recent paper by Chris Fuchs, writing about his idea of QBism (or see here for a more philosophically-oriented discussion), an interpretation of quantum mechanics that takes seriously the Bayesian principle that all probabilities are about our knowledge of the world, rather than the world itself (which is a discussion for another day). He tentatively opined that the probabilities in quantum mechanics are themselves “Knightian”, referring not to a reading of Knight himself but to some recent, and to me frankly bizarre, ideas from Scott Aaronson, discussed in his paper, The Ghost in the Quantum Turing Machine, and an accompanying blog post, trying to base something like “free will” (a term he explicitly does not apply to this idea, however) on the possibility of our brains having so-called “freebits”, quantum states whose probabilities are essentially uncorrelated with anything else in the Universe. This arises from what is to me a mistaken desire to equate “freedom” with complete unpredictability. My take on free will is instead aligned with that of Daniel Dennett, at least the version from his Consciousness Explained from the early 1990s, as I haven’t yet had the chance to read his recent From Bacteria to Bach and Back: a perfectly deterministic (or quantum mechanically random, even allowing for the statistical correlations that Aaronson wants to be rid of) version of free will is completely sensible, and indeed may be the only kind of free will worth having.

Fuchs himself tentatively uses Aaronson’s “Knightian Freedom” to refer to his own idea

that nature does what it wants, without a mechanism underneath, and without any “hidden hand” of the likes of Richard von Mises’s Kollective or Karl Popper’s propensities or David Lewis’s objective chances, or indeed any conception that would diminish the autonomy of nature’s events,

which I think is an attempt (and which I admit I don’t completely understand) to remove the probabilities of quantum mechanics entirely from any mechanistic account of physical systems, despite the incredible success of those probabilities in predicting the outcomes of experiments and other observations of quantum mechanical systems. I’m not quite sure this is what either Knight or Aaronson had in mind with their use of “uncertainty” (or “freedom”), since at least in quantum mechanics, we do know what probabilities to assign, given certain other personal (as Fuchs would have it) information about the system. My Bayesian predilections make me sympathetic with this idea, but then I struggle to understand what, exactly, quantum mechanics has taught us about the world: why do the predictions of quantum mechanics work?

When I’m not thinking about physics, for the last year or so my mind has been occupied with politics, so I was amused to see Knightian Uncertainty crop up in a New Yorker article about Trump’s effect on the stock market:

Still, in economics there’s a famous distinction, developed by the great Chicago economist Frank Knight, between risk and uncertainty. Risk is when you don’t know exactly what will happen but nonetheless have a sense of the possibilities and their relative likelihood. Uncertainty is when you’re so unsure about the future that you have no way of calculating how likely various outcomes are. Business is betting that Trump is risky but not uncertain—he may shake things up, but he isn’t going to blow them up. What they’re not taking seriously is the possibility that Trump may be willing to do things—like start a trade war with China or a real war with Iran—whose outcomes would be truly uncertain.

It’s a pretty low bar, but we can only hope.

SOLE Survivor

I recently finished my last term lecturing our second-year Quantum Mechanics course, which I taught for five years. It’s a required class, a mathematical introduction to one of the most important sets of ideas in all of physics, and really the basis for much of what we do, whether that’s astrophysics or particle physics or almost anything else. It’s a slightly “old-fashioned” course, although it covers the important basic ideas: the Schrödinger Equation, the postulates of quantum mechanics, angular momentum, and spin, leading almost up to what is needed to understand the crowning achievement of early quantum theory: the structure of the hydrogen atom (and other atoms).

A more modern approach might start with qubits: the simplest systems that show quantum mechanical behaviour, and the study of which has led to the revolution in quantum information and quantum computing.

Moreover, the lectures rely on the so-called Copenhagen interpretation, which is the confusing and sometimes contradictory way that most physicists are taught to think about the basic ontology of quantum mechanics: what it says about what the world is “made of” and what happens when you make a quantum-mechanical measurement of that world. Indeed, it’s so confusing and contradictory that you really need another rule so that you don’t complain when you start to think too deeply about it: “shut up and calculate”. A more modern approach might also discuss the many-worlds approach, and — my current favorite — the (of course) Bayesian ideas of QBism.

The students seemed pleased with the course as it is — at the end of the term, they have the chance to give us some feedback through our “Student On-Line Evaluation” system, and my marks have been pretty consistent. Of the 200 or so students in the class, only about 90 bother to give their evaluations, which is disappointingly few. But it’s enough (I hope) to get a feeling for what they thought.

SOLE 2016 Chart

So, most students Definitely/Mostly Agree with the good things, although it’s clear that our students are most disappointed in the feedback that they receive from us (this is a more general issue, both for us in Physics at Imperial and elsewhere, and may partially explain why most of them are unwilling to give us feedback through this form).

But much more fun and occasionally revealing are the “free-text comments”. Given the numerical scores, it’s not too surprising that there were plenty of positive ones:

  • Excellent lecturer - was enthusiastic and made you want to listen and learn well. Explained theory very well and clearly and showed he responded to suggestions on how to improve.

  • Possibly the best lecturer of this term.

  • Thanks for providing me with the knowledge and top level banter.

  • One of my favourite lecturers so far, Jaffe was entertaining and cleary very knowledgeable. He was always open to answering questions, no matter how simple they may be, and gave plenty of opportunity for students to ask them during lectures. I found this highly beneficial. His lecturing style incorporates well the blackboards, projectors and speach and he finds a nice balance between them. He can be a little erratic sometimes, which can cause confusion (e.g. suddenly remembering that he forgot to write something on the board while talking about something else completely and not really explaining what he wrote to correct it), but this is only a minor fix. Overall VERY HAPPY with this lecturer!

But some were more mixed:

  • One of the best, and funniest, lecturers I’ve had. However, there are some important conclusions which are non-intuitively derived from the mathematics, which would be made clearer if they were stated explicitly, e.g. by writing them on the board.

  • I felt this was the first time I really got a strong qualitative grasp of quantum mechanics, which I certainly owe to Prof Jaffe’s awesome lectures. Sadly I can’t quite say the same about my theoretical grasp; I felt the final third of the course less accessible, particularly when tackling angular momentum. At times, I struggled to contextualise the maths on the board, especially when using new techniques or notation. I mostly managed to follow Prof Jaffe’s derivations and explanations, but struggled to understand the greater meaning. This could be improved on next year. Apart from that, I really enjoyed going to the lectures and thought Prof Jaffe did a great job!

  • The course was inevitably very difficult to follow.

And several students explicitly commented on my attempts to get students to ask questions in as public a way as possible, so that everyone can benefit from the answers and — this really is true! — because there really are no embarrassing questions!

  • Really good at explaining and very engaging. Can seem a little abrasive at times. People don’t like asking questions in lectures, and not really liking people to ask questions in private afterwards, it ultimately means that no questions really get answered. Also, not answering questions by email makes sense, but no one really uses the blackboard form, so again no one really gets any questions answered. Though the rationale behind not answering email questions makes sense, it does seem a little unnecessarily difficult.

  • We are told not to ask questions privately so that everyone can learn from our doubts/misunderstandings, but I, amongst many people, don’t have the confidence to ask a question in front of 250 people during a lecture.

  • Forcing people to ask questions in lectures or publically on a message board is inappropriate. I understand it makes less work for you, but many students do not have the confidence to ask so openly, you are discouraging them from clarifying their understanding.

Inevitably, some of the comments were contradictory:

  • Would have been helpful to go through examples in lectures rather than going over the long-winded maths to derive equations/relationships that are already in the notes.

  • Professor Jaffe is very good at explaining the material. I really enjoyed his lectures. It was good that the important mathematics was covered in the lectures, with the bulk of the algebra that did not contribute to understanding being left to the handouts. This ensured we did not get bogged down in unnecessary mathematics and that there was more emphasis on the physics. I liked how Professor Jaffe would sometimes guide us through the important physics behind the mathematics. That made sure I did not get lost in the maths. A great lecture course!

And also inevitably, some students wanted to know more about the exam:

  • It is a difficult module, however well covered. The large amount of content (between lecture notes and handouts) is useful. Could you please identify what is examinable though as it is currently unclear and I would like to focus my time appropriately?

And one comment was particularly worrying (along with my seeming “a little abrasive at times”, above):

  • The lecturer was really good in lectures. however, during office hours he was a bit arrogant and did not approach the student nicely, in contrast to the behaviour of all the other professors I have spoken to

If any of the students are reading this, and are willing to comment further on this, I’d love to know more — I definitely don’t want to seem (or be!) arrogant or abrasive.

But I’m happy to see that most students don’t seem to think so, and even happier to have learned that I’ve been nominated “multiple times” for Imperial’s Student Academic Choice Awards!

Finally, best of luck to my colleague Jonathan Pritchard, who will be taking over teaching the course next year.

Academic Blogging Still Dangerous?

Nearly a decade ago, blogging was young, and its place in the academic world wasn’t clear. Back in 2005, I wrote about an anonymous article in the Chronicle of Higher Education, a so-called “advice” column admonishing academic job seekers to avoid blogging, mostly because it let the hiring committee find out things that had nothing whatever to do with their academic job, and reject them on those (inappropriate) grounds.

I thought things had changed. Many academics have blogs, and indeed many institutions encourage it (here at Imperial, there’s a College-wide list of blogs written by people at all levels, and I’ve helped teach a course on blogging for young academics). More generally, outreach has become an important component of academic life (that is, it’s at least necessary to pay it lip service when applying for funding or promotions) and blogging is usually seen as a useful way to reach a wide audience outside of one’s field.

So I was distressed to see the lament — from an academic blogger — “Want an academic job? Hold your tongue”. Things haven’t changed as much as I thought:

… [A senior academic said that] the blog, while it was to be commended for its forthright tone, was so informal and laced with profanity that the professor could not help but hold the blog against the potential faculty member…. It was the consensus that aspiring young scientists should steer clear of such activities.

Depending on the content of the blog in question, this seems somewhere between a disregard for academic freedom and a judgment of the candidate on completely irrelevant grounds. Of course, it is natural to want the personalities of our colleagues to mesh well with our own, and almost impossible to completely ignore supposedly extraneous information. But we are hiring for academic jobs, and what should matter are research and teaching ability.

Of course, I’ve been lucky: I already had a permanent job when I started blogging, and I work in the UK system which doesn’t have a tenure review process. And I admit this blog has steered clear of truly controversial topics (depending on what you think of Bayesian probability, at least).

Planck 2013: the science

If you’re the kind of person who reads this blog, then you won’t have missed yesterday’s announcement of the first Planck cosmology results.

The most important is our picture of the cosmic microwave background itself:

[Image: the Planck full-sky map of the CMB]

But it takes a lot of work to go from the data coming off the Planck satellite to this picture. First, we have to make nine different maps, one at each of the frequencies in which Planck observes, from 30 GHz (with a wavelength of 1 cm) up to 857 GHz (0.350 mm) — note that the colour scales here are the same:


At low and high frequencies, these are dominated by the emission of our own galaxy, and there is at least some contamination over the whole range, so it takes hard work to separate the primordial CMB signal from the dirty (but interesting) astrophysics along the way. In fact, it’s sufficiently challenging that the team uses four different methods, each with different assumptions, to do so, and the results agree remarkably well.
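One common family of component-separation methods is the internal linear combination (ILC): weight the frequency maps so that the CMB, which has the same amplitude in every channel in thermodynamic units, passes through unchanged while the total variance (and hence foreground and noise) is minimised. The sketch below is a toy under stated assumptions (three channels, a single foreground with a hypothetical fixed frequency scaling, white noise, no beams or masks); it is not a reproduction of any of the four Planck pipelines.

```python
import numpy as np


def ilc_weights(maps):
    """Internal-linear-combination weights for maps of shape (n_freq, n_pix).

    Assumes the CMB has unit response in every channel (thermodynamic units).
    The weights minimise the variance of the combined map subject to sum(w) == 1.
    """
    cov = np.cov(maps)                      # channel-by-channel covariance (rows are channels)
    ones = np.ones(maps.shape[0])           # CMB mixing vector
    cinv_ones = np.linalg.solve(cov, ones)  # C^{-1} a without forming the inverse explicitly
    return cinv_ones / (ones @ cinv_ones)


rng = np.random.default_rng(0)
n_pix = 100_000
cmb = rng.normal(0.0, 1.0, n_pix)               # toy CMB sky
dust = rng.lognormal(0.0, 1.0, n_pix)           # toy foreground template
scaling = np.array([0.2, 1.0, 3.0])             # hypothetical foreground amplitude per channel
noise = rng.normal(0.0, 0.1, (3, n_pix))        # white noise in each channel
maps = cmb + scaling[:, None] * dust + noise    # shape (3, n_pix)

w = ilc_weights(maps)
cmb_hat = w @ maps                              # cleaned map
```

Because the weights must sum to one, the CMB survives; the foreground is suppressed because its channel-to-channel pattern differs from the CMB's.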

In fact, we don’t use the above CMB image directly to do the main cosmological science. Instead, we build a Bayesian model of the data, combining our understanding of the foreground astrophysics and the cosmology, and marginalise over the astrophysical parameters in order to extract as much cosmological information as we can. (The formalism is described in the Planck likelihood paper, and the main results of the analysis are in the Planck cosmological parameters paper.)
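Marginalising over astrophysical (nuisance) parameters means integrating the posterior over them, leaving a distribution for the cosmological parameter alone. A minimal grid-based sketch, assuming a hypothetical one-parameter "cosmology" `theta`, a single foreground amplitude with a known template shape, Gaussian noise and flat priors; the Planck likelihood itself is far more elaborate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy problem: data = theta + amp * template + noise,
# where theta plays the "cosmological" parameter and amp a foreground nuisance.
template = np.linspace(0.0, 1.0, 50)
theta_true, amp_true, sigma = 2.0, 0.5, 0.3
data = theta_true + amp_true * template + rng.normal(0.0, sigma, template.size)

theta = np.linspace(0.0, 4.0, 201)
amp = np.linspace(-2.0, 3.0, 201)
T, A = np.meshgrid(theta, amp, indexing="ij")   # T[i, j] = theta[i], A[i, j] = amp[j]

# Gaussian log-likelihood evaluated on the (theta, amp) grid
model = T[..., None] + A[..., None] * template  # shape (n_theta, n_amp, n_data)
log_like = -0.5 * np.sum((data - model) ** 2, axis=-1) / sigma**2

# Flat priors over the grid, so the posterior is proportional to the likelihood.
post = np.exp(log_like - log_like.max())

# Marginalise: integrate (here, sum) over the nuisance amplitude, then normalise.
dtheta = theta[1] - theta[0]
marg = post.sum(axis=1)
marg /= marg.sum() * dtheta

theta_mean = np.sum(theta * marg) * dtheta
```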

The main tool for this is the power spectrum, a plot which shows us how the different hot and cold spots on our CMB map are distributed. In this plot, the left-hand side (low ℓ) corresponds to large angles on the sky and high ℓ to small angles. Planck’s results are remarkable for covering this whole range from ℓ=2 to ℓ=2500: the previous CMB satellite, WMAP, had a high-quality spectrum out to ℓ=750 or so; ground- and balloon-based experiments like SPT and ACT filled in some of the high-ℓ regime.
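For a full-sky map expanded in spherical harmonics, the power spectrum at multipole ℓ is the average of |a_ℓm|² over the 2ℓ+1 values of m, and a multipole corresponds roughly to an angle θ ≈ 180°/ℓ. The sketch below simulates harmonic coefficients for a toy spectrum (C_ℓ = 1, an assumption) and recovers it; it ignores the masks, noise and beams a real analysis must handle, and the function names are hypothetical.

```python
import numpy as np


def ell_to_degrees(ell):
    """Rough angular scale of multipole ell: theta ~ 180 degrees / ell."""
    return 180.0 / ell


def simulate_alm(ell, c_ell, rng):
    """Draw a_{ell m} for m = 0..ell of a real, isotropic Gaussian sky with power c_ell.

    For a real sky a_{ell,-m} = (-1)^m conj(a_{ell,m}), so only m >= 0 are stored.
    """
    a0 = rng.normal(0.0, np.sqrt(c_ell))          # m = 0 coefficient is real
    half = np.sqrt(c_ell / 2.0)                   # real and imaginary parts share the variance
    am = rng.normal(0.0, half, ell) + 1j * rng.normal(0.0, half, ell)
    return a0, am


def estimate_cl(a0, am, ell):
    """Average |a_{ell m}|^2 over all 2*ell + 1 values of m (negative m mirror positive m)."""
    return (a0**2 + 2.0 * np.sum(np.abs(am) ** 2)) / (2 * ell + 1)


def cosmic_variance(ell):
    """Fractional scatter of the estimate for a full-sky map: sqrt(2 / (2*ell + 1))."""
    return np.sqrt(2.0 / (2 * ell + 1))


rng = np.random.default_rng(2)
for ell in (2, 30, 2500):
    a0, am = simulate_alm(ell, 1.0, rng)
    print(ell, ell_to_degrees(ell), estimate_cl(a0, am, ell), cosmic_variance(ell))
```

The estimator's irreducible scatter, sqrt(2/(2ℓ+1)), is the "cosmic variance" that limits any single-sky measurement at low ℓ.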

It’s worth marvelling at this for a moment, a triumph of modern cosmological theory and observation: our theoretical models fit our data from scales of 180° down to 0.1°, each of those bumps and wiggles a further sign of how well we understand the contents, history and evolution of the Universe. Our high-quality data has refined our knowledge of the cosmological parameters that describe the universe, decreasing the error bars by a factor of several on the six parameters that describe the simplest ΛCDM universe. Moreover, and maybe remarkably, the data don’t seem to require any additional parameters beyond those six: for example, despite previous evidence to the contrary, the Universe doesn’t need any additional neutrinos.

The quantity best measured by Planck is related to the typical size of spots in the CMB map; it’s about a degree, with an error of less than one part in 1,000. This quantity has changed a bit (by about the width of the error bar) since the previous WMAP results. This, in turn, causes us to revise our estimates of quantities like the expansion rate of the Universe (the Hubble constant), which has gone down, in fact by enough that it’s interestingly different from its best measurements using local (non-CMB) data, from more or less direct observations of galaxies moving away from us. Both methods have disadvantages: for the CMB, it’s a very indirect measurement, requiring us to impose a model upon the directly measured spot size (known more technically as the “acoustic scale” since it comes from sound waves in the early Universe). For observations of local galaxies, it requires building up the famous cosmic distance ladder, calibrating our understanding of the distances to farther and farther objects, few of which we truly understand from first principles. So perhaps this discrepancy is due to messy and difficult astrophysics, or perhaps to interesting cosmological evolution.
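A common rough way to say how "interestingly different" two measurements are is to divide their difference by the combined error. This sketch assumes independent Gaussian errors, and the numbers in it are hypothetical placeholders, not values from the Planck papers:

```python
import math


def tension_sigma(x1, err1, x2, err2):
    """Separation of two independent Gaussian measurements, in units of their combined error."""
    return abs(x1 - x2) / math.hypot(err1, err2)


# Hypothetical numbers for illustration only (NOT the published values):
h0_cmb, err_cmb = 67.0, 1.0       # km/s/Mpc, indirect CMB-based estimate
h0_local, err_local = 73.0, 2.5   # km/s/Mpc, distance-ladder estimate
print(f"{tension_sigma(h0_cmb, err_cmb, h0_local, err_local):.1f} sigma")
```

Correlated errors or non-Gaussian tails (both plausible for the distance ladder) would change this number, which is one reason such tensions are hard to interpret.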

This change in the expansion rate is also indirectly responsible for the results that have made the most headlines: it changes our best estimate of the age of the Universe (slower expansion means an older Universe) and of the relative amounts of its constituents (since the expansion rate is related to the geometry of the Universe, which, because of Einstein’s General Relativity, tells us the amount of matter).
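For a spatially flat universe containing only matter and a cosmological constant (radiation neglected), the age has a closed form, t₀ = (2 / (3H₀√Ω_Λ)) · arsinh(√(Ω_Λ/Ω_m)). The sketch below uses it to show the direction of the effect; holding Ω_m fixed while changing H₀ is a simplification (in the real analysis the matter density shifts too), and the parameter values are illustrative.

```python
import math

# 1/H0 in Gyr is about 977.8 / (H0 in km/s/Mpc)
HUBBLE_TIME_GYR_AT_H0_1 = 977.8


def age_flat_lcdm(h0, omega_m):
    """Age in Gyr of a flat universe with matter plus a cosmological constant (no radiation).

    Uses t0 = (2 / (3 H0 sqrt(Omega_L))) * asinh(sqrt(Omega_L / Omega_m)).
    """
    omega_l = 1.0 - omega_m
    hubble_time = HUBBLE_TIME_GYR_AT_H0_1 / h0
    return hubble_time * 2.0 / (3.0 * math.sqrt(omega_l)) * math.asinh(math.sqrt(omega_l / omega_m))


# Illustrative values: a lower H0 at fixed Omega_m gives an older Universe.
for h0 in (73.0, 67.0):
    print(h0, round(age_flat_lcdm(h0, 0.3), 2))
```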

But the cosmological parameters measured in this way are just Planck’s headlines: there is plenty more science. We’ve gone beyond the power spectrum above to put limits upon so-called non-Gaussianities, which are signatures of the detailed way in which the seeds of large-scale structure in the Universe were initially laid down. We’ve observed clusters of galaxies which give us yet more insight into cosmology (and which seem to show an intriguing tension with some of the cosmological parameters). We’ve measured the deflection of light by gravitational lensing. And in work that I helped lead, we’ve used the CMB maps to put limits on some of the ways in which our simplest models of the Universe could be wrong, possibly having an interesting topology or rotation on the largest scales.
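One simple model of non-Gaussian seeds is the "local" form, in which a Gaussian field φ picks up a small quadratic correction, Φ = φ + f_NL(φ² − ⟨φ²⟩). The toy below, in made-up units where φ has unit variance and with a hypothetical f_NL, shows the correction producing a one-point skewness of about 6 f_NL. It is only an illustration of the idea; real CMB analyses estimate f_NL from the three-point function (bispectrum) of the maps.

```python
import numpy as np


def skewness(x):
    """Sample skewness: third central moment over the cube of the standard deviation."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


rng = np.random.default_rng(4)
phi = rng.normal(0.0, 1.0, 1_000_000)               # Gaussian "primordial" field (toy units)
f_nl = 0.05                                         # hypothetical toy coupling
phi_nl = phi + f_nl * (phi**2 - np.mean(phi**2))    # local-type quadratic correction

# skewness is close to zero for phi and close to 6 * f_nl for phi_nl
print(skewness(phi), skewness(phi_nl))
```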

But because we’ve scrutinised our data so carefully, we have found some peculiarities which don’t quite fit the models. From the days of COBE and WMAP, there has been evidence that the largest angular scales in the map, a few degrees and larger, have some “anomalies” — some of the patterns show strange alignments, some show unexpected variation between two different hemispheres of the sky, and there are some areas of the sky that are larger and colder than is expected to occur in our theories. Individually, any of these might be a statistical fluke (and collectively they may still be) but perhaps they are giving us evidence of something exciting going on in the early Universe. Or perhaps, to use a bad analogy, the CMB map is like the Zapruder film: if you scrutinise anything carefully enough, you’ll find things that look like a conspiracy but turn out to have an innocent explanation.
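Part of the difficulty in judging these anomalies is that many different statistics have been examined. If each of n independent tests has probability p of producing a fluke, the chance that at least one does is 1 − (1 − p)ⁿ. A minimal sketch with hypothetical values; real anomaly statistics are correlated, so this is only a rough illustration of the "look elsewhere" effect:

```python
def chance_of_any_fluke(p_single, n_tests):
    """Probability that at least one of n independent tests falls below p_single by chance."""
    return 1.0 - (1.0 - p_single) ** n_tests


# Hypothetical: each anomaly is a "1 in 100" event, but many statistics get examined.
for n in (1, 10, 100):
    print(n, round(chance_of_any_fluke(0.01, n), 3))
```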

I’ve mentioned eight different Planck papers so far, but in fact we’ve released 28 (and there will be a few more to come over the coming months, and many in the future). There’s an overall introduction to the Planck Mission, and papers on the data processing, observations of relatively nearby galaxies, and plenty more cosmology. The papers have been submitted to the journal A&A, they’re available on the arXiv, and you can find a list of them at the ESA site.

Even more important for my cosmology colleagues, we’ve released the Planck data, as well, along with the code and other information needed to understand it: you can get it from the Planck Legacy Archive. I’m sure we’ve only just begun to get exciting and fun science out of the data from Planck. And this is only the beginning of Planck’s data: just the first 15 months of observations, and just the intensity of the CMB; in the coming years we’ll be analysing (and releasing) more than a year of additional data, and starting to dig into Planck’s observations of the polarized sky.

Quantum debrief

A week ago, I finished my first time teaching our second-year course in quantum mechanics. After a bit of a taster in the first year, the class concentrates on the famous Schrödinger equation, which describes the properties of a particle under the influence of an external force. The simplest version of the equation is just $i\hbar\,\frac{\partial\psi}{\partial t} = \hat{H}\psi$. This relates the so-called wave function, ψ, to what we know about the external forces governing its motion, encoded in the Hamiltonian operator, Ĥ. The wave function gives the probability (technically, the probability amplitude) for getting a particular result for any measurement: its position, its velocity, its energy, etc. (See also this excellent public work by our department’s artist-in-residence.)
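For stationary states the equation reduces to the eigenvalue problem Ĥψ = Eψ. As a sketch (a standard textbook case, not necessarily one from the course notes), the lowest energies of a particle in a one-dimensional box can be found by discretising Ĥ on a grid and compared with the exact E_n = n²π²ħ²/(2mL²); the units ħ = m = L = 1 are an assumption for convenience:

```python
import numpy as np


def box_energies(n_grid=1000, n_levels=3):
    """Lowest energies of a particle in a 1D box from a finite-difference Hamiltonian.

    Units: hbar = m = 1 and box width L = 1, with psi = 0 at both walls.
    H = -(1/2) d^2/dx^2 is discretised with the three-point second difference.
    """
    dx = 1.0 / (n_grid + 1)                    # spacing of the interior grid points
    diag = np.full(n_grid, 1.0 / dx**2)        # -(1/2) * (-2 / dx^2)
    off = np.full(n_grid - 1, -0.5 / dx**2)    # -(1/2) * (1 / dx^2)
    h = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(h)[:n_levels]    # eigenvalues come back in ascending order


numeric = box_energies()
exact = np.array([1, 4, 9]) * np.pi**2 / 2     # E_n = n^2 pi^2 / 2 in these units
print(numeric, exact)
```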

Over the course of the term, the class builds up the machinery to predict the properties of the hydrogen atom, which is the canonical real-world system for which we need quantum mechanics to make predictions. This is certainly a sensible endpoint for the 30 lectures.

But it did somehow seem like a very old-fashioned way to teach the course. Even back in the 1980s when I first took a university quantum mechanics class, we learned things in a way more closely related to the way quantum mechanics is used by practicing physicists: the mathematical details of Hilbert spaces, path integrals, and Dirac Notation.

Today, an up-to-date quantum course would likely start from the perspective of quantum information, distilling quantum mechanics down to its simplest constituents: qubits, systems with just two possible states (instead of the infinite possibilities usually described by the wave function). The interactions become less important, superseded by the information carried by those states.
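A qubit's state is a normalised pair of complex amplitudes, and the Born rule turns amplitudes into measurement probabilities. The sketch below, using a Hadamard gate as one standard choice, prepares an equal superposition, reads off the probabilities, and simulates measurements, replacing the state with the observed basis state as the Copenhagen rule prescribes:

```python
import numpy as np

ket0 = np.array([1.0, 0.0], dtype=complex)            # basis state |0>
hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # a standard single-qubit unitary

psi = hadamard @ ket0                 # equal superposition (|0> + |1>) / sqrt(2)
probs = np.abs(psi) ** 2              # Born rule: probability = |amplitude|^2

rng = np.random.default_rng(5)
outcome = rng.choice(2, p=probs)                  # one measurement in the {|0>, |1>} basis
post_state = np.eye(2, dtype=complex)[outcome]    # state after measurement ("collapse")

samples = rng.choice(2, size=10_000, p=probs)     # many repeated preparations and measurements
print(probs, samples.mean())
```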

Really, it should be thought of as a full year-long course, and indeed much of the good stuff comes in the second term when the students take “Applications of Quantum Mechanics” in which they study those atoms in greater depth, learn about fermions and bosons and ultimately understand the structure of the periodic table of elements. Later on, they can take courses in the mathematical foundations of quantum mechanics, and, yes, on quantum information, quantum field theory and on the application of quantum physics to much bigger objects in “solid-state physics”.

Despite these structural questions, I was pretty pleased with the course overall: all two-hundred-plus students take it at the beginning of their second year, and it comprises thirty lectures, ten ungraded problem sheets and seven in-class problems called “classworks”. Still to come: a short test right after New Year’s and the final exam in June. Because it was my first time giving these lectures, and because it’s such an integral part of our teaching, I stuck to the same notes and problems as my recent predecessors (so many, many thanks to my colleagues Paul Dauncey and Danny Segal).

Once the students got over my funny foreign accent, bad board handwriting, and worse jokes, I think I was able to get across the mathematics, the physical principles and, eventually, the underlying weirdness of quantum physics. I kept to the standard Copenhagen Interpretation of quantum physics, in which we think of the aforementioned wavefunction as a real, physical thing, which evolves under that Schrödinger equation — except when we decide to make a measurement, at which point it undergoes what we call collapse, randomly and seemingly against causality: this was Einstein’s “spooky action at a distance” which seemed to indicate nature playing dice with our Universe, in contrast to the purely deterministic physics of Newton and Einstein’s own relativity. No one is satisfied with Copenhagen, although a more coherent replacement has yet to be found (I won’t enumerate the possibilities here, except to say that I find the proliferating multiverse of Everett’s Many-Worlds interpretation ontologically extravagant, and Chris Fuchs’s Quantum Bayesianism compelling but incomplete).

I am looking forward to getting this year’s SOLE results to find out for sure, but I think the students learned something, or at least enjoyed trying to, although the applause at the end of each lecture seemed somewhat tinged with British irony.


Among the many other things I haven’t had time to blog about, this term we opened the new Imperial Centre for Inference and Cosmology, the culmination of several years of expansion in the Imperial Astrophysics group. In mid-March we had our in-house grand opening, with a ribbon-cutting by the group’s most famous alumnus.

Statistics and astronomy have a long history together, largely growing from the desire to predict the locations of planets and other heavenly bodies based on inexact measurements. In relatively modern times, that goes back at least to Legendre and Gauss who more or less independently came up with the least-squares method of combining observations, which can be thought of as based on the latter’s eponymous Gaussian distribution.
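Least squares chooses the parameters that minimise the sum of squared residuals; for independent Gaussian errors of equal size this is also the maximum-likelihood answer, which is the connection to Gauss's distribution. A minimal sketch fitting a straight line to hypothetical, simulated observations:

```python
import numpy as np

# Hypothetical observations y_i = a + b * t_i + Gaussian noise
rng = np.random.default_rng(6)
t = np.linspace(0.0, 10.0, 40)
y = 1.5 + 0.8 * t + rng.normal(0.0, 0.5, t.size)

design = np.column_stack([np.ones_like(t), t])    # columns: intercept, slope
coef, residual_ss, rank, _ = np.linalg.lstsq(design, y, rcond=None)

# The same estimate from the normal equations (X^T X) beta = X^T y
coef_normal = np.linalg.solve(design.T @ design, design.T @ y)
print(coef, coef_normal)
```

`np.linalg.lstsq` solves the problem via a singular value decomposition, which is numerically safer than forming XᵀX explicitly, but both give the same estimate here.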

Our group had already had a much shorter but still significant history in what has come to be called “astrostatistics”, having been involved with large astronomical surveys such as UKIDSS and IPHAS and the many made possible by the infrared satellite telescope Herschel (and its predecessors ISO, IRAS and Spitzer). Along with my own work on the CMB and other applications of statistics to cosmology, the other “founding members” of ICIC include: my colleague Roberto Trotta, who has made important forays into the rigorous application of principled Bayesian statistics to problems in cosmology and particle physics; Jonathan Pritchard, who studies the distribution of matter in the evolving Universe and what that can teach us about its constituents and that evolution; and Daniel Mortlock, who has written about some of his work looking for rare and unusual objects elsewhere on this blog. We are lucky to have the initial membership of the group supplemented by Alan Heavens, who will be joining us over the summer and has a long history of working to understand the distribution of matter in the Universe throughout its history. This group will be joined by several members of the Statistics section of the Mathematics Department, in particular David van Dyk, David Hand and Axel Gandy.

One of the fun parts of starting up the new centre has been the opportunity to design our new suite of glass-walled offices. Once we made sure that there would be room for a couple of sofas and a coffee machine for the Astrophysics group to share, we needed something to allow a little privacy. For the main corridor, we settled on this:
The left side is from the Hubble Ultra-Deep field (in negative), a picture about 3 arc minutes on a side (about the size of a dime or 5p coin held at arm’s length), the deepest — most distant — optical image of the Universe yet taken. The right side is our Milky Way galaxy as reconstructed by the 2MASS survey.

The final wall is a bit different:
The middle panels show parts of papers by each of those founding members of the group, flanked on the left and right by the posthumously published paper by the Rev. Thomas Bayes, who gave his name to the field of Bayesian Probability.

Of course, there has been some controversy about how we should actually refer to the place. Reading out the letters gives the amusing “I see, I see”, and IC2 (“I-C-squared”) has a nice feel and a bit of built-in mathematics, although it does sound a bit like the outcome of a late-90s corporate branding exercise (and the pedants in the group noted that technically it would then be the incorrect I×C×C unless we cluttered it with parentheses).

We’re hoping that the group will keep growing, and we look forward to applying our tools and ideas to more and more astronomical data over the coming years. One of the most important ways to do that, of course, will be through collaboration: if you’re an astronomer with lots of data, or a statistician with lots of ideas, or, like many of us, somewhere in between, please get in touch and come for a visit.

Unfortunately we don’t yet have a webpage for the Centre.

I think I'm a Bayesian. Am I wrong?

Continuing my recent, seemingly interminable, series of too-technical posts on probability theory… To understand this one you’ll need to remember Bayes’ Theorem, and the resulting need for a Bayesian statistician to come up with an appropriate prior distribution to describe her state of knowledge in the absence of the experimental data she is considering, updated to the posterior distribution after considering that data. I should perhaps follow the lead of blogging-hero Paul Krugman and explicitly label posts like this as “wonkish”.
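As a reminder of the mechanics, here is a sketch of Bayes' theorem for the coin-flip case, assuming a Beta prior on the probability of heads (the uniform Beta(1, 1) below is one assumption among many). With a binomial likelihood the posterior is again a Beta distribution, so updating just adds the counts; using one batch's posterior as the next batch's prior gives the same result as using all the data at once:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) distribution for the probability of heads."""
    alpha: float
    beta: float

    def update(self, heads: int, tails: int) -> "BetaPrior":
        # Bayes' theorem with a binomial likelihood: the posterior is again a Beta distribution
        return BetaPrior(self.alpha + heads, self.beta + tails)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


prior = BetaPrior(1.0, 1.0)                   # uniform prior: an assumption, not a necessity
posterior = prior.update(heads=7, tails=3)    # BetaPrior(alpha=8.0, beta=4.0), mean 2/3
sequential = prior.update(4, 1).update(3, 2)  # posterior from batch 1 used as prior for batch 2
print(posterior, posterior.mean(), sequential == posterior)
```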

(If instead you’d prefer something a little more tutorial, I can recommend the excellent recent post from my colleague Ted Bunn, discussing hypothesis testing, stopping rules, and cheating at coin flips.)

Deborah Mayo has begun her own series of posts discussing some of the articles in a recent special volume of the excellently-named journal, “Rationality, Markets and Morals” on the topic Statistical Science and Philosophy of Science.

She has started with a discussion of Stephen Senn’s “You May Believe You are a Bayesian But You Are Probably Wrong”: she excerpts the article here and then gives her own deconstruction in the sequel.

Senn’s article begins with a survey of the different philosophical schools of statistics: not just frequentist versus Bayesian (for which he also uses the somewhat old-fashioned names of “direct” versus “inverse” probability), but also how the practitioners choose to apply the probabilities that they calculate: either directly, in terms of inferences about the world, or by using those probabilities to make decisions in order to give a further meaning to the probability.

Having cleaved the statistical world in four, Senn makes a clever rhetorical move. In a wonderfully multilevelled backhanded compliment, he writes

If any one of the four systems had a claim to our attention then I find de Finetti’s subjective Bayes theory extremely beautiful and seductive (even though I must confess to also having some perhaps irrational dislike of it). The only problem with it is that it seems impossible to apply.

He discusses why it is essentially impossible to perform completely coherent ground-up analyses within the Bayesian formalism:

This difficulty is usually described as being the difficulty of assigning subjective probabilities but, in fact, it is not just difficult because it is subjective: it is difficult because it is very hard to be sufficiently imaginative and because life is short.

And, later on:

The … test is that whereas the arrival of new data will, of course, require you to update your prior distribution to being a posterior distribution, no conceivable possible constellation of results can cause you to wish to change your prior distribution. If it does, you had the wrong prior distribution and this prior distribution would therefore have been wrong even for cases that did not leave you wishing to change it. This means, for example, that model checking is not allowed.

I think that these criticisms mis-state the practice of Bayesian statistics, at least by the scientists I know (mostly cosmologists and astronomers). We do not treat statistics as a grand system of inference (or decision) starting from a single, primitive state of knowledge which we use to reason all the way through to new theoretical paradigms. The caricature of Bayesianism starts with a wide open space of possible theories, and we add data, narrowing our beliefs to accord with our data, using the resulting posterior as the prior for the next set of data to come across our desk.

Rather, most of us take a vaguely Jaynesian view, after the cranky Edwin Jaynes, as espoused in his forty years of papers and his polemical book Probability Theory: The Logic of Science — all probabilities are conditional upon information (although he would likely have been much more hard-core). Contra Senn’s suggestions, the individual doesn’t need to continually adjust her subjective probabilities until she achieves an overall coherence in her views. She just needs to present (or summarise in a talk or paper) a coherent set of probabilities based on given background information (perhaps even more than one set). As long as she carefully states the background information (and the resulting prior), the posterior is a completely coherent inference from it.

In this view, probability doesn’t tell us how to do science, just how to analyse data in the presence of known hypotheses. We are under no obligation to pursue a grand plan, listing all possible hypotheses from the outset. Indeed we are free to do ‘exploratory data analysis’ using (even) not-at-all-Bayesian techniques to help suggest new hypotheses. This is a point of view espoused most forcefully by Andrew Gelman (author of another paper in the special volume of RMM).

Of course this does not solve all formal or philosophical problems with the Bayesian paradigm. In particular, as I’ve discussed a few times recently, it doesn’t solve what seems to me the most knotty problem of hypothesis testing in the presence of what one would like to be ‘wide open’ prior information.

The Controversy about Hypothesis Testing

I spent a quick couple of days last week at The Controversy about Hypothesis Testing meeting in Madrid.

The topic of the meeting was indeed the question of “hypothesis testing”, which I addressed in a post a few months ago: how do you choose between conflicting interpretations of data? The canonical version of this question was the test of Einstein’s theory of relativity in the early 20th Century — did the observations of the advance of the perihelion of Mercury (and eventually of the gravitational lensing of starlight by the sun) match the predictions of Einstein’s theory better than Newton’s? And of course there are cases in which even more than a scientific theory is riding on the outcome: is a given treatment effective? I won’t rehash here my opinions on the subject, except to say that I think there really is a controversy: the purported Bayesian solution runs into problems in realistic cases of hypotheses about which we would like to claim some sort of “ignorance” (always a dangerous word in Bayesian circles), while the orthodox frequentist way of looking at the problem is certainly ad hoc and possibly incoherent, but nonetheless seems to work in many cases.
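The "ignorance" problem can be seen in a standard toy example (often called the Jeffreys–Lindley paradox; this is an illustration, not a summary of the meeting's talks). Compare a point hypothesis, μ = 0, with an alternative in which μ has a wide Gaussian prior of width τ, given one measurement with Gaussian error σ. The Bayes factor depends strongly on τ, which is exactly the quantity that "ignorance" fails to pin down:

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def bayes_factor_point_vs_wide(xbar, sigma, tau):
    """B01 for H0: mu = 0 versus H1: mu ~ N(0, tau^2), given a measurement xbar ~ N(mu, sigma^2).

    Under H1 the marginal distribution of xbar is N(0, sigma^2 + tau^2).
    """
    return normal_pdf(xbar, sigma**2) / normal_pdf(xbar, sigma**2 + tau**2)


# A "2 sigma" measurement: the verdict flips as the alternative's prior widens.
for tau in (1.0, 10.0, 100.0):
    print(tau, round(bayes_factor_point_vs_wide(2.0, 1.0, tau), 3))
```

For a fixed 2σ measurement, a narrow alternative is mildly preferred, while a sufficiently wide one makes the point hypothesis win by a large margin.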

Sometimes, the technical worries don’t apply, and the Bayesian formalism provides the ideal solution. For example, my colleague Daniel Mortlock has applied the model-comparison formalism to deciding whether objects in his UKIDSS survey data are more likely to be distant quasars or nearby and less interesting objects. (He discussed his method here a few months ago.)
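In model comparison the data enter through the Bayes factor, p(data | A)/p(data | B), which multiplies the prior odds; for rare objects the prior odds can matter as much as the data. A minimal sketch with hypothetical numbers; it is not Mortlock's actual calculation:

```python
def posterior_probability(bayes_factor, prior_odds):
    """P(A | data) from the evidence ratio p(data|A)/p(data|B) and prior odds P(A)/P(B)."""
    post_odds = bayes_factor * prior_odds
    return post_odds / (1.0 + post_odds)


# Hypothetical numbers: the data favour "distant quasar" by 1000:1,
# but such quasars are assumed to be 1000 times rarer than nearby contaminants.
print(posterior_probability(1000.0, 1e-3))
```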

In between thoughts about hypothesis testing, I experienced the cultural differences between the statistics community and us astrophysicists and cosmologists, of whom I was the only example at the meeting: a typical statistics talk just presents pages of text and equations with the occasional poorly-labeled graph thrown in. My talks tend to be a bit heavier on the presentation aspects, perhaps inevitably so given the sometimes beautiful pictures that package our data.

On the other hand, it was clear that the statisticians take their Q&A sessions very seriously, prodded in this case by the word “controversy” in the conference’s title. In his opening keynote, Jose Bernardo, up from Valencia for the meeting, discussed his work as a so-called “Objective Bayesian”, prompting a question from the mathematically-oriented philosopher Deborah Mayo. Mayo is an arch-frequentist (and blogger) who prefers to describe her particular version as “Error Statistics”, concerned (if I understand correctly after our wine-fuelled discussion at the conference dinner) with the use of probability and statistics to criticise the errors we make in our methods, in contrast with the Bayesian view of probability as a description of our possible knowledge of the world. These two points of view are sufficiently far apart that Bernardo countered one of the questions with the almost-rude but definitely entertaining riposte “You are bloody inconsistent — you are not mathematicians.” That was probably the most explicit almost-personal attack of the meeting, but there were similar exchanges. Not mine, though: my talk was a little more didactic than most, as I knew that I had to justify the science as well as the statistics that lurks behind any analysis of data.

So I spent much of my talk discussing the basics of modern cosmology, and applying my preferred Bayesian techniques in at least one big-picture case where the method works: choosing amongst the simple set of models that seem to describe the Universe, at least those that obey General Relativity and the Cosmological Principle, which holds that we do not occupy a privileged position, so that, given our observations, the Universe is homogeneous and isotropic on the largest scales. Given those constraints, all we need to specify (or measure) are the amounts of the various constituents in the universe: the total amount of matter and of dark energy. The sum of these, in turn, determines the overall geometry of the universe. In the appropriate units, if the total is one, the universe is flat; if it’s larger, the universe is closed, shaped like a three-dimensional sphere; if smaller, it’s a three-dimensional hyperboloid or saddle. What we find when we make the measurement is that the amount of matter is about 0.282±0.02, and of dark energy about 0.723±0.02. Of course, these add up to just greater than one; model selection (or hypothesis testing in other forms) allows us to say that the data nonetheless give us reason to prefer the flat Universe despite the small discrepancy.
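A rough numerical sketch of that comparison, under assumptions that are mine rather than those of the analysis behind the quoted numbers: the two ±0.02 errors are treated as independent Gaussians (the real constraints on matter and dark energy are correlated), so the total Ω_tot = 1.005 carries an error of about 0.028; the flat model fixes Ω_tot = 1, and a hypothetical curved model gets a uniform prior on Ω_tot of some chosen width centred on 1. The Bayes factor is the ratio of the two models' evidences, each model's likelihood averaged over its prior.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))

# Densities quoted in the text; treating their errors as independent is an assumption.
OMEGA_M, OMEGA_L, ERR = 0.282, 0.723, 0.02
OMEGA_TOT = OMEGA_M + OMEGA_L        # 1.005
SIGMA_TOT = math.hypot(ERR, ERR)     # about 0.028

def bayes_factor_flat_vs_curved(width):
    """Evidence ratio: flat (Omega_tot = 1) vs. curved (uniform prior of this width around 1)."""
    # Flat model has no free parameter: its evidence is the likelihood at Omega_tot = 1.
    evidence_flat = normal_pdf(1.0, OMEGA_TOT, SIGMA_TOT)
    # Curved model: likelihood averaged over the uniform prior [1 - width/2, 1 + width/2].
    lo, hi = 1.0 - width / 2, 1.0 + width / 2
    mass = normal_cdf(hi, OMEGA_TOT, SIGMA_TOT) - normal_cdf(lo, OMEGA_TOT, SIGMA_TOT)
    evidence_curved = mass / width
    return evidence_flat / evidence_curved

for width in (0.2, 1.0, 2.0):
    print(f"prior width {width}: Bayes factor (flat/curved) = {bayes_factor_flat_vs_curved(width):.1f}")
```

With these assumptions the flat model is preferred for every width tried, but by an amount that grows in proportion to the width, a choice the data cannot make for us.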

After the meeting, I had a couple of hours free, so I went across Madrid to the Reina Sofia, to stand amongst the Picassos and Serras. And I was lucky enough to have my hotel room above a different museum: the Museo del Jamon.

One chance in 3.5 million

Yes, more on statistics.

In a recent NY Times article, science reporter Dennis Overbye discusses recent talks from Fermilab and CERN scientists which may hint at the discovery of the much-anticipated Higgs Boson. The executive summary is: it hasn’t been found yet.

But in the course of the article, Overbye points out that

To qualify as a discovery, some bump in the data has to have less than one chance in 3.5 million of being an unlucky fluctuation in the background noise.

That particular number is the so-called “five sigma” level from the Gaussian distribution. Normally, I would use this as an opportunity to discuss exactly what probability means in this context (is it a Bayesian “degree of belief” or a frequentist “p-value”?), but for this discussion that distinction doesn’t matter: the important point is that one in 3.5 million is a very small chance indeed. [For the aficionados, the number is the probability that x > μ + 5σ when x is described by a Gaussian distribution of mean μ and variance σ².]
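The footnote's arithmetic is easy to check. A minimal sketch using only Python's standard library: for a Gaussian, the one-sided tail beyond k standard deviations is ½ erfc(k/√2), which recovers the one-in-3.5-million figure at five sigma and, for comparison, the three-sigma figure discussed below.

```python
import math

def one_sided_tail(k):
    """P(x > mu + k*sigma) for x Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (3, 5):
    p = one_sided_tail(k)
    print(f"{k} sigma: p = {p:.3g}, about 1 chance in {1 / p:,.0f}")
```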

Why are we physicists so conservative? Are we just being very careful not to get it wrong, especially when making such a potentially important — Nobel-worthy! — discovery? Even for less ground-breaking results, the limit is often taken to be three sigma, which is about one chance in 750. This is a lot less conservative, but still pretty improbable: I’d happily bet a reasonable amount on a sporting event if I really thought I had 749 chances out of 750 of winning. However, there’s a maxim among scientists: half of all three sigma results are wrong. This may be an exaggeration, but certainly nobody believes “one in 750” is a good description of the probability (nor one in 3.5 million for five sigma results). How could this be? Fifty percent — one in two — is several hundred times more likely than 1/750.

There are several explanations, and any or all of them may be true for a particular result. First, people often underestimate their errors. More specifically, scientists often only include errors for which they can construct a distribution function — so-called statistical or random errors. But the systematic errors, which are, roughly speaking, every other way that the experimental results could be wrong, are usually not accounted for, and of course any “unknown systematics” are ignored by definition, and usually not discovered until well after the fact.
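One crude way to see how much an unaccounted systematic matters, assuming (as a simplification) that it is Gaussian and independent of the statistical error so that the two add in quadrature: a hypothetical bump at three times its statistical error drops to about 2.1 sigma once a systematic error of equal size is included, and the chance of a fluke rises from roughly 1 in 740 to roughly 1 in 60. Unknown systematics, of course, cannot be treated this way at all.

```python
import math

def one_sided_p(n_sigma):
    """Gaussian probability of an upward fluctuation beyond n_sigma standard deviations."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

def significance(signal, sigma_stat, sigma_sys=0.0):
    """Signal-to-noise with an (assumed Gaussian, independent) systematic added in quadrature."""
    return signal / math.hypot(sigma_stat, sigma_sys)

# Hypothetical bump: 3 units of signal with a statistical error of 1 unit.
for sigma_sys in (0.0, 0.5, 1.0):
    n = significance(3.0, 1.0, sigma_sys)
    print(f"sigma_sys = {sigma_sys}: {n:.2f} sigma, about 1 chance in {1 / one_sided_p(n):,.0f}")
```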

The controversy surrounding the purported measurements of the variation of the fine-structure constant that I discussed last week lies almost entirely in the different groups’ ability to incorporate a good model for the systematic errors in their very precise spectral measurements.

And then of course there are the even-less quantifiable biases that alter what results get published and how we interpret them. Chief among these may be publication or reporting bias: scientists and journals are more likely to publish, or even discuss, exciting new results than supposedly boring confirmations of the old paradigm. If there were a few hundred unpublished, unexciting null results for every published groundbreaking three-sigma result, we would expect many of the published results to be statistical flukes. Some of these may be related to the so-called “decline effect” that Jonah Lehrer wrote about in the New Yorker recently: new results seem to get less statistically significant over time as more measurements are made. Finally, as my recent interlocutor, Andrew Gelman, points out, “classical statistical methods that work reasonably well when studying moderate or large effects… fall apart in the presence of small effects.”
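The arithmetic behind “half of all three sigma results are wrong” can be sketched as a base-rate calculation. The inputs are purely hypothetical: suppose only 1 in 500 of the searches that could yield a three-sigma result is looking at a real effect, and that a real effect reaches three sigma half the time. Then flukes from the many null searches make up more than half of the reported detections, even though each one individually had only about a 1 in 740 chance.

```python
import math

# Chance that a pure-noise search fluctuates past 3 sigma (one-sided), about 1/741.
ALPHA_3SIGMA = 0.5 * math.erfc(3 / math.sqrt(2))

def fluke_fraction(frac_real, power, alpha=ALPHA_3SIGMA):
    """Fraction of reported 3-sigma detections that are statistical flukes.

    frac_real: fraction of searches targeting a real effect (hypothetical input).
    power: chance that a real effect is detected at 3 sigma (hypothetical input).
    """
    flukes = (1 - frac_real) * alpha
    real = frac_real * power
    return flukes / (flukes + real)

print(f"{fluke_fraction(1 / 500, 0.5):.0%} of 3-sigma detections are flukes")
```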

(In fact, Overbye discussed the large number of “false detections” in astronomy and physics in another Times article almost exactly a year ago.)

Unfortunately all of this can make it very difficult to interpret — and trust — statistical statements in the scientific literature, although we in the supposedly hard sciences have it a little easier as we can often at least enumerate the possible problems even if we can’t always come up with a good statistical model to describe our ignorance in detail.

Kind of Bayesian

[Apologies — this is long, technical, and there are too few examples. I am putting it out for commentary more than anything else…]

In some recent articles and blog posts (including one in response to astronomer David Hogg), Columbia University statistician Andrew Gelman has outlined the philosophical position that he and some of his colleagues and co-authors hold. While starting from a resolutely Bayesian perspective on using statistical methods to measure the parameters of a model, he and they depart from the usual story when evaluating models and comparing them to one another. Rather than using the techniques of Bayesian model comparison, they prefer a set of techniques they describe as ‘model checking’. Let me apologize in advance if I misconstrue or caricature their views in any way in the following.

In the formalism of model comparison, the statistician or scientist needs to fully specify her model: what are the numbers needed to describe the model, how do the data depend upon them (the likelihood), as well as a reasonable guess for what those numbers might be in the absence of data (the prior). Given these ingredients, one can first combine them to form the posterior distribution to estimate the parameters but then go beyond this to actually determine the probability of the fully-specified model itself.
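In the usual notation (the symbols are my choice: θ for the parameters, D for the data, M for the model and I for background information), the first step is Bayes' theorem for the parameters, whose normalising constant, the evidence, then drives the comparison of two models M₁ and M₂:

```latex
\begin{aligned}
P(\theta \mid D, M, I) &= \frac{P(D \mid \theta, M, I)\, P(\theta \mid M, I)}{P(D \mid M, I)} \\
P(D \mid M, I) &= \int P(D \mid \theta, M, I)\, P(\theta \mid M, I)\, \mathrm{d}\theta \\
\frac{P(M_1 \mid D, I)}{P(M_2 \mid D, I)} &= \frac{P(D \mid M_1, I)}{P(D \mid M_2, I)} \times \frac{P(M_1 \mid I)}{P(M_2 \mid I)}
\end{aligned}
```

The ratio of evidences in the last line is the Bayes factor; it depends on the full prior P(θ | M, I), not just on the region where the likelihood is large.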

The first part of the method, estimating the parameters, is usually robust to the choice of a prior distribution for the parameters. In many cases, one can throw the possibilities wide open (an approximation to some sort of ‘complete ignorance’) and get a meaningful measurement of the parameters. In mathematical language, we take the limit of the posterior distribution as we make the prior distribution arbitrarily wide, and this limit often exists.

The problem, noticed by most statisticians and scientists who try to apply these methods is that the next step, comparing models, is almost always sensitive to the details of the choice of prior: as the prior distribution gets wider and wider, the probability for the model gets lower and lower without limit; a model with an infinitely wide prior has zero probability compared to one with a finite width.
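A minimal sketch of both limits, for a toy model of my own choosing: a single measurement d of a parameter θ with Gaussian noise σ, and a uniform prior on θ of width W centred on zero. The posterior is then a truncated Gaussian whose mean converges as W grows, while the evidence falls like 1/W; compared with a rival model that fixes θ = 0, the odds shift towards the fixed model without limit.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def posterior_mean_and_evidence(d, sigma, width):
    """One datum d ~ N(theta, sigma), uniform prior on theta over [-width/2, width/2].

    The posterior is N(d, sigma) truncated to the prior range; the evidence is
    the likelihood averaged over the prior.
    """
    a = (-width / 2 - d) / sigma
    b = (width / 2 - d) / sigma
    mass = Phi(b) - Phi(a)                       # likelihood mass inside the prior range
    mean = d + sigma * (phi(a) - phi(b)) / mass  # mean of the truncated normal
    evidence = mass / width
    return mean, evidence

d, sigma = 1.0, 0.5
evidence_fixed = phi(d / sigma) / sigma   # rival model: theta fixed at 0, no free parameter
for width in (10.0, 100.0, 1000.0):
    mean, evidence = posterior_mean_and_evidence(d, sigma, width)
    print(f"width {width:>6}: posterior mean {mean:.3f}, evidence {evidence:.2e}, "
          f"odds for theta = 0: {evidence_fixed / evidence:.1f}")
```

For a uniform prior that comfortably contains the likelihood, the evidence scales as 1/W, so moving an upper limit from 10⁵⁰ to 10¹⁰⁰ changes the evidence by a factor of about 10⁵⁰ while leaving the parameter estimate essentially unchanged.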

In some situations, where we do not wish to model some sort of ignorance, this is fine. But in others, even if we know it is unreasonable to accept an arbitrarily large value for some parameter, we really cannot reasonably choose between, say, an upper limit of 10¹⁰⁰ and 10⁵⁰, which may have vastly different consequences.

The other problem with model comparison is that, as the name says, it involves comparing models: it is impossible to merely reject a model tout court. But there are certainly cases when we would be wise to do so: the data have a noticeable, significant curve, but our model is a straight line. Or, more realistically (but also much more unusually in the history of science): we know about the advance of the perihelion of Mercury, but Einstein hasn’t yet come along to invent General Relativity; or Planck has written down the black body law but quantum mechanics hasn’t yet been formulated.

These observations lead Gelman and company to reject Bayesian model comparison entirely in favor of what they call ‘model checking’. Having made inferences about the parameters of a model, you next create simulated data from the posterior distribution and compare those simulations to the actual data. This latter step is done using some of the techniques of orthodox ‘frequentist’ methods: choosing a statistic, calculating p-values, and worrying about whether your observation is unusual because it lies in the tail of a distribution.
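Here is a minimal sketch of a posterior predictive check in that spirit, with assumptions of my own: a straight-line model with known Gaussian noise σ and flat priors on intercept and slope (so the posterior is Gaussian and can be sampled directly), the chi-square of the data about each posterior draw as the test statistic, and noise-free toy data. The p-value is the fraction of replicated data sets at least as discrepant as the real one.

```python
import math
import random

def fit_line(x, y):
    """Least-squares intercept and slope, plus (X^T X)^-1 for the design matrix [1, x]."""
    n, sx = len(x), sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    inv = ((sxx / det, -sx / det), (-sx / det, n / det))
    return intercept, slope, inv

def chi2(x, y, intercept, slope, sigma):
    """Test statistic: chi-square of the data about a given line."""
    return sum(((yi - intercept - slope * xi) / sigma) ** 2 for xi, yi in zip(x, y))

def posterior_predictive_pvalue(x, y, sigma, n_sims=2000, seed=0):
    """Fraction of replicated data sets at least as discrepant as the real one."""
    rng = random.Random(seed)
    a_hat, b_hat, inv = fit_line(x, y)
    # Flat priors + known sigma: posterior for (intercept, slope) is Gaussian with
    # covariance sigma^2 (X^T X)^-1, factored here by a hand-written 2x2 Cholesky.
    l00 = sigma * math.sqrt(inv[0][0])
    l10 = sigma ** 2 * inv[0][1] / l00
    l11 = math.sqrt(sigma ** 2 * inv[1][1] - l10 ** 2)
    exceed = 0
    for _ in range(n_sims):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = a_hat + l00 * z0                  # one posterior draw of the parameters
        b = b_hat + l10 * z0 + l11 * z1
        y_rep = [a + b * xi + rng.gauss(0, sigma) for xi in x]   # replicated data
        if chi2(x, y_rep, a, b, sigma) >= chi2(x, y, a, b, sigma):
            exceed += 1
    return exceed / n_sims

x = [i / 10 for i in range(-10, 11)]
straight = [1 + 2 * xi for xi in x]   # toy data a line can describe
curved = [xi ** 2 for xi in x]        # toy data with an obvious curve
print(posterior_predictive_pvalue(x, straight, sigma=0.05))
print(posterior_predictive_pvalue(x, curved, sigma=0.05))
```

For the straight-line toy data the p-value comes out near 1, while for the curved data no replicated data set is anywhere near as discrepant, so it is essentially zero: the line is rejected without specifying any alternative model.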

Having suggested these techniques, they go on to advocate a broader philosophical position on the use of probability in science: it is ‘hypothetico-deductive’, rather than ‘inductive’; Popperian rather than Kuhnian. (For another, even more critical, view of Kuhn’s philosophy of science, I recommend filmmaker Errol Morris’ excellent series of blog posts in the New York Times recounting his time as a graduate student in philosophy with Kuhn.)

At this point, I am sympathetic to their position, but worried about the details. A p-value is well-determined, but remains a kind of meaningless number: the probability of finding the value of your statistic as measured or worse. But you didn’t get a worse value, so it’s not clear why this number is meaningful. On the other hand, it is clearly an indication of something: if it is unlikely to have got a worse value then your data must, in some perhaps ill-determined sense, be themselves unlikely. Indeed I think it is worries like this that lead them very often to prefer purely graphical methods — the simulations ‘don’t look like’ the data.

The fact is, however, these methods work. They draw attention to data that do not fit the model and, with well-chosen statistics or graphs, lead the scientist to understand what might be wrong with the model. So perhaps we can get away without mathematically meaningful probabilities as long as we are “just” using them to guide our intuition rather than make precise statements about truth or falsehood.

They also make a rather strange leap: deciding amongst any discrete set of parameters falls into the category of model comparison, against their rules. I’m not sure this restriction is necessary: if the posterior distribution for the discrete parameters makes sense, I don’t see why we should reject the inferences made from it.
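A toy example of why the restriction seems unnecessary (the coin and its three possible biases are hypothetical): the posterior over a discrete parameter is just likelihood times prior, normalised. The normalising sum is exactly the evidence that would appear if each value were instead called a separate “model”, so the same numbers can be read either way.

```python
from math import comb

def discrete_posterior(values, prior, heads, flips):
    """Posterior over a discrete set of coin biases after `heads` in `flips` tosses."""
    likelihood = [comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads) for p in values]
    unnormalised = [lk * pr for lk, pr in zip(likelihood, prior)]
    evidence = sum(unnormalised)   # the same normalising sum model comparison would use
    return [u / evidence for u in unnormalised]

biases = [0.25, 0.5, 0.75]
post = discrete_posterior(biases, [1 / 3] * 3, heads=7, flips=10)
for p, q in zip(biases, post):
    print(f"bias {p}: posterior {q:.3f}")
```

With 7 heads in 10 tosses and equal prior weights, most of the posterior probability goes to a bias of 0.75 and almost none to 0.25.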

In these articles they also discuss what it means for a model to be true or false, and what implications that has for the meaning of probability. As they argue, all models are in fact known to be false, certainly in the social sciences that most concern Gelman, and for the most part in the physical sciences as well, in the sense that they are not completely true in every detail. Newton was wrong, because Einstein was more right, and Einstein is most likely wrong because there is likely to be an even better theory of quantum gravity. Hence, they say, the subjective view of probability is wrong, since no scientist really believes in the truth of the model she is checking. I agree, but I think this is a caricature of the subjective view of probability: it misconstrues the meaning of ‘subjectivity’. If I had to use probabilities only to reflect what I truly believe, I wouldn’t be able to do science, since the only thing I am sure of about my belief system is that it is incoherent:

Do I contradict myself?
Very well then I contradict myself,
(I am large, I contain multitudes.)
Walt Whitman, Song of Myself

Subjective probability, at least the way it is actually used by practicing scientists, is a sort of “as-if” subjectivity — how would an agent reason if her beliefs were reflected in a certain set of probability distributions? This is why when I discuss probability I try to make the pedantic point that all probabilities are conditional, at least on some background prior information or context. So we shouldn’t really ever write a probability that statement “A” is true as P(A), but rather as P(A|I) for some background information, “I”. If I change the background information to “J”, it shouldn’t surprise me that P(A|I)≠P(A|J). The whole point of doing science is to reason from assumptions and data; it is perfectly plausible for an actual scientist to restrict the context to a choice between two alternatives that she knows to be false. This view of probability owes a lot to Ed Jaynes (as also explained by Kevin Van Horn and others) and would probably be held by most working scientists if you made them set out their views in a consistent way.

Still, these philosophical points do not take away from Gelman’s more practical ones, which to me seem distinct from those loftier questions and from each other: first, that the formalism of model comparison is often too sensitive to prior information; second, that we should be able to do some sort of alternative-free model checking in order to falsify a model even if we don’t have any well-motivated substitute. Indeed, I suspect that most scientists, even hardcore Bayesians, work this way even if they (we) don’t always admit it.

A couple of weeks ago, a few of my astrophysics colleagues here at Imperial found the most distant quasar yet discovered, the innocuous red spot in the centre of this image:
One of them, Daniel Mortlock, has offered to explain a bit more:

Surely there’s just no way that something which happened 13 billion years ago — and tens of billions of light years away — could ever be reported as “news”? And yet that’s just what happened last week when world-renowned media outlets like the BBC, Time and, er, Irish Weather Online reported the discovery of the highly luminous quasar ULAS J1120+0641 in the early Universe. (Here is a longer list of links to discussions of the quasar in the media: although at least this discovery was generally included under the science heading — the Hawai’i Herald Tribune somehow reported it as “local news”, which shows the sort of broad outlook not normally associated with the most insular of the United States.) The incongruity of the timescales involved became particularly clear to me when I, as one of the team of astronomers who made this discovery, fielded phone calls from journalists who, on the one hand, seemed quite at home with the notion of the light we’ve seen from this quasar having made its leisurely way to us for most of the history of the Universe, and then on the other hand were quite relaxed about a 6pm deadline to file a story on something they hadn’t even heard of a few hours earlier. The idea that this story might go from nothing to being in print in less than a day also made a striking contrast with the rather protracted process by which we made this discovery.

The story of the discovery of ULAS J1120+0641 starts with the United Kingdom InfraRed Telescope (UKIRT), and a meeting of British astronomers a decade ago to decide how best to use it. The consensus was to perform the UKIRT Infrared Deep Sky Survey (UKIDSS), the largest ever survey of the night sky at infrared wavelengths (i.e., between 1 micron and 3 microns), in part to provide a companion to the highly successful Sloan Digital Sky Survey (SDSS) that had recently been made at the optical wavelengths visible to the human eye. Of particular interest was the fact that the SDSS had discovered quasars — the bright cores of galaxies in which gas falling onto a super-massive black hole heats up so much it outshines all the stars in the host galaxy — so distant that they are seen as they were when the Universe was just a billion years old. Even though quasars are much rarer than ordinary galaxies, they are so much brighter that detailed measurements can be made of them, and so they are very effective probes of the early Universe. However there was a limit to how far back SDSS could search as no light emitted earlier than 900 million years after the Big Bang reaches us at optical wavelengths due to a combination of absorption by hydrogen atoms present at those early times and the expansion of the Universe stretching the unabsorbed light to infrared wavelengths. This is where UKIRT comes in — whereas distant sources like ULAS J1120+0641 are invisible to optical surveys, they can be detected using infrared surveys like UKIDSS. So, starting in 2005, UKIDSS got underway, with the eventual aim of looking at about 10% of the sky that had already been mapped at shorter wavelengths by SDSS. 
Given the number of slightly less distant quasars SDSS had found, we expected UKIDSS to include two or three record-breaking quasars; however it would also catalogue tens of millions of other astronomical objects (stars in our Galaxy, along with other galaxies), so actually finding the target quasars was not going to be easy.

Our basic search methodology was to identify any source that was clearly detected by UKIDSS but completely absent in the SDSS catalogues. In an ideal world this would have immediately given us our record-breaking quasars, but instead we still had a list of ten thousand candidates, all of which had the desired properties. Sadly it wasn’t a bumper crop of quasars — rather it was a result of observational noise, and most of these objects were cool stars which are faint enough at optical wavelengths that, in some cases, the imperfect measurement process meant they weren’t detected by SDSS, and hence entered our candidate list. (A comparable number of such stars would also be measured as brighter in SDSS than they actually are; however it is only the objects which are scattered faintward that caused trouble for us.) A second observation on an optical telescope would suffice to reject any of these candidates, but taking ten thousand such measurements is completely impractical. Instead, we used Bayesian statistics to extract as much information from the original SDSS and UKIDSS measurements as possible. By making models of the cool star and quasar populations, and knowing the precision of the SDSS and UKIDSS observations, we could work out the probability that any candidate was in fact a target quasar. Taking this approach turned out to be far more effective than we’d hoped — almost all the apparently quasar-like candidates had probabilities of less than 0.01 (i.e., less than a 1% chance of being a quasar) and so could be discarded from consideration without going near a telescope.
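The core of that calculation is Bayes' theorem for two populations. This is a simplified sketch, not the team's actual code, which modelled the full cool-star and quasar populations and the SDSS and UKIDSS measurement errors; here each candidate's likelihood under the two population models is simply given, and the expected numbers of quasars and stars are hypothetical. It shows why the rarity of quasars matters: a candidate whose fluxes are a hundred times more likely for a quasar than for a star can still have only a few percent chance of being one.

```python
def quasar_probability(n_quasar, like_quasar, n_star, like_star):
    """Posterior probability that a candidate is a quasar rather than a cool star.

    n_quasar, n_star: expected numbers of each population among candidates (prior weights).
    like_quasar, like_star: likelihood of the candidate's measured fluxes under each model.
    """
    w_quasar = n_quasar * like_quasar
    w_star = n_star * like_star
    return w_quasar / (w_quasar + w_star)

# Hypothetical numbers: a handful of quasars hidden among ten thousand stars.
N_QUASAR, N_STAR = 3, 10_000
candidates = {"A": (100.0, 1.0), "B": (1e6, 1.0), "C": (0.5, 1.0)}   # (like_quasar, like_star)
for name, (lq, ls) in candidates.items():
    p = quasar_probability(N_QUASAR, lq, N_STAR, ls)
    verdict = "follow up" if p >= 0.01 else "discard"
    print(f"{name}: P(quasar) = {p:.3g} -> {verdict}")
```

Candidates below the 0.01 threshold mentioned above are discarded; only the rest would need telescope time.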

For the remaining 200-odd candidates we did make follow-up observations, on UKIRT, the Liverpool Telescope (LT) or the New Technology Telescope (NTT) and in fewer than ten cases were the initial SDSS and UKIDSS measurements verified. By this stage we were almost certain that we had a distant quasar, although in most cases it was sufficiently bright at optical wavelengths that we knew it wasn’t a record-breaker. However ULAS J1120+0641, identified in late 2010, remained defiantly black when looked at by the LT, and so for the first time in five years we really thought we might have struck gold. To be completely sure we used the Gemini North telescope to obtain a spectrum — essentially splitting up the light into different wavelengths, just as happens to sunlight when it passes through water droplets to form a rainbow. The observation was made on Saturday, November 2010 and we got the spectrum e-mailed to us the next day and confirmed that we’d finally got what we were looking for: the most distant quasar ever found.

We obtained more precise spectra covering a wider range of wavelengths using Gemini (again) and the Very Large Telescope, the results of which are shown here:
The black curve shows the spectrum of ULAS J1120+0641; the green curve shows the average spectrum of a number of more nearby quasars (but redshifted to longer wavelengths to account for the cosmological expansion). The two are almost identical, with the obvious exception of the cut-off at 1 micron in the spectrum of ULAS J1120+0641, which is caused by absorption by hydrogen in front of the quasar. We had all the data needed to make this plot by the end of January, but it still took another five months for the results to be published in the June 30 edition of Nature — rather longer than the 24-hour turn-around of the journalists who eventually reported on this work. But if we’d given up on the search after four years — or if the Science and Technology Facilities Council had withdrawn funding for UKIRT, as seemed likely at one point — then we never would have made this discovery. It was a long time coming but for me — and hopefully for astronomy — it was worth the wait.
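The 1 micron cut-off also gives a rough redshift. Assuming (my identification, not stated above) that the edge is hydrogen's Lyman-alpha line at a rest wavelength of 121.567 nm, and using the fact that expansion stretches wavelengths by a factor 1 + z, the rounded 1 micron figure implies z of roughly 7.2; a precise value needs the measured position of the edge rather than a round number.

```python
LYMAN_ALPHA_NM = 121.567   # rest-frame wavelength of hydrogen's Lyman-alpha line, in nm

def observed_wavelength(rest_nm, z):
    """Cosmological expansion stretches wavelengths by a factor (1 + z)."""
    return rest_nm * (1 + z)

def redshift_from_edge(observed_nm, rest_nm=LYMAN_ALPHA_NM):
    """Redshift implied by seeing a rest-frame feature at observed_nm."""
    return observed_nm / rest_nm - 1

z = redshift_from_edge(1000.0)   # cut-off at about 1 micron = 1000 nm
print(f"z = {z:.2f}; check: {observed_wavelength(LYMAN_ALPHA_NM, z):.0f} nm")
```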

Distant objects and nearby music

Normally, I would be writing about the discovery of the most distant quasar by Imperial astronomers using the UKIDSS survey (using excellent Bayesian methods), but Andy and Peter have beaten me to it. To make up for it, I’ll try to get one of the authors of the paper to discuss it here themselves, soon. (In the meantime, some other links, from STFC, ESO, Gemini, …)

But I’ve got a good excuse: I was out (with one of those authors, as it happens) seeing Paul Simon play at the Hammersmith Apollo:

Paul Simon
Like my parents, Paul Simon grew up in the outer boroughs of New York City, a young teenager at the birth of rock’n’roll, and his music, no matter how many worldwide influences he brings in, always reminds me of home.

He played “The Sound Of Silence” (solo), most of his 70s hits from “Kodachrome” and “Mother and Child Reunion” to the soft rock of “Slip Slidin’ Away”, and covers of “Mystery Train” and “Here Comes the Sun”. But much of the evening was devoted to what is still his masterpiece, Graceland. (We were a little disappointed that the space-oriented backing video for “The Boy in the Bubble” included images neither of the Cosmic Microwave Background nor the new most distant quasar….)