Embarrassing update: as pointed out by Vladimir Nesov in the comments, all of my quantitative points below are incorrect. To maximize expected winnings, you should bet everything on whichever alternative you judge to be most likely. If you have a so-called logarithmic utility function — which already has the property of growing faster for small amounts than large — you should bet proportional to your odds on each answer. In fact, it’s exactly arguments like these that lead many to conclude that the logarithmic utility function is in some sense “correct”. So, in order to be led to betting more on the low-probability choices, one needs a utility that changes even faster for small amounts and slower for large amounts. But I disagree that this is “implausible” — if I think that is the best strategy to use, I should adjust my utility function, not change my strategy to match one that has been externally imposed. Just like probabilities, utility functions encode our preferences. Of course, I should endeavour to be consistent, always using the same utility function, at least in the same circumstances, taking into account what economists call “externalities”. Anyway, all of this goes to show that I shouldn’t write long, technical posts after the office Christmas party….
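A quick numerical sketch of these claims, using the 99:1 Jenner/Clay numbers from the post below: maximizing expected winnings puts everything on the favourite, logarithmic utility bets in proportion to the odds, and only a utility even more concave than the logarithm (here, the purely illustrative u(w) = −1/w) pushes a substantial bet onto the long shot.

```python
import numpy as np

p = np.array([0.99, 0.01])           # my odds on Jenner vs Clay
fs = np.linspace(0.001, 0.999, 999)  # candidate fractions of the stake to bet on Jenner

def optimum(utility):
    # Expected utility of keeping fraction f of the stake (if Jenner is
    # right) or 1 - f (if Clay is), maximized over the grid of fractions.
    eu = [p[0] * utility(f) + p[1] * utility(1 - f) for f in fs]
    return fs[np.argmax(eu)]

print(optimum(lambda w: w))         # expected winnings: bet (almost) everything
print(optimum(np.log))              # log utility: bet proportionally, f = 0.99
print(optimum(lambda w: -1.0 / w))  # a more concave utility: f ~ 0.91
```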
The original post follows, mistakes included.
An even more unlikely place to find Bayesian inspiration was Channel 4’s otherwise insipid game show, “The Million Pound Drop”. In the version I saw, B-list celebs start out with a million pounds (sterling), and are asked a series of multiple-choice questions. For each one, they can bet any fraction of their remaining money on any set of answers; any money bet on wrong answers is lost (we’ll ignore the one caveat, that the contestants must wager no money on at least one answer, which means there’s always the chance that they will lose the entire stake).
Is there a best strategy for this game? Obviously, the overall goal is to maximize the actual winnings at the end of the series of questions. In the simplest example, let’s say a question is “What year did England last win the football world cup?” with possible answers “1912”, “1949”, “1966”, and “never”. In this case (assuming you know the answer), the only sensible course is to bet everything on “1966”.
Now, let’s say that the question is “When did the Chicago Bulls last win an NBA title?” with possible answers, “1953”, “1997”, “1998”, “2009”. The contestants, being fans of Michael Jordan, know that it’s either 1997 or 1998, but aren’t sure which — it’s a complete toss-up between the two. Again in this case, the strategy is clear: bet the same amount on each of the two — the expected winning is half of your stake no matter what. (The answer is 1998.)
But now let’s make it a bit more complicated: the question is “Who was the last American to win a gold medal in Olympic Decathlon?” with answers “Bruce Jenner”, “Bryan Clay”, “Jim Thorpe”, and “Jesse Owens”. Well, I remember that Jenner won in the 70s, and that Thorpe and Owens predate that by decades, so the only possibilities are Jenner and Clay, whom I’ve never heard of. So I’m pretty sure the answer is Jenner, but I’m by no means certain: let’s say that I’m 99:1 in favor of Jenner over Clay.
In order to maximize my expected winnings, I should bet 99 times as much on Jenner as on Clay. But there’s a problem here: if it’s Clay, I end up with only one percent of my initial stake, and that one percent — which I have to go on and play more rounds with — is almost too small to be useful. This means that I don’t really want to maximize my expected winnings, but rather the expectation of something that economists and statisticians call the “utility function” (or, conversely, I want to minimize the expected loss function), a function which describes how useful some amount of winnings is to me: a thousand dollars is more than a thousand times as useful as one dollar, but a million dollars is less than twice as useful as half a million dollars, at least in this context.
So in this case, a small amount of winnings is less useful than one might naively expect, and the utility function should reflect that by growing faster for small amounts and slower for larger amounts — I should perhaps bet ten percent on Clay. If it’s Jenner, I still get 90% of my stake, but if it’s Clay, I end up with a more-useful 10%. (The answer is Clay, by the way.)
This is the branch of statistics and mathematics called decision theory: how we go from probabilities to actions. It comes into play when we don’t want to just report probabilities, but actually act on them: whether to prescribe a drug, perform a surgical procedure, or build a sea-wall against a possible flood. In each of these cases, in addition to knowing the efficacy of the action, we need to understand its utility: if a flood is 1% likely over the next century, and a sea-wall costing one million pounds would save one billion in property damage and 100 lives were the flood to occur, we need to compare spending a million now versus saving a billion later (taking the “nonlinear” effects above into account), and complicate that with the loss from the even more tragic possibilities. One hundred fewer deaths has the same utility as some amount of money saved, but I am glad I’m not on the panel that has to make that assignment. It is important to point out, however, that whatever decision is made, by whatever means, it is equivalent to some particular set of utilities, so we may as well be explicit about it.
Happily, these sorts of questions tend to arise less in the physical sciences where probabilistic results are allowed, although the same considerations arise at a higher level: when making funding decisions…
I've come across a couple bits of popular/political culture that give me the opportunity to discuss one of my favorite topics: the uses and abuses of probability theory.
The first is a piece by Nate Silver of the New York Times' FiveThirtyEight blog, dedicated to trying to crunch the political numbers of polls and other data in as transparent a manner as possible. Usually, Silver relies on a relentlessly frequentist take on probability: he runs lots of simulations, letting the inputs vary according to the poll results (correctly taking into account the "margin of error") and more than occasionally using other information to re-weight the results of different polls. Nonetheless, these techniques give a good summary of the results at any given time -- and have been far and away the best discussion of the numerical minutiae of electioneering for both the 2008 and 2010 US elections.
But yesterday, Silver wrote a column, A Bayesian Take on Julian Assange, which tackles the question of Assange's guilt in the sexual-assault offense with which he has been charged. Bayes' theorem, you will probably recall if you've been reading this blog, states that the probability of some statement ("Assange is innocent of sexual assault, despite the charges against him") is proportional to the product of the probability that he would be charged if he were innocent (the "likelihood") times the probability of his innocence in the absence of knowledge about the charge (the "prior"):
P(innocent|charged, context) ∝ P(innocent|context) × P(charged|innocent, context)

where P(A|B) means the probability of A given B, and the "∝" means that I've left off an overall number that you can multiply by. The most important thing I've left in here is the "context": all of these probabilities depend upon the entire context in which you consider the problem.
To figure out these probabilities, there are no simulations we can perform -- we can't run a big social-science model of Swedish law-enforcement, possibly in contact with, say, American diplomats, and make small changes and see what happens. We just need to assign probabilities to these statements.
But even to do that requires considerable thought, and important decisions about the context in which we want to make these assignments. For Silver, the important context is that there is evidence that other governments, particularly the US, may have an ulterior motive for wanting to not just prosecute, but persecute Assange. Hence, the probability of his being unjustly accused [P(charged|innocent, context)] is larger than it would be for, say, an arbitrary Australian citizen traveling in Britain. Usually, Bayesian probability is accused of needing a subjective prior, but in this case the context adds a subjective aspect to the likelihood as well.
Some of the commenters on the site make a different point: given that Assange is, at least in some sense, a known criminal (he has leaked secret documents, which is likely against the law), he is more likely to commit other criminal acts. This time, the likelihood is not affected, but the prior: the commenter believes that Assange is less likely to be innocent irrespective of the information about the charge.
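The two arguments pull on different pieces of Bayes' theorem, and a toy calculation makes the distinction concrete (all numbers here are purely illustrative, not estimates of the actual case): Silver's context inflates the likelihood P(charged|innocent), while the commenters' argument deflates the prior P(innocent).

```python
def posterior_innocent(prior, p_charged_if_innocent, p_charged_if_guilty):
    # Bayes' theorem with the overall normalizing number restored.
    num = prior * p_charged_if_innocent
    return num / (num + (1 - prior) * p_charged_if_guilty)

# Baseline: a 50/50 prior, charges far more likely against the guilty.
print(posterior_innocent(0.50, 0.10, 0.90))  # 0.1
# Silver: persecution makes an unjust charge more likely (likelihood up).
print(posterior_innocent(0.50, 0.40, 0.90))  # ~0.31
# Commenters: a "known criminal" is less likely innocent (prior down).
print(posterior_innocent(0.20, 0.10, 0.90))  # ~0.03
```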
Next: game shows.
[Apologies to those of you who may have seen an inadvertently-published unfinished version of this post]
I’ve just returned from a week at the Annual Meeting of the Institute of Mathematical Statistics in Gothenburg, Sweden. It’s always instructive to go to meetings outside of one’s specialty, outside of the proverbial comfort zone. I’ve been in my own field long enough that I’m used to feeling like one of the popular kids, knowing and being known by most of my fellow cosmologists — it’s a good corrective to an overinflated sense of self-worth to be somewhere where nobody knows your name. Having said that, I was a bit disappointed in the turnout for our session, “Statistics, Physics and Astrophysics”. Mathematical statistics is a highly specialized field, but with five or more parallel sessions going on at once, most attendees could find something interesting. However, even cross-cutting sessions of supposedly general interest — our talks were by physicists, not statisticians — didn’t manage to attract a wide audience.
The meeting itself, outside of that session, was very much of mathematical statistics, more about lemmas and proofs than practical data analysis. Of course these theoretical underpinnings are crucial to the eventual practical work, although it’s always disheartening to see the mathematicians idealise a problem all out of recognition. For example, the mathematicians routinely assume that the errors on a measurement are independent and identically distributed (“iid” for short) but in practice this is rarely true in the data that we gather. (I should use this as an opportunity to mention my favourite statistics terms of art: homoscedastic and heteroscedastic, describing, respectively, identical and varying distributions.)
But there were more than a couple of interesting talks and sessions, mostly concentrating upon two of the most exciting — and newsworthy — intersections between statistical problems and the real world: finance and climate. How do we compare complicated, badly-sampled, real-world economic or climate data to complicated models which don’t pretend to capture the full range of phenomena? In what sense are the predictions inherently statistical, and in what sense are they deterministic? “Probability”, said de Finetti, the famous Bayesian statistician, “does not exist”, by which he meant that probabilities are statements about our knowledge of the world, not statements about the world. The world does, however, give sequences of values (stock prices, temperatures, etc.) which we can test our judgements against. This, in the financial realm, was the subject of Hans Föllmer’s Medallion Prize Lecture, which veered into the more abstract realm of stochastic integration, martingales and Itō calculus along the way.
Another pleasure was the session chaired by Robert Adler. Adler is the author of The Geometry of Random Fields, a book which has had a significant effect upon cosmology from the 1980s through today. A “random field” is something that you could measure over some regime of space and time, but for which your theory doesn’t determine its actual value, only its statistical properties, such as its average and the way the values at different points are related to one another. The best example in cosmology is the CMB itself — none of our theories predict the temperature at any particular place, but the theories that have survived our tests make predictions about the mean value and about the product of temperatures at any two points — this is called the correlation function, and a random field in which only the mean and correlation function can be specified is called a Gaussian random field, after the Gaussian distribution that is the mathematical version of this description. Indeed, Adler uses the CMB as one of the examples on his academic home page. But there are many more applications besides: the session featured talks on brain imaging and on Google’s use of random fields to analyze data about the way people look at their web pages.
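As a sketch of the idea (with an assumed, purely illustrative power-law correlation structure, not any real cosmological model), one can generate a one-dimensional Gaussian random field by drawing independent Gaussian Fourier modes whose variance is given by the power spectrum, and transforming back to real space:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# Hypothetical power spectrum P(k) ~ 1/k for the non-zero modes; the
# theory specifies only this, never the field's actual value anywhere.
k = np.fft.rfftfreq(n)
power = np.zeros_like(k)
power[1:] = 1.0 / k[1:]

# Independent Gaussian amplitudes with variance P(k); the mean and the
# correlation function fix every other statistical property of the result.
# (A careful treatment would also handle the DC and Nyquist modes specially.)
modes = np.sqrt(power / 2) * (rng.standard_normal(k.size)
                              + 1j * rng.standard_normal(k.size))
field = np.fft.irfft(modes, n=n)
print(field.shape)
```

Each call with a different seed gives a different realization, but all of them share the same mean (zero) and correlation function.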
Gothenburg itself was nice in that Scandinavian way: nice, but not terribly exciting, full of healthy, attractive people who seem pleased with their lot in life. The week of our meeting overlapped with two other important events in the town. The other big meeting in town was the World Library and Information Congress — you can only imagine the party atmosphere in a town filled with both statisticians and librarians! But adding to that, Gothenburg was hosting its summer kulturkalas festival of culture — the streets were filled with musicians and other performers to distract us from the mathematics.
Blake is of course one of the most famous poets in the English language, but most people know him only from short poems like The Tiger [sic] (“Tyger, Tyger burning bright/ In the forests of the night/ What immortal hand or eye/ Could frame thy fearful symmetry”) and Jerusalem, sung in Anglican churches each week. But most of Blake’s work is much too weird to make it into church. It is peopled by gods and monsters, illuminated by Blake’s own wonderful over-the-top illustrations. (For example, America: A Prophecy, his poetic interpretation of the American Revolutionary War, begins “The shadowy Daughter of Urthona stood before red Orc/When fourteen suns had faintly journey’d o’er his dark abode” — George Washington and Thomas Jefferson don’t make Blake’s version.)
Blake’s gravestone sits right on the pavement in the middle of Bunhill Fields, and as such unfortunately has been slightly damaged.
I don’t read Blake every day or even every week, but I probably do use Bayes’s famous theorem at least that often. As I and other bloggers have gone on and on about, Bayes’s theorem is the mathematical statement of how we ought to rigorously and consistently incorporate new information into our model of the world. Bayes himself wrote down only a version appropriate for a restricted version of this problem, and used words, rather than mathematical symbols. Nowadays, we usually write it mathematically, and in a completely general form, as

P(hypothesis|data, context) ∝ P(hypothesis|context) × P(data|hypothesis, context)

where the “context” encodes all the other information we bring to the problem.
Inscription: “Rev. Thomas Bayes, son of the said Joshua and Ann Bayes, 7 April 1761. In recognition of Thomas Bayes’s important work in probability this vault was restored in 1960 with contributions received from statisticians throughout the world.” (With restoration and upkeep since by Bayesian Efficient Strategic Trading of Hoboken, NJ, USA —across the Hudson River from New York City— and ISBA, the International Society for Bayesian Analysis.)
Luckily, not all the astrophysics news this week was so bad.
First, and most important, two of our Imperial College Astrophysics postgraduate students, Stuart Sale and Paniez Paykari, passed their PhD viva exams, and so are on their way to officially being Doctors of Philosophy. Congratulations to both, especially (if I may say so) to Dr Paykari, whom I had the pleasure and good fortune to supervise and collaborate with. Both are on their way to continue their careers as postdocs in far-flung lands.
Second, the first major results from the Herschel Space Telescope, Planck’s sister satellite, were released. There are impressive pictures of dwarf planets in the outer regions of our solar system, of star-forming regions in the Milky Way galaxy, of the very massive Virgo Cluster of galaxies, and of the so-called “GOODS” (Great Observatories Origins Deep Survey) field, one of the most well-studied areas of sky. All of these open new windows into these areas of astrophysics, thanks to Herschel’s amazing sensitivity.
Finally, tantalisingly, the Cryogenic Dark Matter Search (CDMS) released the results of its latest (and final) effort to search for the Dark Matter that seems to make up most of the matter in the Universe, but doesn’t seem to be the same stuff as the normal atoms that we’re made of. Under some theories, the dark matter would interact weakly with normal matter, and in such a way that it could possibly be distinguished from all the possible sources of background. These experiments are therefore done deep underground — to shield from cosmic rays which stream through us all the time — and with the cleanest and purest possible materials — to avoid contamination with both naturally-occurring radioactivity and the man-made kind which has plagued us since the late 1940s.
With all of these precautions, CDMS expected to see a background rate of about 0.8 events during the time they were observing. And they saw (wait for it) two events! This is on the one hand more than a factor of two greater than the expected number, but on the other is only one extra count. To put this in perspective, I’ve made a couple of graphs where I try to approximate their results (for aficionados, these are just simple plots of the Poisson distribution). The first shows the expected number of counts from the background alone:
(I should point out a few caveats in my micro-analysis of their data. First, I don’t take into account the uncertainty in their background rate, which they say is really 0.8±0.1±0.2, where the first uncertainty, ±0.1 is “statistical”, because they only had a limited number of background measurements, and the second, ±0.2, is “systematic”, due to the way they collect and analyse their data. Eventually, one could take this into account via Bayesian marginalization, although ideally we’d need some more information about their experimental setup. Second, I’ve only plotted the likelihood above, but true Bayesians will want to apply a prior probability and plot the posterior distribution. The most sensible choice (the so-called Jeffreys prior) for this case would in fact make the probability peak at zero signal. Finally, one would really like to formally compare the no-signal model with a signal-greater-than-zero model, and the best way to do this would be using the tool of Bayesian model comparison.)
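Setting those caveats aside, the basic arithmetic behind the plots is easy to reproduce, assuming a pure Poisson background at the quoted rate of 0.8 events:

```python
from math import exp, factorial

def poisson(n, lam):
    # Probability of exactly n counts given an expected rate lam.
    return lam ** n * exp(-lam) / factorial(n)

background = 0.8
# How surprising are CDMS's two counts if there is no signal at all?
p_two_or_more = 1.0 - poisson(0, background) - poisson(1, background)
print(round(p_two_or_more, 3))  # about 0.19: not much of a detection
```

Roughly one experiment in five would see at least two counts from background alone, which is why the result is tantalising rather than conclusive.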
Nonetheless, in their paper they go on to interpret these results in the context of particle physics, which can eventually be used to put limits on the parameters of supersymmetric theories which may be tested further at the LHC accelerator over the next couple of years.
I should bring this back to the aforementioned bad news. The UK has its own dark matter direct detection experiments as well. In particular, Imperial leads the ZEPLIN-III experiment, which has, at times, had the world’s best limits on dark matter, and is poised to possibly confirm this possible detection — this will be funded for the next couple of years. Unfortunately, STFC has decided that it cannot fund the next generation of dark matter experiments, EURECA and LUX-ZEPLIN, which would be needed to make convincing statements about these results.
The perfect stocking-stuffer for that would-be Bayesian cosmologist you’ve been shopping for:
As readers here will know, the Bayesian view of probability is just that probabilities are statements about our knowledge of the world, and thus eminently suited to use in scientific inquiry (indeed, this is really the only consistent way to make probabilistic statements of any sort!). Over the last couple of decades, cosmologists have turned to Bayesian ideas and methods as tools to understand our data. This book is a collection of specially-commissioned articles, intended as both a primer for astrophysicists new to this sort of data analysis and as a resource for advanced topics throughout the field.
Our back-cover blurb:
In recent years cosmologists have advanced from largely qualitative models of the Universe to precision modelling using Bayesian methods, in order to determine the properties of the Universe to high accuracy. This timely book is the only comprehensive introduction to the use of Bayesian methods in cosmological studies, and is an essential reference for graduate students and researchers in cosmology, astrophysics and applied statistics.
The first part of the book focuses on methodology, setting the basic foundations and giving a detailed description of techniques. It covers topics including the estimation of parameters, Bayesian model comparison, and separation of signals. The second part explores a diverse range of applications, from the detection of astronomical sources (including through gravitational waves), to cosmic microwave background analysis and the quantification and classification of galaxy properties. Contributions from 24 highly regarded cosmologists and statisticians make this an authoritative guide to the subject.
I was sitting in the lecture theatre of the Royal Institution at the Science Online London meeting (of which I hope to write more later, but you can retroactively follow the day’s tweets or just search for the day’s tags) when I realized I had missed the fifth anniversary of this blog this past July. So: thanks for your attention for over 400 posts on cosmology, astrophysics, Bayesian probability and probably too much politics and religion.
Today is also a much more important date: the 400th anniversary of Galileo’s telescope. Even Google is celebrating.
Right now, I’m at a meeting in Cambridge discussing Primordial Gravitational Waves — ripples in space and time that have been propagating since the first instants after the big bang. Despite my training as a theoretical physicist, I’m here to discuss the current state of the art of experiments measuring those waves using the polarization of the cosmic microwave background, which probes the effect of those gravitational waves on the constituents of the Universe about 400,000 years after the big bang. Better go finish writing my talk.
OK, this is going to be a very long post. About something I don’t pretend to be expert in. But it is science, at least.
A couple of weeks ago, Radio 4’s highbrow “In Our Time” tackled the so-called “Measurement Problem”. That is: quantum mechanics predicts probabilities, not definite outcomes. And yet we see a definite world. Whenever we look, a particle is in a particular place. A cat is either alive or dead, in Schrödinger’s infamous example. So, lots to explain in just setting up the problem, and even more in the various attempts so far to solve it (none quite satisfactory). This is especially difficult because the measurement problem is, I think, unique in physics: quantum mechanics appears to be completely true and experimentally verified, without contradiction so far. And yet it seems incomplete: the “problem” arises because the equations of quantum mechanics only provide a recipe for the calculation of probabilities, but don’t seem to explain what’s going on underneath. For that, we need to add a layer of interpretation on top. Melvyn Bragg had three physicists down to the BBC studios, each with his own idea of what that layer might look like.
Unfortunately, the broadcast seemed to me a bit of a shambles: the first long explanation of quantum mechanics, by Basil Hiley of Birkbeck, used the terms “wavefunction” and “linear superposition” without even an attempt at a definition. Things got a bit better as Bragg tried to tease things out, but I can’t imagine the non-physicists left listening got much out of it. Hiley himself worked with David Bohm on one possible solution to the measurement problem, the so-called “Pilot Wave Theory” (another term which was used a few times without definition), in which quantum mechanics is actually a deterministic theory — the probabilities come about because there is information about the locations and trajectories of particles to which we do not — and in principle cannot — have access.
Roger Penrose proved to be remarkably positivist in his outlook: he didn’t like the other interpretations on offer simply because they make no predictions beyond standard quantum mechanics and are therefore untestable. (Others see this as a selling point for these interpretations, however — there is no contradiction with experiment!) To the extent I understand his position, Penrose himself prefers the idea that quantum mechanics is actually incomplete, and that when it is finally reconciled with General Relativity (in a Theory of Everything or otherwise), we will find that it actually does make specific, testable predictions.
There was a long discussion by Simon Saunders of that sexiest of interpretations of quantum mechanics, the Many Worlds Interpretation. The latest incarnation of Many-Worlds theory is centered around workers in or near Oxford: Saunders himself, David Wallace and, most famously, David Deutsch. The Many-Worlds interpretation (also known as the Everett Interpretation, after its initial proponent) attempts to solve the problem by saying that there is nothing special about measurement at all — the simple equations of quantum mechanics always obtain. For this to work, all possible outcomes of any experiment must be actualized: that is, there must be a world for each outcome. And we’re not just talking about outcomes of science experiments here, but any time that quantum mechanics could have predicted something other than what (seemingly) actually happened. Which is all the time, for all of the particles in the Universe, everywhere. This is, to say the least, “ontologically extravagant”. Moreover, it has always been plagued by at least one fundamental problem: what, exactly, is the status of probability in the many-worlds view? When more than one quantum-mechanical possibility presents itself, each splits into its own world, with a probability related to the aforementioned wavefunction. But what beyond this does it mean for one branch to have a higher probability? The Oxonian many-worlders have tried to use decision theory to reconcile this with the prescriptions of quantum mechanics: from very minimal requirements of rationality alone, can we derive the probability rule? They claim to have done so, and they further claim that their proof only makes sense in the Many-Worlds picture. This is, roughly, because only in the Everett picture is there no “fact of the matter” at all about what actually happens in a quantum outcome — in all other interpretations, the very existence of a single actual outcome is enough to scupper the proof.
(I’m not so sure I buy this — surely we are allowed to base rational decisions on only the information at hand, as opposed to all of the information potentially available?)
At bottom, these interpretations of quantum mechanics (aka solutions to the measurement problem) are trying to come to grips with the fact that quantum mechanics seems to be fundamentally about probability, rather than the way things actually are. And, as I’ve discussed elsewhere, time and time again, probability is about our states of knowledge, not the world. But we are justly uncomfortable with 70s-style “Tao-of-Physics” ideas that make silly links between consciousness and the world at large.
But there is an interpretation that takes subjective probability seriously without resorting to the extravagance of many (very, very many) worlds. Chris Fuchs, along with his collaborators Carlton Caves and Ruediger Schack, has pursued this idea with some success. Whereas the many-worlds interpretation requires a universe that seems far too full for me, the Bayesian interpretation is somewhat underdetermined: there is a level of being that is, literally, unspeakable: there is no information to be had about the quantum realm beyond our experimental results. This is, as Fuchs points out, a very strong restriction on how we can assign probabilities to events in the world. But I admit some dissatisfaction at the explanatory power of the underlying physics at this point (discussed in some technical detail in a review by yet another Oxford philosopher of science, Christopher Timpson).
In both the Bayesian and Many Worlds interpretations (at least in the modern versions of the latter), probability is supposed to be completely subjective, as it should be. But something still seems to be missing: probability assignments are, in fact, testable, using techniques such as Bayesian model selection. What does it mean, in the purely subjective interpretation, to be correct, or at least more correct? Sometimes, this is couched as David Lewis’ “principal principle” (it’s very hard to find a good distillation of this on the web, but here’s a try): there is something out there called “objective chance” and our subjective probabilities are meant to track it (I am not sure this is coherent, and even Lewis himself usually gave the example of a coin toss, in which there is nothing objective at all about the chance involved: if you know the initial conditions of the coin and the way it is flipped and caught, you can predict the outcome with certainty.) But something at least vaguely objective seems to be going on in quantum mechanics: more probable outcomes happen more often, at least for the probability assignments that physicists make given what we know about our experiments. This isn’t quite “objective chance” perhaps, but it’s not clear that there isn’t another layer of physics still to be understood.
In today’s Sunday NY Times Magazine, there’s a long article by psychologist Steven Pinker, on “Personal Genomics”, the growing ability for individuals to get information about their genetic inheritance. He discusses the evolution of psychological traits versus intelligence, and highlights the complicated interaction amongst genes, and between genes and society.
But what caught my eye was this paragraph:
What should I make of the nonsensical news that I… have a “twofold risk of baldness”? … 40 percent of men with the C version of the rs2180439 SNP are bald, compared with 80 percent of men with the T version, and I have the T. But something strange happens when you take a number representing the proportion of people in a sample and apply it to a single individual…. Anyone who knows me can confirm that I’m not 80 percent bald, or even 80 percent likely to be bald; I’m 100 percent likely not to be bald. The most charitable interpretation of the number when applied to me is, “If you knew nothing else about me, your subjective confidence that I am bald, on a scale of 0 to 10, should be 8.” But that is a statement about your mental state, not my physical one. If you learned more clues about me (like seeing photographs of my father and grandfathers), that number would change, while not a hair on my head would be different. [Emphasis mine].
That “charitable interpretation” of the 80% likelihood to be bald is exactly Bayesian statistics (which I’ve talked about, possibly ad nauseam, before): it’s the translation from some objective data about the world — the frequency of baldness in carriers of this gene — into a subjective statement about the top of Pinker’s head, in the absence of any other information. And that’s the point of probability: given enough of that objective data, scientists will come to agreement. But even in the state of uncertainty that most scientists find themselves in, Bayesian probability forces us to enumerate the assumptions (usually called “prior probabilities”) that enter into our reasoning along with the data. Hence, if you knew Pinker, your prior probability would be that he’s fully hirsute (perhaps not 100% if you allow for the possibility of hair extensions and toupees); but if you didn’t, then you’d probably be willing to take 4:1 odds on a bet about his baldness — and you would lose to someone with more information.
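A hypothetical update along those lines, with the evidential weights of the photographs invented purely for illustration:

```python
def update(prior_bald, p_clue_if_bald, p_clue_if_not_bald):
    # Bayes' theorem: fold one new clue into the probability of baldness.
    num = prior_bald * p_clue_if_bald
    return num / (num + (1 - prior_bald) * p_clue_if_not_bald)

# Knowing only the T version of the SNP: subjective probability 0.8.
prior = 0.80
# Hypothetical clue: photos of a hirsute father and grandfathers, which
# we might judge 10% likely if Pinker were bald, 60% likely otherwise.
posterior = update(prior, 0.10, 0.60)
print(posterior)  # the 4:1 odds shorten considerably; his hair is unchanged
```

The number that moves is the observer's confidence, not anything on Pinker's head, which is exactly his point.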
In science, of course, it usually isn’t about wagering, but just about coming to agreement about the state of the world: do the predictions of a theory fit the data, given the inevitable noise in our measurements, and the difficulty of working out the predictions of interesting theoretical ideas? In cosmology, this is particularly difficult: we can’t go out and do the equivalent of surveying a cross-section of the population for their genes; we’ve got only one universe, and can only observe a small patch of it. So probabilities become even more subjective and difficult to tie uniquely to the data. Hence the information available to us on the very largest observable scales is scarce, and unlikely to improve much, despite tantalizing hints of data discrepant with our theories, such as the possibly mysterious alignment of patterns in the Cosmic Microwave Background on very large angles of the sky (discussed recently by Peter Coles here). Indeed, much of the data pointing to a possible problem was already available from the COBE Satellite, and results from the more recent and much more sensitive WMAP Satellite have only reinforced the original puzzle. We hope that the Planck Surveyor — to be launched in April! — will actually be able to shed light on the problem by providing genuinely new information about the polarization of the CMB on large scales to complement the temperature maps from COBE and WMAP.
PAMELA (Payload for Antimatter Matter Exploration and Light-nuclei Astrophysics) is a Russian-Italian satellite measuring the composition of cosmic rays. One of the motivations for the measurements is the indirect detection of dark matter — the very-weakly-interacting particles that make up about 25% of the matter in the Universe (with, as I’m sure you all know by now, normal matter making up about 5% and the so-called Dark Energy the remaining 70%). By observing the decay products of the dark matter — with more decay occurring in the densest locations — we can probe the properties of the dark particles. So far, these decays haven’t been unequivocally observed. Recently, however, members of the PAMELA collaboration have been out giving talks, carefully labelled “preliminary”, showing the kind of excess cosmic-ray flux that dark matter might be expected to produce.
But preliminary data is just that, and there’s a (usually) unwritten rule that the audience certainly shouldn’t rely on the numerical details in talks like these. Cirelli & Strumia have written a paper based on those numbers, “Minimal Dark Matter predictions and the PAMELA positron excess” (arXiv:0808.3867), arguing that the data fits their pet dark-matter model, so-called minimal dark matter (MDM). MDM adds just a single type of particle to those we know about, compared to the generally-favored supersymmetric (SUSY) dark matter model which doubles the number of particle types in the Universe (but has other motivations as well). What do the authors base their results on? As they say in a footnote, “the preliminary data points for positron and antiproton fluxes plotted in our figures have been extracted from a photo of the slides taken during the talk, and can thereby slightly differ from the data that the PAMELA collaboration will officially publish” (originally pointed out to me in the physics arXiv blog).
This makes me very uncomfortable. It would be one thing to write a paper saying that recent presentations from the PAMELA team have hinted at an excess — that’s public knowledge. But a photograph of the slides sounds more like amateur spycraft than legitimate scientific data-sharing.
Indeed, it’s to avoid such inadvertent data-sharing (which has happened in the CMB community in the past) that the Planck Satellite team has come up with its rather draconian communication policy (which is itself located in a password-protected site): essentially, the first rule of Planck is you do not talk about Planck. The second rule of Planck is you do not talk about Planck. And you don’t leave paper in the printer, or plots on your screen. Not always easy in our hot-house academic environments.
Update: Bergstrom, Bringmann, & Edsjo, “New Positron Spectral Features from Supersymmetric Dark Matter - a Way to Explain the PAMELA Data?” (arXiv: 0808.3725) also refers to the unpublished data, but presents a blue swathe in a plot rather than individual points. This seems a slightly more legitimate way to discuss unpublished data. Or am I just quibbling?
Update 2: One of the authors of the MDM paper comments below. He makes one very important point, which I didn’t know about: “Before doing anything with those points we asked the spokeperson of the collaboration at the Conference, who agreed and said that there was no problem”. Essentially, I think that absolves them of any “wrongdoing” — if the owners of the data don’t have a problem with it, then we shouldn’t, either (although absent that I think the situation would still be dicey, despite the arguments below and elsewhere). And so now we should get onto the really interesting question: is this evidence for dark matter, and, if so, for this particular model? (An opportunity for Bayesian model comparison!?)
Paul Davies had an op-ed in the New York Times, comparing scientists’ reliance “on the assumption that nature is ordered in a rational and intelligible way” with religious faith. He wants to eschew both in favor of regarding “the laws of physics and the universe they govern as part and parcel of a unitary system, and to be incorporated together within a common explanatory scheme.”
Sean Carroll responds and points to other bloggers, along with the always-pretentious Edge crowd. My opinion on Davies’ not-quite-cogent ideas hasn’t changed since my review of his book. (And a little Bayesian probability helps make the case that we’re not engaged in mathematical proof, but in testing the repercussions of our theories.)
This post is a work in progress, but I’ve decided to post it in its unfinished state. Comments and questions welcome!
The MiniBooNE results have been discussed in depth elsewhere. Like MINOS last year, MiniBooNE was looking at the masses of neutrinos. Specifically, it was looking for the oscillation between electron neutrinos and mu neutrinos. A decade ago, the LSND experiment saw events indicating that mu antineutrinos could oscillate into electron antineutrinos, which gave evidence of a mass difference between the two “flavors”. Unfortunately, this evidence was at odds with the results of two other neutrino oscillation experiments, at least in the standard model with three (but only three) different flavors. MiniBooNE set out to test these results. The results so far seem to contradict those from LSND (at about “98% confidence”).
But here I want to talk about a specific aspect of the statistical methods that MiniBooNE (and many other modern particle physics experiments) use. How did they come up with that 98% number? Over the last couple of decades, the particle physics community has arrived at what it considers a pretty rigorous set of methods. It relies on a few chief tenets. First, make sure you can simulate every single aspect of your experiment, varying all of the things that you don’t know for sure (aspects of nuclear physics, the properties of the detectors, and, of course, the unknown physics). Second, compare these simulations to your data in order to “tune” those numbers to their (assumedly) actual values. Finally, delay looking at the part of your data that contains the actual signal until the very end of the process. Putting these all together means that you can do a “blind analysis”, only “opening the box” at that final stage.
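The Monte Carlo recipe can be sketched in a few lines. This is emphatically not MiniBooNE’s real analysis — every number here is invented — but it shows how a confidence figure like “98%” emerges: simulate the experiment many times under the no-signal hypothesis, varying the things you don’t know for sure, and count how often the simulation fluctuates up to what you observed.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson-distributed count with mean lam (Knuth's algorithm)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(42)
observed = 130                     # hypothetical observed event count
bkg_mean, bkg_sigma = 100.0, 10.0  # hypothetical "tuned" background and its uncertainty

trials = 20_000
as_high = 0
for _ in range(trials):
    # vary the uncertain background in each simulated experiment...
    rate = max(0.0, rng.gauss(bkg_mean, bkg_sigma))
    # ...and ask whether it fluctuates up to the observed count
    if poisson(rate, rng) >= observed:
        as_high += 1

p_value = as_high / trials
print(f"background-only p-value ≈ {p_value:.3f}")
```

One minus this p-value is the “confidence” with which the background-only hypothesis is disfavored — the same logic, if not the same machinery, as the quoted 98%.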
Why do they go through all of this trouble? Basically, to avoid what is technically known as “bias” — the scary truth that we scientists can’t be trusted to be completely rational. If you look at your data while you’re trying to understand everything about your experiment, you’re likely to stop adjusting all the parameters when you get an answer that seems right, that matches your underlying prejudices. (Something like this is a well known problem in the medical world: “publication bias” in which only successful studies for a given treatment ever see the light of day.)
Even with the rigorous controls of a blind analysis, the MiniBooNE experimenters still had to intervene in the process more than they would have liked: they adjusted the lower limit of the particle energies that they analyzed in order to remove an anomalous discrepancy with expectations in their simulations. To be fair, the analysis was still blind, but it had the effect of removing an excess of events at the now-discarded low energies. This excess doesn’t look anything like the signal for which they were searching — and it occurs in a regime where you might have less confidence in the experimental results — but it does need to be understood. (Indeed, unlike the official talks, which attempt to play down this anomaly, the front-page NY Times article on the experiment highlights it.)
Particle physicists can do this because they are in the lucky (and expensive) position of building absolutely everything about their experimental setup: the accelerators that create their particles, the targets that they aim them at, and the detectors that track the detritus of the collisions, all lovingly and carefully crafted out of the finest of materials. We astrophysicists don’t have the same luxury: we may build the telescope, but everything else is out there in the heavens, out of our control. Moreover, particle experiments enjoy a surfeit of data — billions of events that don’t give information about the desired physical signal, but do let us calibrate the experiment itself. In astrophysics, we often have to use the very same data to calibrate as we use to measure the physics we’re interested in. (Cosmic Microwave Background experiments are a great example of this: it’s very difficult to get lots of good data on our detectors’ performance except in situ.)
It also happens that the dominant statistical ideology in the particle physics community is “frequentist”, in contrast to the Bayesian methods that I never shut up about. Part of the reason for the difference is purely practical: frequentist methods make sense when you can perform the needed “Monte Carlo simulations” of your entire experiment, varying all of the unknowns, and tune your methods against the experimental results. In astrophysics, and especially in cosmology, this is more difficult: there is only one Universe (at least only one that we can measure). But there would be nothing to stop us from doing a blind analysis, simultaneously measuring — and, in the parlance of the trade, marginalizing over — the parameters that describe our experiment that are “tuned” in the Monte Carlo analysis. Indeed, the particle physics community, were it to see the Bayesian light and truth, could in principle do this, too. The problem is simply that this would be a much more computationally difficult task.
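The Bayesian alternative — measuring and marginalizing over the experimental parameters rather than tuning them — can be sketched with toy numbers (everything here is invented for illustration): treat the uncertain background as a nuisance parameter with a prior from calibration, and integrate it out of the likelihood.

```python
import math

def poisson_like(n, mu):
    """Poisson likelihood of observing n counts given expected rate mu."""
    return math.exp(-mu) * mu**n / math.factorial(n)

observed = 12                                # hypothetical counts in the signal region
signals = [i * 0.5 for i in range(41)]       # grid of signal strengths, 0..20
backgrounds = [i * 0.5 for i in range(41)]   # grid of background rates, 0..20

def bkg_prior(b):
    # Gaussian prior on the background from calibration: 6 +/- 2 (made up)
    return math.exp(-0.5 * ((b - 6.0) / 2.0) ** 2)

post = []
for s in signals:
    # marginalize: sum the likelihood over the background grid, weighted by its prior
    post.append(sum(poisson_like(observed, s + b) * bkg_prior(b) for b in backgrounds))

norm = sum(post)
post = [p / norm for p in post]
best = signals[post.index(max(post))]
print(f"posterior peaks at signal strength ≈ {best}")
```

The result is a full posterior for the signal with the background uncertainty folded in — no tuning step, but (as the text says) on real problems this integral is the computationally expensive part.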
I’m just back from a couple of days up in Edinburgh, one of my favorite cities in the UK. London is bigger, more intense, but Edinburgh is more beautiful, dominated by its landscape—London is New York to Edinburgh’s San Francisco.
I was up there to give the Edinburgh University Physics “General Interest Seminar”. Mostly, I talked about the physical theory behind and observations of the Cosmic Microwave Background, but I was also encouraged to make some philosophical excursions. Needless to say, I talked about Bayesian Probability, and this in turn gave me an excuse to talk about David Hume, my favorite philosopher, and son of Edinburgh. Hume was the first to pose the “problem of induction”: how can we justify our prediction of the future based on past events? How can we logically justify our idea that there is a set of principles that govern the workings of the Universe? The canonical version of this asks: how can we be sure that the sun will rise tomorrow? Yes, it’s done so every day up until now, but tomorrow’s sunrise doesn’t logically follow from that. One possible argument is that induction has always worked up until now, so we can expect it to work again in this case. But this seems to be a vicious circle (rather than a virtuous spiral). As I discussed a few weeks ago, I think this whole problem just grows out of a category error: one cannot make logical proofs of physical theories.
I also went down the dangerous road of discussing anthropic arguments in cosmology, to some extent rehashing the discussion in my review of Paul Davies’ “Goldilocks Enigma”.
But in between I talked about the current state of CMB data, our own efforts to constrain the topology of the Universe, and the satellites, balloons and telescopes that we hope will improve our knowledge even further over the coming few years.
Next up, a more general talk on the topology of the Universe at next week’s Outstanding questions for the standard cosmological model meeting, and then a more general review of the CMB at the Institute of Physics Nuclear and Particle Physics Divisional Conference.
With yesterday’s article on “Faith” (vs Science) in the Guardian, and today’s London debate between biologist Lewis Wolpert and the pseudorational William Lane Craig (previewed on the BBC’s Today show this morning), the UK seems to be the hotbed of tension between science and religion. I’ll leave it to the experts for a fuller exposition, but I was particularly intrigued (read: disgusted) by Craig’s claims that so little of science has been “proved”, and hence it was OK to believe in other unproven things like, say, a Christian God (although I prefer the Flying Spaghetti Monster).
The question is: what constitutes “proof”?
Craig claimed that such seemingly self-evident facts as the existence of the past, or even the existence of other minds, were essentially unproven and unprovable. Here, Craig is referring to proofs of logic and mathematics, those truths which follow necessarily from the very structure of geometry and math. The problem with this standard of proof is that it applies to not a single interesting statement about the external world. All you can show with this sort of proof are statements like 1+1=2, or that Fermat’s Last Theorem and the Poincaré Conjecture are true, or that the sum of the angles of a triangle on a plane is 180 degrees. But you can’t prove in this way that Newton’s laws hold, or that we descended from the ancestors of today’s apes.
For these latter sorts of statements, we have to resort to scientific proof, which is a different but still rigorous standard. Scientific proofs are unavoidably contingent, based upon the data we have and the setting in which we interpret that data. What we can do over time is get better and better data, and minimize the restrictions of that theoretical setting. Hence, we can reduce Darwinian evolution to a simple algorithm: if there is a mechanism for passing along inherited characteristics, and if there are random mutations in those characteristics, and if there is some competition among offspring, then evolution will occur. Furthermore, if evolution does occur, then the fossil record makes it exceedingly likely that present-day species have evolved in the accepted manner.
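Those three “if”s — inheritance, random mutation, competition — really are enough to drive evolution, as a toy simulation shows (all the parameters below are arbitrary illustrations, not biology):

```python
import random

rng = random.Random(0)
GENOME, POP, GENS, MUT = 20, 50, 60, 0.02

def fitness(g):
    # arbitrary measure of success: more 1-bits is "fitter"
    return sum(g)

# a random starting population of bitstring "genomes"
pop = [[rng.randint(0, 1) for _ in range(GENOME)] for _ in range(POP)]
start = sum(map(fitness, pop)) / POP

for _ in range(GENS):
    # competition: only the fitter half reproduces
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]
    # inheritance, with random mutation of each bit
    pop = []
    for p in parents:
        for _ in range(2):
            pop.append([1 - bit if rng.random() < MUT else bit for bit in p])

end = sum(map(fitness, pop)) / POP
print(f"mean fitness: {start:.1f} -> {end:.1f}")
```

Mean fitness climbs steadily from near random (about half the maximum) toward the maximum, with no foresight anywhere in the loop — which is the whole point of the algorithmic formulation.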
Similarly, given our observations of the movement of bodies on relatively small scales, it is exceedingly likely that a theory like Einstein’s General Relativity holds to describe gravity. Given observations on large scales, it is exceedingly likely that the Universe started out in a hot and dense state about 14 billion years ago, and has been expanding ever since.
The crucial words in the last couple of paragraphs are “exceedingly likely” — scientific proofs aren’t about absolute truth, but probability. Moreover, they are about what is known as “conditional probability” — how likely something is to be true given other pieces of knowledge. As we accumulate more and more knowledge, plausible scientific theories become more and more probable. (Regular readers will note that almost everything eventually comes back to Bayesian Statistics.)
Hence, we can be pretty sure that the Big Bang happened, that Evolution is responsible for the species present on the earth today, and that, indeed, other minds exist and that the cosmos wasn’t created in medias res sometime yesterday.
This pretty high standard of proof must be contrasted with religious statements about the world which, if anything, get less likely as more and more contradictory data comes in. Of course, since the probabilities are conditional, believers are allowed to make everything contingent not upon observed data, but on their favorite religious story: the probability of evolution given the truth of the New Testament may be pretty small, but that’s a lot to, uh, take on faith, especially given all of its internal contradictions. (The smarter and/or more creative theologians just keep making the religious texts more and more metaphorical but I assume they want to draw the line somewhere before they just become wonderfully-written books).
The work that I’ve been doing with my student is featured on the cover of this week’s New Scientist. Unfortunately, a subscription is necessary to read the full article online, but if you do manage to find it on the web or the newsstand, you’ll find a much better explanation of the physics than I can manage here, as well as my koan-like utterances such as “if you look over here, you’re also looking over there”. There are more illuminating quotes from my friends and colleagues Glenn Starkman, Janna Levin and Dick Bond (all of whom I worked with at CITA in the 90s, coincidentally).
We’re exploring the overall topology of space, separate from its geometry. Geometry is described by the local curvature of space: what happens to straight lines like rays of light — do parallel rays intersect, do triangles have 180 degrees? But topology describes the way different parts of that geometry are connected to one another. Could I keep going in one direction and end up back where I started — even if space is flat, or much sooner than I would have thought by calculating the circumference of a sphere? The only way this can happen is if space has four-or-more-dimensional “handles” or “holes” (like a coffee mug or a donut). We can only picture this sort of topology by actually curving those surfaces, but mathematically we can describe topology and geometry completely independently, and there’s no reason to assume that the Universe shouldn’t allow both of them to be complicated and interesting. My student, Anastasia Niarchou, and I have made predictions about the patterns that might show up in the Cosmic Microwave Background in these weird “multi-connected” universes. This figure shows the kinds of patterns that you might see in the sky:
The first four are examples of these multi-connected universes, the final one is the standard, simply-connected case. We’ve then carefully compared these predictions with data from the WMAP satellite, using the Bayesian methodology that I never shut up about. Unfortunately, we have determined that the Universe doesn’t have one of a small set of particularly interesting topologies — but there are still plenty more to explore.
Update: From the comment below, it seems I wasn’t clear about what I meant by asking if I could “keep going in one direction and end up back where I started”. In a so-called “closed” universe (with curvature k=+1) shaped like a sphere sitting in four dimensions, one can indeed go straight on and end up back where you started. This sort of Universe is, however, still simply-connected, and wasn’t what I was talking about. Even in a Universe that is locally curved like a sphere, it’s possible to have a multiply-connected topology, so that you end up back again much sooner, or from a different direction, than you would have thought (from measuring the apparent circumference of the sphere). You can picture this in a three-dimensional cartoon by picturing a globe and trying to “tile” it with identical curved pieces. Except for making them all long and thin (like peeling an orange along lines of longitude), this is actually a hard problem, and indeed it can only be done in a small number of ways. Each of those ways corresponds to the whole universe: when you leave one edge of the tile, you re-enter another one. In our three-dimensional space, this corresponds to leaving one face of a polyhedron and re-entering somewhere else. Very hard to picture, even for those of us who play with it every day. I fear this discussion may have confused the issue even further. If so, go read the article in New Scientist!
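The simplest multi-connected case to compute with is the flat 3-torus, where leaving one face of a cube re-enters the opposite face. A few lines show the “come back where you started” behavior in perfectly flat space (the box size is arbitrary):

```python
# A flat 3-torus: space is the inside of a cube with opposite faces
# identified, so each coordinate wraps around modulo the box size L.

L = 10.0  # side of the fundamental cube, in arbitrary units

def travel(start, direction, distance):
    """Move in a straight line, wrapping each coordinate modulo L."""
    return tuple((s + d * distance) % L for s, d in zip(start, direction))

home = (1.0, 2.0, 3.0)
# head straight along the x-axis for exactly one box length...
print(travel(home, (1, 0, 0), L))  # → (1.0, 2.0, 3.0): back where you started
```

No curvature anywhere — only the identifications of the faces — yet a straight line closes on itself after one box length, which is exactly the geometry/topology distinction the post is after.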