George Rebane
[This post continues the numeracy series on the Bayes contribution to our civilization. For more background and examples, please see ‘Making Medical Decisions’.]
What you see here is a 3D graphical illustration of the celebrated Bayes Theorem (or formula). I have not seen this beautiful surface presented in the literature, so I wrote a little program to generate it. In this representation the Bayes formula (seen above the surface) is expressed in its likelihood form. We are reminded that the Bayes Theorem is the correct and most powerful process by which we take existing uncertain knowledge, combine it with new data or observations, and update that knowledge.
Today the Bayes Theorem underlies the operational principles embedded in so much of what makes our modern world. It is safe to say that most new technologies in fields that include medicine, energy exploration and production, finance, communication, space exploration, transportation (land, sea, air), defense, the internet, manufacturing, distribution of goods and power, … would be handicapped beyond belief were we to remove their Bayesian components – we would in effect be thrown back into the technological age of the 1950s, and our modern world would cease to exist.
To the above list we add artificial or machine intelligence (AI for short), all forms of which are now based on Bayesian inference and its newest offspring, the new calculus of causality. These applications focus on learning from the real world, a world which presents itself ambiguously within confounding frames of uncertainty. As critters, we have evolved brainbones that make Bayesian kinds of decisions under a utility function that ranks survival highest. Many of our neural structures appear to have parts of the illustrated surface hardwired into what Daniel Kahneman (Thinking, Fast and Slow, 2011) has identified as our intuitive ‘System 1’ cognitive processor that lets us come to rapid conclusions.
The height of the illustrated Bayes surface indicates the values of P(H|E), the updated or posterior probability of some hypothesis H being true given that evidence E is obtained (i.e. is true). The color bar also gives the numerical values of P(H|E) as computed by the Bayes formula in terms of P(H) and L(E|H), both shown in the figure and explained below.
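For readers who cannot make out the figure, the likelihood form of the formula it displays can be written out as (this is exactly the expression instantiated in the worked arithmetic below):

$$P(H\mid E) \;=\; \frac{L(E\mid H)\,P(H)}{L(E\mid H)\,P(H) + 1 - P(H)}\,, \qquad L(E\mid H) \;=\; \frac{P(E\mid H)}{P(E\mid -H)}\,.$$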
To understand the basic use of the Bayes surface we need two numbers. One characterizes our prior belief in some hypothesis or proposition being true – e.g. the White Sox will win the 2012 World Series – in the form of a probability P(H) or its equivalent odds (see below). The other characterizes the quality of the new information brought to bear on the hypothesis – e.g. acquisition of a dynamite pitcher with great performance stats – and is expressed as a likelihood ratio L(E|H) for the evidence at hand (here, the new pitcher’s stats in the context of baseball history).
As indicated in the graphic, L(E|H) is just the ratio of two probabilities: P(E|H), the frequency with which the evidence (here, a pitcher with such good stats) showed itself when the hypothesis was TRUE (here, that such pitchers’ teams went on to win the Series); and P(E|-H), the frequency with which the evidence also surfaced when the hypothesis was FALSE (the team didn't win the Series). Dividing the former by the latter yields L(E|H), from which it is clear that when L exceeds one, the evidence has more frequently been encountered when the teams won the Series, and vice versa when L is less than one. It should also be evident from intuition that when L = 1, the evidence should not change P(H|E), our posterior belief about the chances of the team going on to win the Series; in short, P(H|E) = P(H). This case can be read directly from the Bayes formula, and it appears as the only straight yellow line on the Bayes surface, tracing out the ‘cut’ along which L = 1. (The other yellow line inscribed on the surface marks the P(H) = 0.5 cut, and shows how rapidly P(H|E) grows as it is updated with better and better evidence, i.e. as L gets bigger.)
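This behavior is easy to check numerically. Here is a minimal sketch in Python (my own program was written in Matlab; the function below simply illustrates the displayed formula):

```python
def posterior(p_h, l_ratio):
    """Likelihood-ratio form of Bayes rule: P(H|E) = L*P(H) / (L*P(H) + 1 - P(H))."""
    return l_ratio * p_h / (l_ratio * p_h + 1.0 - p_h)

print(posterior(0.5, 1.0))  # 0.5 -- uninformative evidence (L = 1) leaves the prior unchanged
print(posterior(0.5, 4.0))  # 0.8 -- evidence 4x likelier under H than under -H raises the belief
```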
Putting in some actual numbers here may help. Suppose a search of 130 years of baseball data revealed that of the 130 World Series winning teams, only 27 had hired such sterling pitchers early in the season. And in all the remaining 2,872 losing team/seasons, such a pitcher was hired 259 times. This means the historical evidence says that P(E|H) = 27/130 = 0.21 or 21%, and that P(E|-H) = 259/2872 = 0.09 or 9%. Therefore L(E|H) = 0.21/0.09 = 2.33 is the likelihood ratio and an indication of the quality of our evidence about good pitchers helping win the World Series.
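A quick check of this arithmetic (the small difference from 2.33 arises only because the two probabilities are rounded before dividing):

```python
p_e_given_h    = 27 / 130    # evidence frequency among Series winners, about 0.21
p_e_given_noth = 259 / 2872  # evidence frequency among losing team/seasons, about 0.09
print(p_e_given_h / p_e_given_noth)  # about 2.30; rounding first gives 0.21/0.09 = 2.33
```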
The last piece of information we need is P(H), the prior probability of the White Sox winning the 2012 World Series. Suppose for starters that we know nothing more about the White Sox odds than that they are one of the 30 major league teams, all with an equal chance of winning the Series. Our prior knowledge would then put P(H) = 1/30 ≈ 0.03 = 3%. Using this with the likelihood ratio L = 2.33, we calculate from the Bayes formula in the figure that P(H|E) = (2.33*0.03)/(2.33*0.03 + 1 - 0.03) = 0.07 or 7%. Given the evidence and what we knew at the start, this tells us that our chances (probability) of winning the Series would be more than doubled were we to acquire such a pitcher early in the season.
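Plugging the rounded numbers into the same formula reproduces the 7%:

```python
l_ratio, p_h = 2.33, 0.03
p_h_given_e = l_ratio * p_h / (l_ratio * p_h + 1 - p_h)  # likelihood form of Bayes rule
print(round(p_h_given_e, 2))  # 0.07 -- the 3% prior more than doubles
```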
I used the often-quoted term ‘odds’ above to indicate another measure of uncertainty that people may find more familiar. The relationship between the odds of something happening and its probability of happening is straightforward: odds O = (probability that something will happen)/(probability that the same thing will not happen). In the above example we’re looking for O(H), the prior odds of winning the Series, which is simply P(H)/[1 – P(H)] = 0.03/0.97, or about 1:32 against the White Sox. For the posterior odds O(H|E) – posterior to hiring the pitcher – we get from above 0.07/0.93, or a little less than 1:13 against the White Sox, a definite improvement.
The Bayes formula expressed in terms of odds has a very simple form, namely O(H|E) = L(E|H)*O(H). In other words, just multiply the prior odds by the likelihood ratio of the evidence to get the new and improved odds. Confirm this yourself with the above example.
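For instance, continuing with the rounded numbers from the example (the slight differences from the probabilities above are due to rounding):

```python
p_h = 0.03
prior_odds = p_h / (1 - p_h)        # O(H), about 0.031, i.e. roughly 1:32 against
posterior_odds = 2.33 * prior_odds  # O(H|E) = L(E|H) * O(H), about 0.072
print(round(posterior_odds / (1 + posterior_odds), 2))  # convert back: P(H|E) = 0.07
```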
[19jan13 update] Reader Jon pointed out that the "interesting part" of the Bayes surface lies around likelihood ratios near one and less than one. And he's right, of course. Often the evidence we must deal with is not robust enough (i.e. does not have a high likelihood ratio) to strongly support the hypothesis whose probability we are updating. Many times we must incorporate evidence that weighs against our prior belief in the hypothesis, and that evidence makes itself known through likelihood ratios less than one, sometimes considerably less than one.
The above figure compresses that region of likelihood ratios and doesn't clearly show the interesting ogive shape the surface takes at low values of L(E|H). I have regenerated that region using the standard 'trick' of substituting the logarithm (base 10 here) of the ratio to effect a stretch of the surface for L(E|H) < 1. Not to worry, the shown log(L) values can be converted back to the recognizable L values by simply calculating L = 10^log(L), the base 10 raised to the log(L) power. So the familiar L = 1 line from the above figure takes off from the log(L) = log(1) = 0 value. Similarly, log(L) = -0.8 designates the surface at L = 10^(-0.8) = 0.158, and so on. All this is shown in the figure below, which you can again enlarge by clicking on it. (H/T to Jon for suggesting this.)
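For anyone who wants to reproduce this second figure, here is a minimal sketch of the same construction in Python/matplotlib (my original program was in Matlab; the grid resolution and axis ranges below are illustrative, not the ones used for the posted figures):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stretch the L(E|H) < 1 region by working in log10(L), as described above.
log_l = np.linspace(-1.0, 0.5, 121)   # log10 of the likelihood ratio
p_h   = np.linspace(0.0, 1.0, 121)    # prior probability P(H)
LOGL, PH = np.meshgrid(log_l, p_h)
L = 10.0 ** LOGL                      # convert back: L = 10^log(L)
POST = L * PH / (L * PH + 1.0 - PH)   # likelihood form of the Bayes rule

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(LOGL, PH, POST, cmap="viridis")
ax.set_xlabel("log10 L(E|H)")
ax.set_ylabel("P(H)")
ax.set_zlabel("P(H|E)")
plt.show()
```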
Seems to me that you could calculate the odds that if you converted every school in the county to charter schools, the test scores would rise. Of course you'd have fun factoring in the quality of teacher you'd attract with the lower salaries offered, in order to achieve the smaller class sizes that supposedly are part of what gives a charter school its apparent edge. And now that all parents and socio-economic backgrounds would be represented, we'd have a more realistic expectation of overall parent involvement, another factor to be considered.
End result, pardon my intuition: about the same scores on average, except the poor socio-economic background schools would score even lower, and the upper socio-economic charters would score higher than their public school predecessors. In short, a nice way for the rich and their offspring to get even richer. How nice! Nothing like picking on the weaker members of society to get ahead, survival of the fittest, after all...
Posted by: Douglas Keachie | 03 May 2012 at 11:14 AM
Great stuff!
It would be nice if you'd also show the log likelihood surface, so as to use more of the image on the "interesting" part of the surface when the likelihood ratio is closer to 1. Putting your source up would be nice too (ideally, on github!).
Posted by: Jon | 18 January 2013 at 04:29 PM
Jon 429pm - thank you Jon, your good suggestion has been incorporated. But I am not sure what "source" you are referring to.
Posted by: George Rebane | 19 January 2013 at 02:56 PM
Jon - I think I've figured out your use of "source" as in source code. I coded the graphic in Matlab, and the program is rather trivial because it just implements the displayed likelihood ratio version of Bayes rule. I'm surprised that no one has generated this surface before (at least no one that I've run across, and I've been messing with Bayesian inference for an awfully long time). gjr
Posted by: George Rebane | 19 January 2013 at 04:33 PM