George Rebane
I’m working on a piece that describes in a more formal and specific sense how people can have their own opinions AND facts, given that they all formulate or update them from the same body of newly arrived evidence. How can that be? The answer to that question also goes a long way toward understanding how the gulf formed between the polarized factions in our land and, sadly, toward explaining the difficulty of bringing the two (or more) sides together to a new common understanding, or at least to a workable common ground.
The answer lies in the manner we humans approximate Bayesian inference and reasoning. Along the way in my studies I stumbled upon a form (there are many) of the Bayes formula that I had to rederive years ago and, surprisingly, have since seen used in very few instances. This formula from Bayes theory is so deliciously intuitive and straightforward to understand that I thought a quick intro to it, in the context of updating everyday beliefs and knowledge, would be helpful to the layman, or at least provide some amusement before it disappears into the memory hole.
Before diving in, let’s all appreciate that every piece of knowledge or tenet of our belief system is represented by a probability distribution, whether or not we realize it. These distributions quantify everything from ‘certain knowledge’ (probability = 1), to items that are more iffy (probability between 0 and 1), all the way to the ‘impossible’ (probability = 0). Most people believe with certainty that the sun will rise tomorrow morning; however, most people know the time of the sunrise only to within a range of values to which they attach an unquantified subjective probability. The events ‘sun will rise’ vs ‘sun will not rise’ are mutually exclusive, with the first having a probability of occurrence equaling one (unity, the certain event) and the second having probability zero (the impossible event) – well, there is a very small probability that the sun will not rise tomorrow, but let’s not quibble here. So these are discrete events described by a simple discrete probability distribution, consisting of a one and a zero, that characterizes our knowledge of these mutually exclusive events.
Let’s go back to a more interesting example involving a discrete probability density function (pdf). Discrete pdfs describe the outcome probabilities of discrete events or propositions. Suppose you meet someone whom you’d like to invite to join a politically oriented service organization of which you are a member. But you don’t want to invite him until you are pretty sure of his political leaning. The four possible political categories you consider for his orientation are conservative (C), middle road (M), liberal (L), and undecided (U). From your own experience and from what you’ve heard, the discrete pdf that describes your current knowledge of this person has the values P(C) = 0.4, P(M) = 0.2, P(L) = 0.2, P(U) = 0.2, which sum to one because you consider it certain that he is one of the four. This means you think the chances are 4 out of 10, or 4/10 = 0.4, that he’s a conservative, and so on.
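To make the bookkeeping concrete, here is a minimal Python sketch of that prior (the variable name `prior` and the category labels are mine, chosen just for this illustration):

```python
# Prior probabilities over the four political categories:
# conservative (C), middle road (M), liberal (L), undecided (U).
prior = {"C": 0.4, "M": 0.2, "L": 0.2, "U": 0.2}

# Sanity check: the four probabilities must sum to one, since the
# candidate is assumed to belong to exactly one category.
assert abs(sum(prior.values()) - 1.0) < 1e-9
```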
Now at a subsequent event with him you overhear him say, “… and Trump’s mouth didn’t help his case either.” That comment snippet is all the new evidence E that you have to update your assessment of his political leaning. What you need to do now is decide how much more likely such a statement would be uttered by a C than by a not-C, that is, an M, L, or U. Well, a not-C would be more likely to say something like that than a C, so the likelihood of his being a not-C is higher than that of a C. But how much higher? You decide that the leading “… and” you heard followed a criticism of Trump’s enacted policies, making the stronger case that he is perhaps an L or at least a U. So you conclude that, all else being equal, that utterance would come out of only one in 50 people calling themselves a C. This yields what is called the likelihood ratio of obtaining E given that it came from a C, written L(E|C), as one over fifty, or 1/50 = 0.02.
You go through the same process with the other possibilities, attributing likelihoods to the remaining three categories to quantify L(E|M), L(E|L), and L(E|U). So let’s say, after noodling on it a bit, you come up with L(E|M) = 1, L(E|L) = 30, and L(E|U) = 5. That you got L(E|M) = 1 denotes the important conclusion that an M would be equally likely to utter E as not to utter E. This means that having heard E did not give you any additional information about his being classified an M. So what do we do with all these numbers?
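These elicited values can be collected the same way; again a sketch, with the numbers taken straight from the example above:

```python
# Likelihood ratios L(E|category) for the overheard remark E.
# Only their relative sizes matter: the Bayes formula below
# normalizes them away in the denominator.
likelihood = {"C": 0.02, "M": 1.0, "L": 30.0, "U": 5.0}
```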
Well, we keep in mind that what we want to know is how E has updated our knowledge about the political leanings of this individual. In short, we want the updated discrete probability density values given E. These are expressed as P(C|E), P(M|E), P(L|E), and P(U|E), which again must sum to one because it’s certain that your candidate belongs to one of these persuasions. It turns out that there is a simple form of the Bayes formula that lets us quickly compute this updated, or posterior (to getting E), distribution. And for P(C|E) it is
P(C|E) = L(E|C)*P(C)/(L(E|C)*P(C) + L(E|M)*P(M) + L(E|L)*P(L) + L(E|U)*P(U))
Easy money: it’s simply the product of the categorical (here C) likelihood ratio times its prior (before evidence E) probability, divided by the sum of such products over all the possible categories C, M, L, U. You can now infer that an identical formula works for the other three categories. When we plug in the numbers, we get for C
P(C|E) = 0.02*0.4/(0.02*0.4 + 1*0.2 + 30*0.2 + 5*0.2) = 0.008/7.208 = 0.001
So, given evidence E and your likelihood assessments over the possible categories, your subjective assessment, quantified through the Bayes formula, is now that there’s about one chance in a thousand that your candidate is a conservative. When you crank out the other posterior probabilities, you get the complete new probability distribution over the categories; note that these probabilities also sum to one.
P(C|E) = 0.001, P(M|E) = 0.028, P(L|E) = 0.832, and P(U|E) = 0.139.
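For readers who want to check the arithmetic, here is a minimal, self-contained Python sketch of the update (the function name `bayes_update` is my own, for illustration only); running it reproduces the four posterior values above:

```python
prior = {"C": 0.4, "M": 0.2, "L": 0.2, "U": 0.2}         # P(category)
likelihood = {"C": 0.02, "M": 1.0, "L": 30.0, "U": 5.0}  # L(E|category)

def bayes_update(prior, likelihood):
    """Return the posterior P(category|E) from a prior P(category)
    and the likelihood ratios L(E|category)."""
    # Numerators: L(E|k) * P(k) for each category k.
    numerators = {k: likelihood[k] * prior[k] for k in prior}
    # Denominator: the sum of all the numerators (7.208 in this example).
    total = sum(numerators.values())
    return {k: v / total for k, v in numerators.items()}

posterior = bayes_update(prior, likelihood)
for k, p in posterior.items():
    print(f"P({k}|E) = {p:.3f}")
# Prints: P(C|E) = 0.001, P(M|E) = 0.028, P(L|E) = 0.832, P(U|E) = 0.139
```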
With this updated knowledge you conclude that it’s highly likely that the candidate is a liberal. Note that initially you had no idea about the leanings of the candidate other than that someone told you he was most likely a conservative, which caused you to assign P(C) = 0.4 and divide the remaining probability of 0.6 equally among the three other possible categories. (This highlights an important tenet of probability theory, namely that ignorance over a range of possibilities is represented by a flat probability density – e.g., it’s 50-50 how a flipped fair coin lands.)
The figure below compares the original (prior) probability density function and the updated (posterior) pdf after incorporating evidence E.
I’ll finish this powerful little lesson by mentioning the happy conclusion that when a new piece of evidence about your candidate comes in, you revisit it exactly as described above, with one exception: now you use the above posterior probabilities as your starting prior probabilities for the next iteration. In this way you will continue incorporating new evidence until it’s time for you to decide on inviting the candidate to join or not.
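As a sketch of that iteration, the posterior from one piece of evidence simply becomes the prior for the next. The second likelihood set below is purely hypothetical, invented only to show the loop; note that it pushes P(C) back up, previewing the non-monotonic behavior described next:

```python
belief = {"C": 0.4, "M": 0.2, "L": 0.2, "U": 0.2}  # initial prior

# Likelihood ratios for successive pieces of evidence. The first set
# is from the example in the text; the second is hypothetical.
evidence_stream = [
    {"C": 0.02, "M": 1.0, "L": 30.0, "U": 5.0},
    {"C": 20.0, "M": 1.0, "L": 0.1,  "U": 1.0},
]

for likelihood in evidence_stream:
    numerators = {k: likelihood[k] * belief[k] for k in belief}
    total = sum(numerators.values())
    belief = {k: v / total for k, v in numerators.items()}  # posterior -> new prior
```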
The process of incorporating additional pieces of evidence as they arrive may make the individual categorical probabilities increase or decrease, just as your qualitative subjective assessment would vary with newly arrived evidence. This is one of the powerful aspects of Bayesian analysis that captures and mirrors real-world inferencing; it is called non-monotonic reasoning. (This is also illustrated in the Bayes surface figures where, depending on the likelihood ratio, the posterior probability may be less than or greater than the prior probability.)
The reader who has grasped the example of how knowledge is updated by incorporating new evidence with prior (old) knowledge is now ready to learn the quantitative basis for how two or more people can cobble together their very own facts from the receipt of identical reports of new evidence.
The interested reader can contact me through the comment stream to request a derivation of the above Bayes formula, along with a reusable spreadsheet that contains a quantitative confirmation of the formula’s correctness. The surfaces pictured above illustrate the Bayes formula for updating a single (one-dimensional) hypothesis H with evidence E and its attendant likelihood ratio L(E|H).
You left us speechless this time.
Posted by: Bob Hobert | 25 February 2021 at 07:59 PM