George Rebane
In the months leading up to this election year these pages have recorded a lot of predictions, some more vehement than others. Predicting, estimating, forecasting: all of them are hard, and all involve some level of risk depending on how well these efforts are carried out. We spend almost every waking moment taking some kind of peek into the future, and depending on how certain we are of what we see there, we take an action which may involve some kind of hedging if we’re not too confident about what we see.
I’ve always been interested in how well we can prognosticate. Recently Nobelist Daniel Kahneman of Kahneman/Tversky fame wrote what instantly became a very famous book - Thinking, Fast and Slow (2011) - on the findings of research into things such as prediction, estimation, risk aversion, reasoning and so on. Bottom line – research to date has shown that humans are mostly not very good predictors, and they don’t always do the reasonable things. Yet we do have a brain bone that has allowed us to survive over the millennia and evolve our thinking capacity to some pretty commendable levels. After all, we did discover relativity, sequence the human genome, put a man on the Moon, and are about to devise an AI that may make us second class citizens on our own planet.
So how well can we predict? Here I propose a fun experiment on the topic that invites RR readers to go on record and compete with each other predicting anything they want and, hopefully, get someone else interested enough to also render their prediction. I offer an easy, intuitive, and enjoyable way to do this using the MAB distribution which asks you to specify four numbers that characterize your prediction. The method is spelled out in a previous post ‘Predicting with Expressed Beliefs – a formal approach’.
Since this is election season, say you want to predict what percent of the Democratic vote Bernie Sanders will get in this Tuesday’s New Hampshire primary. Today that percentage is a random variable, but next Wednesday it will be known and no longer random. All such future values are random variables, and the best we can do to characterize our belief in the value of such a variable is to describe what is called its probability distribution. And as the above referenced post details, this can be done by simply writing down your subjective belief in terms of the Low (L) and High (H) values which bound the range of Bernie’s percentage, the most likely percentage (M) Bernie will get, and your confidence (C) - zero to one - that Bernie’s actual percentage will be in the neighborhood of your best guess or most likely value. In short, your prediction will be a 4-tuple that might look something like – [L, H, M, C] = [51%, 60%, 54%, 0.7].
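For readers who like to tinker, here is a minimal Python sketch of how such a 4-tuple might be recorded and sanity-checked. The class name `Belief`, its field names, and its checks are hypothetical illustrations; they are not part of the MAB method described in the referenced post, and the sketch does not compute the MAB distribution itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """One prediction as an [L, H, M, C] 4-tuple (hypothetical helper)."""
    low: float          # L: lower bound of the range
    high: float         # H: upper bound of the range
    most_likely: float  # M: best guess
    confidence: float   # C: confidence in the best guess, 0 to 1

    def __post_init__(self):
        # Assumed sanity checks: the best guess lies inside the range,
        # and the confidence is a number between zero and one.
        if not self.low <= self.most_likely <= self.high:
            raise ValueError("need L <= M <= H")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("C must be between 0 and 1")

    def contains(self, actual: float) -> bool:
        """True if the realized value fell inside the [L, H] range."""
        return self.low <= actual <= self.high


# The example tuple from the text: [51%, 60%, 54%, 0.7]
example = Belief(low=51, high=60, most_likely=54, confidence=0.7)
print(example.contains(60))  # True
```

Once the actual percentage is known, `contains` answers the first question anyone will ask: did the result land inside the range you committed to?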
So I invite everyone to start off by submitting their predictions in this post’s comment stream on all or some of the candidates competing in next Tuesday’s primary. I’ll put them into a spreadsheet I’ve generated that will compare and calculate the results which I will publish in an update to this post. If you want to do your own comparisons, I’ll gladly email you the spreadsheet into which you can enter the competing 4-tuples. From there on, as the weeks and months pass, everyone can offer predictions on anything – future primary results, polls, when the FBI will submit its ‘Comey vote’, and so on.
Finally, I realize that I’m taking a risk that few if any RR readers will give a warm bucket of spit about actually putting their predictions on record and having them compared to those of other readers. It may not happen, but for those who are interested in how their ‘prognosticator’ is working, here is an easy and correct way of doing it that you may want to use in other undertakings that involve future uncertainty. In future posts, depending on interest, I’ll publish how to use MABs for budgeting, calculating investment returns (and risk), and estimating costs and/or revenues.
[10feb16 update] Well boys and girls, the results are in and it looks like Jo Ann, Russ Steele, and I were not all that good in predicting yesterday's NH primary results. Of course, we didn't do much worse than the talking heads on TV, but we did reveal more about our predictions than those pundits ever do. I'd like to see their MABs compared to actual results published as you see ours below.
Actually, we weren't all that bad when it came to the Democrats, but our efforts on the gaggle of Republicans need a little work. Hopefully when the field winnows a bit going forward, we will do better.
I have added two additional prediction metrics to the MAB likelihood values. Higher likelihood values indicate you did better; a zero value means that the actual result fell outside your predicted MAB range. The two added metrics are the percent error of your most likely (best guess) call, and a normalized error measure: how many of your MAB's standard deviations, or sigmas, fit into your absolute error (the difference between best guess and actual result). The smaller that value, the better your MAB. So here's the spreadsheet of the results for Jo Ann, Russ, and me (click on image and then CTRL+ to see a larger version).
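The error metrics can be roughed out in a few lines of Python. This is a sketch under stated assumptions, not the spreadsheet's actual formulas: percent error is assumed to be measured relative to the actual result, and because the MAB's standard deviation is defined in the earlier post rather than here, the sketch substitutes the standard deviation of a triangular distribution on [L, H] with mode M. That stand-in ignores the confidence value C, so its sigma will generally differ from the MAB's. All function names are hypothetical.

```python
import math


def in_range(low, high, actual):
    """True if the actual result fell inside the predicted [L, H] range."""
    return low <= actual <= high


def percent_error(most_likely, actual):
    """Absolute error of the best guess as a percent of the actual result
    (assumed definition)."""
    return 100.0 * abs(most_likely - actual) / actual


def triangular_sigma(low, high, most_likely):
    """Standard deviation of a triangular distribution on [low, high] with
    mode most_likely. A STAND-IN for the MAB sigma, which is not defined
    here; it ignores C and requires low < high."""
    l, h, m = low, high, most_likely
    return math.sqrt((l * l + h * h + m * m - l * h - l * m - h * m) / 18.0)


def sigma_error(low, high, most_likely, actual):
    """Absolute error expressed in stand-in sigmas; smaller is better."""
    return abs(most_likely - actual) / triangular_sigma(low, high, most_likely)


# GJR's Sanders call [50, 75, 65, 0.6] against the reported 60 percent.
print(in_range(50, 75, 60))
print(round(percent_error(65, 60), 1))
print(round(sigma_error(50, 75, 65, 60), 2))

# JAR's Trump call [19, 26, 21, 0.6] against the reported 35 percent:
# the result fell outside the range, the zero-likelihood case.
print(in_range(19, 26, 35))
```

The actual results used here (Sanders 60 percent, Trump 35 percent) are the Associated Press figures quoted in the comment thread.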
To start things off Jo Ann and I took independent cuts at the outcome of the New Hampshire primary next Tuesday. A convenient way to express a bunch of MABs whose realized values have to add up to a certain number is to first pick your best guess or most likely (M) values and make them add up to that certain number – here we took that to be 100% of the vote for each party. Then go back and put in the L/H values that you think bracket your best guess in the smallest reasonable range, and finally review your best guess and write in your confidence C values. So here are our [L, H, M, C] tuples.
JAR
Trump 19%, 26%, 21%, 0.6
Cruz 15, 20, 17, 0.2
Rubio 17, 28, 20, 0.3
Kasich 9, 15, 10, 0.3
Bush 5, 10, 6, 0.3
Christie 8, 12, 9, 0.4
Fiorina 8, 12, 10, 0.4
Carson 5, 10, 6, 0.3
Clinton 20, 40, 30, 0.4
Sanders 40, 80, 70, 0.3
GJR
Trump 25%, 38%, 28%, 0.5
Cruz 15, 25, 20, 0.6
Rubio 15, 20, 16, 0.4
Kasich 5, 12, 8, 0.3
Bush 8, 16, 10, 0.5
Christie 6, 12, 8, 0.3
Fiorina 2, 6, 5, 0.3
Carson 4, 7, 5, 0.3
Clinton 25, 50, 35, 0.5
Sanders 50, 75, 65, 0.6
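The bookkeeping step described above (make the most likely values add up to 100% per party before filling in L, H, and C) is easy to check mechanically. A minimal Python sketch, using the GJR Republican tuples as input; the function name and dictionary layout are hypothetical conveniences, not the spreadsheet's format:

```python
def total_most_likely(tuples):
    """Sum the M (most likely) entries of a dict of [L, H, M, C] tuples."""
    return sum(m for (_low, _high, m, _conf) in tuples.values())


# GJR's Republican tuples as [L, H, M, C], percentages as plain numbers.
gjr_republicans = {
    "Trump": (25, 38, 28, 0.5),
    "Cruz": (15, 25, 20, 0.6),
    "Rubio": (15, 20, 16, 0.4),
    "Kasich": (5, 12, 8, 0.3),
    "Bush": (8, 16, 10, 0.5),
    "Christie": (6, 12, 8, 0.3),
    "Fiorina": (2, 6, 5, 0.3),
    "Carson": (4, 7, 5, 0.3),
}

print(total_most_likely(gjr_republicans))  # 100
```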
Posted by: George Rebane | 07 February 2016 at 10:28 PM
George, after listening to Megyn Kelly's interviews, this is my best guess.
RWS
Trump 19%, 26%, 21%, 0.7
Cruz 15%, 20%, 17%, 0.3
Rubio 17%, 28%, 20%, 0.2
Kasich 10%, 15%, 11%, 0.5
Bush 4%, 10%, 5%, 0.2
Christie 8%, 12%, 10%, 0.4
Fiorina 8%, 12%, 10%, 0.3
Carson 3%, 7%, 6%, 0.3
Clinton 20%, 40%, 30%, 0.3
Sanders 40%, 70%, 60%, 0.5
Posted by: Russ Steele | 08 February 2016 at 06:43 PM
RussS 643pm - So noted Russ, good luck ;-)
Posted by: George Rebane | 08 February 2016 at 07:01 PM
How Google Searches Pretty Much Nailed the New Hampshire Primary
Real-time trending search queries foretold the outcome of the election.
Google’s ability to look into the future of political contests just notched another win: New Hampshire.
Searches of presidential candidates conducted by Google users in New Hampshire on Feb. 9 corresponded closely with the voting results of the state’s primary. The top-searched Democratic candidate was Bernie Sanders, who won with 60 percent of the vote in New Hampshire, according to the Associated Press. He got 72 percent of the searches, according to Google, while Hillary Clinton got 28 percent of the queries and 38 percent of the vote.
The top-searched Republican candidate was Donald Trump, who won with 35 percent of the vote. On Google he received 41 percent of the searches an hour before the polls closed, according to the search giant. No. 2 was John Kasich, who got 16 percent of both the vote and the searches. Ted Cruz took third with 12 percent of the vote and 15 percent of the searches. The battle between Jeb Bush and Marco Rubio was close online and in real life. While Bush took fourth place at the polls, winning 11 percent of the vote, online he got just 7 percent of the searches. Meanwhile, Rubio got 10 percent of the searches and only 10.6 percent of the vote.
http://www.bloomberg.com/news/articles/2016-02-11/how-google-searches-pretty-much-nailed-the-new-hampshire-primary
Graphics at the link.
Posted by: Russ Steele | 11 February 2016 at 06:12 PM
RussS 612pm - That's all fine but 1) you can't use their data before the election, i.e. before they publish it, and 2) Google doesn't say anything about YOUR or MY ability to predict.
Posted by: George Rebane | 11 February 2016 at 11:34 PM
George 1134pm -
Google has a Real Time Reporting API that will let a smartphone user monitor search terms right up to the time a voter pulls the handle.
Check out this chart at 4PM Feb 9th:
https://www.google.com/trends/story/US_cu_4JIPz1IBAAC93M_en
It appears to me that it is possible to monitor the search terms in real time. Set up the search and then use the Real Time Reporting API to monitor the results.
True, hard to predict days in advance, but it looks like it is possible to predict the outcome before the polls close.
Posted by: Russ Steele | 12 February 2016 at 01:32 PM
RussS 132pm - No doubt you are right. But unless you are trying to win a specific wager while the polls are open, that kind of information is of little use, and most certainly does not reflect on your acumen in predicting. Predictions lose both their utility and panache the closer they are made to their resolving event.
I'll let my 1134pm stand.
Posted by: George Rebane | 12 February 2016 at 02:41 PM