Polls, Pundits, or Prediction Markets: An assessment of election forecasting

Harry Crane

Keywords: election predictions, forecasting, proper scoring rule, FiveThirtyEight, political science, prediction market, Brier score, calibration, accuracy, Kelly criterion

Abstract

I compare forecasts of the 2018 U.S. midterm elections based on (i) probabilistic predictions posted on the FiveThirtyEight blog and (ii) prediction market prices on PredictIt.com. Based on empirical forecast and price data collected prior to the election, the analysis assesses the calibration and accuracy according to Brier and logarithmic scoring rules. I also analyze the performance of a strategy that invests in PredictIt based on the FiveThirtyEight forecasts.
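As a rough illustration of the two scoring rules named above, the following Python sketch computes a mean Brier score and a mean logarithmic score for binary events. The probabilities and outcomes are made-up numbers, not data from the paper, and the sign convention (lower is better for both scores) is an assumption that may differ from the one used in the paper.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared gap between forecast probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_score(probs, outcomes, eps=1e-12):
    """Mean negative log-probability assigned to what actually happened (lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) cannot occur
        total += -math.log(p if y == 1 else 1 - p)
    return total / len(probs)

# Hypothetical numbers for three races (illustration only).
forecast = [0.9, 0.6, 0.2]   # a model's probabilities that the event happens
market = [0.8, 0.5, 0.3]     # market prices read as probabilities
outcomes = [1, 0, 0]         # 1 if the event happened, 0 otherwise

print("Brier:", brier_score(forecast, outcomes), brier_score(market, outcomes))
print("Log:  ", log_score(forecast, outcomes), log_score(market, outcomes))
```

Both rules are proper, so a forecaster minimizes the expected score by reporting their true probability. The logarithmic score penalizes confident misses much more heavily than the Brier score does.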

Versions

➤ Version 1 (2018-11-06)

Citations

Harry Crane (2018). Polls, Pundits, or Prediction Markets: An assessment of election forecasting. Researchers.One. https://researchers.one/articles/18.11.00005v1

Reviews & Substantive Comments

1 Comment

Simon Morris, May 7th, 2020 at 05:24 pm
This is more a comment on future plans to compare 538 to Prediction Markets than it is on this paper. (Original thoughts from thread here: https://twitter.com/SmoLurks/status/1254033220968030209)

On treating 538 as a bookie rather than a bettor.
- Let the model trade with some vig, say Kelly + 1.5%. (Even the market trades with a width of at least 0.5%.) I'm not particularly bothered about treating 538 as a bookie rather than a trader, since they are trying to put odds on everything and we're trying to decide which of the two, 538 or the market, is better.
- Allow them to "trade" continuously. If the market prices swing wildly, 538 should have the chance to fade them (or have their face ripped off being slow to react to news).
- By trading at PredictIt prices, 538 are already getting a significant advantage compared with having to eat their own cooking. (You could run the equivalent system betting the market against 538 odds if you really wanted to penalise 538.)
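One possible reading of "Kelly + 1.5%" can be sketched in Python as follows. It assumes the model pays a fixed margin on top of the market price on whichever side it buys, and then sizes the bet with the standard Kelly fraction for a binary contract that costs `price` and pays 1. The function names and the interpretation of the margin are assumptions for illustration, and PredictIt fees are ignored.

```python
def kelly_fraction(p, price):
    """Kelly fraction of bankroll for a contract costing `price` that pays 1 with probability p."""
    if not 0 < price < 1:
        return 0.0
    return max(0.0, (p - price) / (1 - price))

def stake_with_margin(p_model, market_price, margin=0.015):
    """Return (side, fraction) when the buyer pays `margin` over the market price on either side."""
    yes_cost = market_price + margin        # effective cost of a YES contract
    no_cost = (1 - market_price) + margin   # effective cost of a NO contract
    f_yes = kelly_fraction(p_model, yes_cost)
    f_no = kelly_fraction(1 - p_model, no_cost)
    if f_yes > 0:
        return ("YES", f_yes)
    if f_no > 0:
        return ("NO", f_no)
    return (None, 0.0)  # the model's edge does not cover the margin, so no bet

print(stake_with_margin(0.60, 0.50))   # model well above the market: buys YES
print(stake_with_margin(0.505, 0.50))  # edge smaller than the margin: no bet
print(stake_with_margin(0.30, 0.50))   # model well below the market: buys NO
```

Under this reading the margin creates a no-trade band around the market price, so small disagreements between the model and the market generate no bets at all.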

On 538 driving the market:
- To deal with 538 moving prices, I would let 538 trade at the odds recorded just before they publish their estimates. (i.e., if their model prints a new output at 1700, they get to trade at prices from 1659, to prevent front-running.)
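A minimal sketch of that lookup in Python, assuming the market prices are stored as a time-sorted list and that timestamps are plain numbers (here, minutes since midnight). The function name and the data are hypothetical.

```python
from bisect import bisect_left

def price_before(timestamps, prices, publish_time):
    """Latest recorded price strictly before publish_time (timestamps sorted ascending)."""
    i = bisect_left(timestamps, publish_time)  # index of the first timestamp >= publish_time
    if i == 0:
        return None  # no price was recorded before publication
    return prices[i - 1]

# Hypothetical price history, with times in minutes since midnight.
timestamps = [1000, 1019, 1020, 1030]   # 1019 is 16:59 and 1020 is 17:00
prices = [0.52, 0.55, 0.61, 0.60]

# Model output published at 17:00, so the trade uses the 16:59 price.
print(price_before(timestamps, prices, 1020))
```

Using a strict "before" comparison means a price stamped at the same minute as the publication is never used, which is the conservative choice if the market might already be reacting to the new forecast.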

On not enough bets:
- I don't think we can solve this completely. (But I have some thoughts about how we might "generate" more bets.) I think the only real response is to pre-acknowledge the limitations from randomness and move on. If the market or 538 is much, much better, we might get enough data straight off. I'd have to do a bit of math to figure out what the thresholds might be (ideally before the results come in). From experience with betting systems, a very small number of bets can usually tell you if your model is much, much worse than the market. (Sadly, I've yet to have the reverse situation.)

On correlation between bets:
- I think it's really tricky.
- In an ideal world, Nate Silver would allow researchers access to his simulation results, and we could back out all the model's conditional probabilities to run a proper Kelly staking regime. (Throw all the markets into the same optimization, using the same probabilities, and bet on all of the quotes it recommends).

- The real question is whether it is possible to back out conditional probabilities without full access to the model, or at least to estimate those conditional probabilities in the most model-independent way.

- (And of course, to make it doubly hard, needing to come up with a methodology for this before 538 start publishing the model and letting us see what data they provide!)

- My first instinct would be to try and fit some kind of hierarchical model to his model outputs. (Probably with national estimate and state estimates as levels). I don't like this a huge amount, since there's a risk of our model polluting the system and fitting models to models tends to be fairly unstable. That said:

1/ I don't think it will pollute the system too much (as long as the model is vaguely sensible): we're going to be reducing the amount of staking we're doing, which can only be a good thing.

2/ We have to make these estimates somehow, and using 538's figures is the fairest(?) way to do that.

- My more traderish view would be to run several "books".
1/ An overall book, which would trade R vs D for pres.
2/ Senate and House books, hedged against the overall book for R/D lean. (Estimating the hedge here is, unfortunately, a very similar problem to the modelling problem mentioned earlier.)
3/ State books, again, hedged against overall book
4/ Senate and Reps books (hedged against state books)

My issue with the "books" approach is that you're likely not to capture all the correlation that 538's model might use. (i.e., if 538 thinks two states will vote the same way, you might end up doubling the risk they would want to take.)

You also need some way to come up with weights, and I see this as eventually reducing down to the modelling approach I suggested first.

- Cracking this problem would be extremely valuable though. Once you've managed to make each bet "independent", your space of bets is much larger. (Maybe 500+ markets, with a decent amount of opportunity to "trade" around those bets over the time period to the election?) Without doing any math, I'd be reasonably confident this would be enough to give a very good estimate for how to compare 538 to the market.

One further thought that would be interesting to explore is separating the "time series" behaviour of 538's forecasts from their "actual numbers". I have a suspicion that their estimates are too noisy, and that they will bleed cash to the market just from jumping around too much. They should definitely be punished for this in any kind of analysis, but it would be interesting to separate out this effect, so that if they do "generally" have an edge but squander it by not smoothing well enough, they still get credit for that edge.
