@IthicaHawk has been putting together an excellent series: after building an Elo ranking system for the league, he has been using it to predict the results of each game this season. It is great work, and you should check out his Week 4 predictions HERE!
But anyways, I figured I'd put his models to the test. Ithica's predictions are based on probabilities derived from these Elo rankings, but ultimately the sim cares not about your fancy maths and logic. More importantly, the Elo rankings do not account for things like roster changes, depth chart adjustments, and strategy alterations. Even a small change in tempo can significantly shift your odds in the sim, and those odds can be pried out of it by running a bunch of test sims.
Armed with the post-game files, I ran 200 test sims of each game to gauge what the sim thought each team's odds should be with updated strats/depth charts. Here are the results:
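For the curious, the idea behind the test sims is just Monte Carlo estimation: run the same game a bunch of times and count wins. A minimal sketch, where `sim_game` is a stand-in for whatever actually runs one test sim (not the sim's real interface):

```python
import random

def estimate_win_prob(sim_game, n_sims=200):
    """Run the game simulator n_sims times and return the home team's
    empirical win probability. sim_game is a placeholder callable that
    runs one test sim and returns True when the home team wins."""
    wins = sum(1 for _ in range(n_sims) if sim_game())
    return wins / n_sims

# Toy stand-in: a game the home team "truly" wins 65% of the time.
toy_game = lambda: random.random() < 0.65
print(estimate_win_prob(toy_game))  # hovers around 0.65
```

With 200 sims per game, the estimate is still a bit noisy (roughly +/- 3-4 percentage points near a coin flip), which is worth keeping in mind when comparing it to the Elo numbers.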
The first thing to note is that overall, the Elo model is pretty successful at predicting the correct result. Ithica has a 72% hit rate (W2's YKW/SAR game predicted YKW, but had YKW as the away team - we'll give it to him). The sim tests, on the other hand, go 78% with its predictions. The biggest upset (SAR @ BAL in W1) was missed by both. Notably, most of the significant variations happened in W3! Let's take a closer look.
If we look at the biggest variations, we see that the common threads are Philly and San Jose, who have both made improvements relative to the field this past offseason. Because the Elo rankings don't take offseason activity into account, both get severely underestimated. In two of the three weeks, the Elo rankings underestimate Philly by 30%, while SJS is underestimated by 11% in week 1 and 28% in weeks 2 and 3. The reverse goes for Baltimore, who lost Condominium Hamburger Corgi Happiness Covariant Hallucinogen Corvo Havran to expansion. We see an overestimation of 11% in W2 and of nearly 31% in W3 (though this combines with the Philly bias). Chicago appears to be in the same situation, having lost players like Ryan Leaf Jr. and Ahri Espeeyeeseetee this offseason (again combining with Philly's bias in W2). Lastly, the expansion teams get the benefit of starting at an Elo rating of 1500 despite arguably not deserving it. Sarasota in particular is bad, but the higher starting point forces the system to overestimate them by 15% in W3.
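To see why the 1500 start matters, here is the standard Elo expected-score formula (the k-factor here is a common default, not necessarily what Ithica's system uses):

```python
def elo_expected(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating, expected, actual, k=32):
    """Post-game rating update: actual is 1 for a win, 0 for a loss.
    k=32 is a common default and an assumption here."""
    return rating + k * (actual - expected)

# An expansion team parked at the default 1500 looks like a coin flip
# against any 1500-rated opponent, whatever its actual roster strength:
print(elo_expected(1500, 1500))  # 0.5
```

Until the expansion teams play enough games to drift toward their true level, the model has no choice but to treat them as average, which is exactly the Sarasota overestimate we see above.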
With this test sim data, we can actually go and analyze the games themselves and see how the real events compare to what the average outcome would be. Here is a table:
The sim lines and over/under are set by the average point differences/sums from each team. Interestingly, we see that 39% of the time the sim has gone with the underdog. Whether or not this continues to be the trend remains to be seen, but it would have interesting impacts on how people bet in the future. The O/U has similar results, with the Over only being hit 33% of the time. This could probably be chalked up to coincidence, but it'll be interesting to see how it holds up for the rest of the season.
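Deriving the lines from the test sims is straightforward averaging. A minimal sketch, assuming each test sim produces a (home, away) final score:

```python
def betting_lines(sim_scores):
    """Derive a point spread and over/under from a list of
    (home_pts, away_pts) tuples produced by test sims. The spread is
    the average margin (home minus away); the O/U is the average total."""
    n = len(sim_scores)
    spread = sum(h - a for h, a in sim_scores) / n
    total = sum(h + a for h, a in sim_scores) / n
    return spread, total

# Three hypothetical test-sim results for one game:
scores = [(24, 17), (31, 20), (17, 21)]
print(betting_lines(scores))  # spread ~ 4.67, O/U ~ 43.33
```

Note this sets the line at the sim's mean outcome, so the 39% underdog rate and 33% Over rate above are deviations of the real games from their own simulated averages.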
Finally, some sim gods stuff:
Baltimore, New Orleans, and Austin found themselves upset once each, while Orange County did the most upsetting with 2 upset wins. YKW, SJS, and CHI can probably say they are currently on their most likely path, while BAL and OCO have gone on very unlikely paths. Will this continue to stand? Will BAL continue to be unlucky? Will OCO continue to be lucky? Time will tell.
Code:
619 words, 3600 test sims :)