02-09-2023, 04:53 PM
(This post was last modified: 02-12-2023, 01:32 PM by Caleb_H. Edited 3 times in total.)
Hello everyone! Today I'd like to talk about something I've been working on: an expected points (EP) matrix built from recent (S35-present) ISFL scoring environments. It could be used to calculate all sorts of things, most importantly EPA/play. I want to preface this with a few things before I get into it, however:
1. This is intended to be a pretty rudimentary start to what I want to be a long-term project. A lot of the data is pretty sloppy for now but it nevertheless should be pretty accurate.
2. I am not at all competent with computers, especially compared to a lot of people around. As a result my data-scraping and organizing work is incredibly sloppy. I know there are about a million things that could make my life easier when I'm looking at all this but I really have very little interest in learning how to code more than what I already know. Sorry. It's just not my interest.
3. As we'll get into later, the timescale I'm using is particularly short. I could make it much longer, and I probably will, but for now five seasons is enough play-by-play data. I fully intend to update my data after every ISFL season, so the sample size should expand over time.
4. Many thanks to @infinitempg, who helped me find and use the play-by-play data. Additional thanks to @Swanty, who offered early support.
What are expected points?
Expected points as a concept were introduced in a 1971 paper by Virgil Carter, a quarterback for the Cincinnati Bengals at the time, and Robert E. Machol, a professor at Northwestern University in Illinois. The idea, as they describe it, is that for every combination of down, distance, and distance-to-endzone, there is some E(X) that describes how many points you would expect an NFL team to score on that drive. I'll try to avoid the specifics, but I am using a slightly different method to calculate E(X) than the one used by Carter and Machol. The difference is in how I deal with turnovers, as they not only end possession but give the ball back to the opponent at one of 99 possible locations on the field, each of which in turn has its own expected points value for the other team (or 'negative' points for your own team). This results in all sorts of mathematical fuckery. Carter and Machol solve the problem very gracefully, but I am a graceless bitch and have solved the problem in a way I can conceivably code into a spreadsheet in my limited free time. The end result should not be greatly different than if I had done it the right way, however.
The value in expected points is mostly in understanding the points-value of a specific play. In the most basic understanding of football, the plays that score points are only those that result in touchdowns, field goals, or safeties. But this kind of perspective is severely limited. If a quarterback sneaks into the endzone from the 1-yard line, scoring a touchdown, that play should not be given credit for all seven points of that score. There were likely many plays leading up to it (a third down conversion, a long run to enter the redzone, an interception with a lengthy return to give the offense good field position, or a combination of all three), all of which deserve credit for the resulting touchdown. So if we instead say that an offense is already highly likely to score a touchdown if they are on the opponent's 1-yard line, and we define to a great degree of precision how likely they are to score, then we can determine how many points the quarterback sneak by itself is worth. In the case of the ISFL, an offense is expected to score about 6.59 points at the 1-yard line on 2nd down, so we can say that a 2nd down quarterback sneak from the 1-yard line for a touchdown is worth about 7 - 6.59 = 0.41 points, which is a lot for a single play, but not quite the full 7 points you might assign more naïvely.
Note: As Machol and Carter note in their paper, a touchdown is not worth exactly 7 points, but it is an acceptable approximation.
The Results:
So, after doing my reading, and learning how to use a spreadsheet, and about a million other things, I finally put together a complete expected points matrix of every [down, distance, yards-to-endzone] that has occurred at any point in the ISFL from S35-S39.
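For anyone who would rather script this than use a spreadsheet, here is a minimal Python sketch of the general idea. It is not the spreadsheet's actual formulas, and it sidesteps the turnover handling entirely: it assumes each play row already carries a hypothetical `points` value (the net points credited to that situation) and simply averages it per [down, distance, yards-to-endzone] state. The sample rows are made up purely to show the input shape.

```python
from collections import defaultdict

def build_ep_matrix(plays):
    """Average the points outcome for each (down, distance, yards_to_endzone) state.

    `plays` is an iterable of (down, distance, yards_to_endzone, points) tuples.
    `points` is a hypothetical field: the net points credited to that situation.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for down, distance, yards_to_endzone, points in plays:
        state = (down, distance, yards_to_endzone)
        totals[state] += points
        counts[state] += 1
    # Expected points for a state = mean of the observed outcomes from that state
    return {state: totals[state] / counts[state] for state in totals}

# Made-up rows, only to illustrate the input format (not real ISFL data)
sample = [(1, 10, 75, 7), (1, 10, 75, 0), (1, 10, 75, -3), (2, 2, 40, 3)]
matrix = build_ep_matrix(sample)
```

States that never occurred simply have no entry, which matches the matrix only covering situations that actually happened.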
LINK TO THE SPREADSHEET
Using the matrix, I can look and sort all sorts of lovely data, and I can make a chart such as this:
The chart above tells us how many points an ISFL team is expected to score on both 1st and 10 (blue line) and 2nd and 2 (red line) from any location on the field. The charts do not display the actual expected points but rather a 5-point moving average, which I have applied to mellow out some of the noise that occurs with small sample sizes. As we might expect, the closer you are to the opponent's end zone, the more points you are likely to score. As with the actual NFL, the two lines keep a pretty good pace with each other, which I was happy to see, as it means the model is probably working as I had hoped!
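A 5-point moving average like the one described above could be sketched in Python as follows. Two things here are assumptions rather than anything stated in the post: the window is centered on each yard line, and near the ends of the field the window just shrinks to whatever neighbours exist.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the edges (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Placeholder values, one per yard line, only to show the call
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7])
```

A trailing (non-centered) window would shift the curve a couple of yards, so which variant is used matters a little when reading values off the chart.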
In this chart, I have plotted the point expectancy across the field for 1st and 10 (blue), 2nd and 10 (red), 3rd and 10 (yellow), and 4th and 10 (green). The blue line in this chart is the same as the last, but in this chart we can easily see how expected points added (EPA) might be useful. For instance, if an offense starts on its own 25-yard line after a kickoff touchback, we could say its expected points in that situation are about 0.99 on first down. However, if the quarterback throws an incompletion to bring up 2nd and 10, we would now say that the new expected points value is 0.63. Another incompletion would lower the value to -0.65 points on 3rd and 10; the other team is now more likely to score next than the offense currently in possession of the ball. Another incompletion on 3rd down would again lower the value, to -2.48; almost every team would punt now, giving the opposition the ball with excellent field position. However, if the offense stays on the field and throws another incompletion, its opponents would enjoy 1st and 10 with only 25 yards to the endzone, which is expected to net them about 3.58 points, or -3.58 points for the team that threw four straight incompletions. So the first incompletion's EPA is 0.63 - 0.99 = -0.36, the second's is -0.65 - 0.63 = -1.28, the third's is -2.48 + 0.65 = -1.83, and finally the fourth incompletion would have an EPA of -3.58 + 2.48 = -1.10. All in all, the offense would be credited with producing a total of -4.57 expected points on their fruitless drive. Not only did they squander a scoring opportunity for themselves, they gift-wrapped an opportunity for their opponent's offense to score points of their own.
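The arithmetic in that example can be checked with a few lines of Python. The EP values are the ones quoted above; the function name `epa_sequence` is just an illustrative choice. Note that every value is written from the original offense's point of view, so the opponent's 3.58 after the turnover on downs goes in as -3.58.

```python
def epa_sequence(ep_values):
    """EPA of each play = EP after the play minus EP before it."""
    return [round(after - before, 2)
            for before, after in zip(ep_values, ep_values[1:])]

# 1st & 10, 2nd & 10, 3rd & 10, 4th & 10 at the own 25, then the turnover on downs
drive_ep = [0.99, 0.63, -0.65, -2.48, -3.58]

epas = epa_sequence(drive_ep)   # one EPA per incompletion
total = round(sum(epas), 2)     # total EPA for the drive
```

The total is also just the last EP value minus the first (-3.58 - 0.99), since the intermediate values cancel out.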
Conclusion
So. I've now spent a lot of time making a spreadsheet, and I've spent a lot of time giving a crash-course lecture on expected points. But why should you care? Well, I'll tell you why you should care: because, like you, I'm also a football fan, and I know that the football fan sees only one thing in mathematics: the power to talk trash. In the interest of serving your basest desires, I will be making a recurring article on this forum during the ISFL season detailing how each team is doing with regard to EPA/play, complete with neat little graphs and charts that you can send to your friends and brag about how Sarasota's defense is the best in the league, actually.
I have plans to implement the following in the months to follow for each team in the ISFL, and I will try to publish my findings regularly:
Offensive EPA/play
Defensive EPA/play
EPA/rush (offense and defense)
EPA/pass (offense and defense)
EPA/punt
I will also try to make a DSFL expected points matrix in the near future, and thus will be able to publish on the DSFL teams as well.
Thanks very much for reading, have a lovely day.
~Jenni
EDIT: I forgot to mention but I filtered out all plays that occurred within 3 minutes of the end of a half, because end-of-half scoring situations are complicated by time restrictions.
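A filter like that could be sketched in Python as below. Several details are assumptions, not things stated above: that the play-by-play gives a quarter number and an "MM:SS" clock, that quarters are 15 minutes long, and that a play at exactly 3:00 left in the half is dropped. Overtime is not handled.

```python
QUARTER_SECONDS = 15 * 60   # assumed 15-minute quarters
CUTOFF_SECONDS = 3 * 60     # drop plays within 3 minutes of the end of a half

def seconds_left_in_half(quarter, clock):
    """`clock` is 'MM:SS' remaining in the quarter; quarters 1-4 only."""
    minutes, seconds = clock.split(":")
    left_in_quarter = int(minutes) * 60 + int(seconds)
    # Quarters 1 and 3 still have a full quarter to play before the half ends
    extra = QUARTER_SECONDS if quarter in (1, 3) else 0
    return left_in_quarter + extra

def keep_play(quarter, clock):
    """True if the play happened more than 3 minutes before the end of its half."""
    return seconds_left_in_half(quarter, clock) > CUTOFF_SECONDS
```

For example, `keep_play(2, "03:01")` keeps the play, while `keep_play(4, "02:59")` drops it.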