05-16-2023, 01:22 PM
(This post was last modified: 05-18-2023, 02:10 PM by xenosthelegend. Edited 4 times in total.)
Hi everyone! If you remember, a few months ago I put together a very large spreadsheet that let me calculate the expected points value of a play, and thus things like EPA/play for a team or skill player. I'm very proud of that work, as I really didn't know the first thing about computers or spreadsheets at the time I made it. Unfortunately, the limitations of a spreadsheet, the intensive work required to update it, and the inflexibility of the model should I decide to change the calculations meant that it was time to discontinue it. Instead I've moved all of my calculations over to Python, which addresses almost every issue I was having with the spreadsheet and will hopefully let me do some pretty cool stuff in the future, if that's the sort of thing people are interested in.
However, the spreadsheet had one feature, which I called a "gizmo" (found here, but also super outdated), that allowed people to look up an individual player's or team's EPA/play without needing to ask me directly. Unfortunately I cannot recreate it in Python with my limited experience and understanding. I hope to eventually find a way for people to directly look up a player's EPA figures, or to sort the data using their own custom parameters (e.g. to compare one player's EPA/play throughout their career), but for the time being I want to address this limitation by simply dumping nearly every relevant EPA statistic into a media post in the form of charts. I have about 120 charts to share, so while I'll try to make broad observations on the data, I can't really go into detail because there are simply too many charts and I'd start to sound like a broken record. Also, nobody would want to read that.
Before I start dumping data, I want to clarify some important changes I've made to my calculations. If you are curious about what the fuck I'm talking about, I went into some detail in my first article on expected points HERE, as well as a follow-up article found HERE. Those articles are both a little old at this point, and I think I've gained a good number of insights into what EPA means and how it can be used since then, but they go into enough detail that I don't think it's worth repeating the entire run-down. If you have any questions about expected points I would really appreciate it if you dropped a reply to this thread or asked me on Discord, as I've put a lot of time into this and would love to spend more time talking about it. I'll mention the important bit here again though:
Expected points is a metric used to measure the favorability of a team's position on the field. That's it. It takes three numbers (the down, the distance needed for a first down, and the distance to the end zone) and tells us how many points the average team is expected to score from that position on that drive. Sometimes the expected points of a situation will be negative, because the team is expected to punt and give the ball to the other team. My model does not take into account things like available timeouts and how much time is left until the half, and to be fair, neither do many of the simpler IRL NFL EP models. Nobody wants to think about those things.
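To make the lookup idea concrete, here is a minimal Python sketch of an expected points table built by averaging over historical plays. It is an illustration only, not the actual code behind the charts: the field names (`down`, `to_go`, `yards_to_goal`, `drive_points`) and the sample plays are made up, and `drive_points` is assumed to be the points credited to that situation, negative when the other team ends up scoring.

```python
from collections import defaultdict


def build_ep_table(plays):
    """Average the points outcome for each (down, to_go, yards_to_goal) situation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for play in plays:
        key = (play["down"], play["to_go"], play["yards_to_goal"])
        totals[key] += play["drive_points"]
        counts[key] += 1
    # Expected points = mean outcome observed from that situation.
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical sample data, purely for illustration.
plays = [
    {"down": 1, "to_go": 10, "yards_to_goal": 75, "drive_points": 7},
    {"down": 1, "to_go": 10, "yards_to_goal": 75, "drive_points": 0},
    {"down": 1, "to_go": 10, "yards_to_goal": 75, "drive_points": 3},
    {"down": 4, "to_go": 10, "yards_to_goal": 90, "drive_points": -7},
]

ep = build_ep_table(plays)
print(ep[(1, 10, 75)])  # mean of 7, 0 and 3
print(ep[(4, 10, 90)])  # a negative situation
```

With real data every situation would have many plays behind it, which is exactly why sample size matters so much for the smoothness of the resulting values.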
Expected points added (EPA) just tells us how well a team is performing relative to the average. In very simplified terms, if a team scores a touchdown from a situation in which they'd be expected to score a field goal, that offense would be awarded 4 "expected points added" over the course of that drive. Conversely, a team that settles for a field goal after getting to, say, 1st and goal from the 1-yard line would be awarded -4 EPA. Thus good offenses (relative to average) should have a positive EPA/play, and bad offenses will have a negative EPA/play. For defenses things are reversed, because a good defense reduces the number of points the opposing team scores.
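The arithmetic above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the real calculation: a touchdown is treated as worth a flat 7 and a field goal a flat 3, and the function names and the sample drive are hypothetical.

```python
def epa_for_play(ep_before, ep_after):
    """EPA of one play: expected points after the play minus expected points before it."""
    return ep_after - ep_before


def epa_per_play(plays):
    """Mean EPA over a list of (ep_before, ep_after) pairs."""
    if not plays:
        return 0.0
    return sum(epa_for_play(before, after) for before, after in plays) / len(plays)


# Touchdown (7) from a situation worth about a field goal (3).
touchdown_epa = epa_for_play(3.0, 7.0)
# Field goal (3) from a situation worth about a touchdown (7).
field_goal_epa = epa_for_play(7.0, 3.0)

# A hypothetical three-play drive that starts at 3.0 EP and ends in a touchdown.
drive = [(3.0, 4.5), (4.5, 4.0), (4.0, 7.0)]
offense_epa_per_play = epa_per_play(drive)
# The defense on the field gets the same number with the sign flipped.
defense_epa_per_play = -offense_epa_per_play

print(touchdown_epa, field_goal_epa, offense_epa_per_play)
```

Note that the per-play values of the drive add up to the same 4 points as the single touchdown example, since each play's "after" is the next play's "before".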
Okay! On to my changes: Firstly, I grouped the "distance to goal" part of my calculations into little 3-yard buckets in order to increase the sample size for each distance from the end zone. All this means is that my calculations don't discriminate between 1st and 10 from the 23-yard line, 1st and 10 from the 24-yard line, and 1st and 10 from the 25-yard line. There aren't any meaningful consequences to this, and in fact the original paper detailing expected points and how to calculate it groups distances to goal into much larger 10-yard chunks to account for its very low sample size.
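A minimal sketch of what 3-yard bucketing could look like in Python. The exact bucket boundaries are not stated in this post, so the alignment below is an assumption, chosen only so that the 23, 24 and 25 from the example land in the same bucket; `distance_bucket` is a hypothetical name.

```python
def distance_bucket(yards_to_goal, width=3):
    """Map a distance to goal onto a bucket index.

    Assumed alignment: buckets are centred on multiples of `width`,
    so with width=3 the distances 23, 24 and 25 share one bucket.
    """
    return (yards_to_goal + width // 2) // width


for yards in (22, 23, 24, 25, 26):
    print(yards, distance_bucket(yards))
```

An expected points table would then be keyed on (down, distance to first down, bucket) instead of the exact yard line, which roughly triples the number of plays behind each value.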
Additionally, I changed the calculations to only account for the 7 closest seasons of plays rather than everything from S27 to the present. So if we are calculating EPA in S28, we will use a different EP matrix than if we are calculating EPA in S40. This drastically reduces our sample size, but it also allows a degree of adjustment for changes in the scoring environment. This should cause each of the charts to look a lot more balanced around (0,0), but it also re-introduces some of the noise I was trying to eliminate earlier.
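One way to pick the "7 closest seasons" is a centred window that gets pushed back inside the available range near the edges. The sketch below is an assumption inferred from the two examples in this post (S30 uses S27-S33, S39 uses S34-S40), not the actual code; `season_window` and its parameters are hypothetical names.

```python
def season_window(season, first=27, last=40, size=7):
    """Return the `size` seasons closest to `season` within [first, last].

    Assumes at least `size` seasons are available in total.
    """
    start = season - size // 2
    # Clamp so the window never runs past either end of the data.
    start = max(first, min(start, last - size + 1))
    return list(range(start, start + size))


print(season_window(30))  # centred: three seasons either side
print(season_window(39))  # pushed back to end at the latest season
print(season_window(28))  # pushed forward to start at the first season
```

Under this assumption the first few and last few seasons all share the same EP matrix, since their windows clamp to the same 7 seasons.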
For reference, here is a chart of the expected points on X and 10, using the old methods (taking all of S27-S40 into account):
And here is a chart for S30, taking the seasons S27-S33 into account:
And finally, here is the same chart but for S39, taking S34-S40 into account:
You can see that the two new plots are a good bit noisier than the first plot, which is to be expected when reducing the sample size. However, I'm really not sure why the S39 plot seems a good bit noisier than the S30 plot. I suppose there might be some noise in the noise! Still, the new charts only rarely break in a critical way, for instance by giving 2nd and 10 a higher expected points value than 1st and 10 (which realistically should never happen), so overall I'm happy with the new results. If you had been paying attention to my recurring releases of team EPA charts, you'd have noticed that a number of them had almost every team clustered in one quadrant of the chart, and this should help to address that.
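A sanity check for the "2nd and 10 worth more than 1st and 10" problem could look something like the Python sketch below. It assumes the EP values are stored as dictionaries keyed by distance bucket; the function name and the numbers are hypothetical and only there to show the check working.

```python
def find_inversions(ep_first_down, ep_second_down):
    """Return the buckets where 2nd-and-10 is valued above 1st-and-10."""
    return sorted(
        bucket
        for bucket in ep_first_down
        if bucket in ep_second_down and ep_second_down[bucket] > ep_first_down[bucket]
    )


# Made-up EP values keyed by distance bucket, purely for illustration.
ep_1st_and_10 = {8: 4.1, 9: 3.9, 10: 3.6}
ep_2nd_and_10 = {8: 3.7, 9: 4.0, 10: 3.2}

broken_buckets = find_inversions(ep_1st_and_10, ep_2nd_and_10)
print(broken_buckets)
```

Counting how many buckets come back from a check like this is a quick way to compare how noisy two EP matrices are.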
Okay! With that out of the way I'd love to move on to the season-by-season data, which will come with minimal commentary. Before I get into it, though, I'd like to clarify something regarding the individual charts. My methods for determining whether a player is involved in a play are very limited, and realistically the only thing I can check is whether a player's last name appears in a play's in-game play-by-play commentary (what you see at the bottom of the screen during a game). There are some serious problems that come up when doing this, mostly because some players share a last name with others. There are some real problems that could arise from the McTurtles, for example, and while I have done my best to limit these issues I cannot make them go away entirely.
There are also some problems that come from automating the chart-making process for individuals, namely that I have no way of determining whether a player has been traded part-way through a season. I am drawing from the league index to generate a list of names and their teams, so I'm a little bit beholden to Wolverine Studios' indexing decisions. There's also a small visual bug that occurs when two players with the same last name who play the same position appear in the same season. You will understand when you see it; the data affected by this bug should be perfectly accurate, but it will look a little strange. Sorry! Anyways, if a player was traded mid-season you can ask me and I can give the exact splits for their EPA data.
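The last-name check, and why shared names are a problem, can be sketched in Python as follows. This is an illustration under assumptions, not the actual matching code: the commentary strings, the roster entries and the function names are all made up.

```python
import re


def mentions_player(commentary, last_name):
    """True if `last_name` appears as a whole word in the commentary line."""
    pattern = r"\b" + re.escape(last_name) + r"\b"
    return re.search(pattern, commentary, flags=re.IGNORECASE) is not None


def candidates(commentary, roster):
    """Every rostered player whose last name appears in the commentary line."""
    return [p for p in roster if mentions_player(commentary, p["last_name"])]


# Hypothetical roster with two players sharing a last name.
roster = [
    {"last_name": "Smith", "team": "AAA", "position": "WR"},
    {"last_name": "Smith", "team": "BBB", "position": "RB"},
    {"last_name": "Jones", "team": "AAA", "position": "QB"},
]

line = "Jones pass complete to Smith for 12 yards."
matches = candidates(line, roster)
print([(p["last_name"], p["team"]) for p in matches])
```

The word-boundary match keeps "Smith" from matching inside a longer name such as "Smithson", but it cannot tell the two Smiths apart: both come back as candidates, and something extra (team on offense, position) is needed to choose between them.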
I've reduced the sizes of each chart so that this post doesn't take too long to scroll through. If you want to see the full-size charts you should be able to open the image in a new tab and it will hopefully be a good resolution.
One last thing: I have removed all punting EPA data for the time being because it's way way way too noisy. Unfortunately I don't think it's worth posting it as if it has any significant meaning. That being said I still have these charts so if you're curious I can send them to you on request.
Season 27
Season 28
Season 29
*Please note: in the S29 RB receiving chart, due to a trade, Ayers is listed here despite appearing in only 6 passing plays with the Wraiths. Thus their outrageous numbers here can be wholly discarded. Sorry!
Season 30
Season 31
Season 32
Season 33
Season 34
Season 35
Season 36
Season 37
Season 38
Season 39
Season 40
Okay! Please let me know if you have any questions or have noticed any issues with what I've posted here. I've put a lot of work into this and I'd really love some feedback! Have a great day!
Edit: I'd be remiss to not thank @jeffie43 for giving me a quick introduction to R, and the skeleton code I ended up modifying to make the current charts. I'd also like to thank @infinitempg for helping me learn how to work with data frames in pandas. Thank you both!
~Jenni