01-27-2024, 01:39 PM
(This post was last modified: 03-23-2024, 10:58 PM by ZodiacEXE. Edited 2 times in total. Edit Reason: Fixing image links)
It's no secret that the offensive line is one of the most important parts of a football team, yet somehow it rarely gets the credit it deserves. The o-line is woefully underrepresented in stats both in real life and in the ISFL. We only get pancakes and sacks allowed, while the best NFL equivalent is PFF grades, which are stated upfront as being fairly subjective. Even our stats are skewed, as there's some correlation between pass-heavy teams and an OL that collects pancakes and allows sacks. So how can we grade an offensive line unit more effectively?
There's an obscure stat used in college football called "line yards." Essentially, it tries to assign yardage to an OL unit by giving them a portion of the yards gained on a given rushing attempt using the following scale:
Negative yardage: 120% of the yards added to line yards
Every yard gained up to 4: 100% of the yards
Every further yard gained up to 10: 50% of the yards
Every yard gained beyond 10: 0% of the yards
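For anyone who would rather script this than do it by hand, the scale above can be sketched in a few lines of Python. This is just my reading of the scale, not an official formula: the function names `line_yards` and `line_yards_per_attempt` are made up for illustration, and the sample gains are invented numbers, not real play-by-play data.

```python
def line_yards(rush_yards: int) -> float:
    """Line yards credited to the OL for one rush, per the scale above."""
    if rush_yards < 0:
        # Losses count against the line at 120%.
        return 1.2 * rush_yards
    # Yards 0 through 4 are credited in full.
    full_credit = min(rush_yards, 4)
    # Yards 5 through 10 (at most 6 yards) are credited at 50%.
    half_credit = min(max(rush_yards - 4, 0), 6)
    # Anything beyond 10 yards earns the line nothing extra.
    return full_credit + 0.5 * half_credit


def line_yards_per_attempt(gains: list[int]) -> float:
    """Average line yards over a list of rushing gains (LY/A)."""
    if not gains:
        return 0.0
    return sum(line_yards(g) for g in gains) / len(gains)


# Invented example gains: a loss, a short gain, a medium gain, a breakaway.
sample_gains = [-2, 3, 7, 25]
print([line_yards(g) for g in sample_gains])
print(round(line_yards_per_attempt(sample_gains), 3))
```

Note how the 25-yard run is worth the same to the line as a 10-yard run: everything past the tenth yard is treated as the rusher's doing.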
The idea is to strike a balance between o-line effectiveness and the rusher's ability. It's obscure and not used in many places, likely because it's very simplistic. However, the main reason I can think of for it not to be used in the ISFL is that you would need to pull and work with every rushing attempt in a season to get these numbers. Since there's no easy way to pull that data, somebody would have to go through every play-by-play of the previous season and manually enter the yardage data for every single run play before they can get an accurate number. Surely nobody is crazy enough to do that, right?
Welcome to what I spent the evening of January 26th on. 4,754 listed rushing plays, with some room for error since it was all hand-done and by the end my eyes were glazing over every time I saw the words "Rush by [player] for short gain." But it was all in the name of science, so let's take a look at some of the combined data!
First, the potential inaccuracies. The rushing attempt numbers and total rushing yards are slightly different from what's listed in the index. All of the numbers trend the same way (the attempts are slightly lower, the yards slightly higher), so I can offer some explanation there beyond human error. Many plays ended up nullified by penalties, but it seems the index still counts a rushing attempt if the play call was a run. Therefore, the play-by-play has fewer rush attempts than the index. Yards are a bit trickier, as my numbers tended to be slightly higher than the index. My guess is that the play-by-play doesn't do a great job of describing yardage, potentially even rounding down from half-yard gains. You'll also notice OL TPE numbers highlighted in red. This is data I've pulled (and even assumed) from this coming season. I couldn't find a way to get the TPE numbers easily, and having already spent hours combing over play-by-play data I wasn't going to calculate the approximate TPE of units based on data from last season's file.
With all that out of the way, let's talk numbers. Being a very run-heavy team, Arizona led the way in raw yardage. However, they ranked only 9th in line yards per attempt. It seems a few tiers emerged in terms of efficiency.
Tier 1 was New York. The Silverbacks OL was responsible for nearly 0.2 yards per attempt more than any other unit in the league. Tier 2 runs from Sarasota's 2.963 LY/A to Austin's 2.916. The two teams between them, Berlin and San Jose, tended to be below average in total yards per attempt, so it seems their o-lines did a lot of work for them. The next tier starts at Cape Town, which was more than 0.1 LY/A below Austin. From there finding another natural break is tough, but we'll put the final tier starting at Colorado's 2.702 LY/A. Baltimore's line struggled the most, nearly pushing it into a tier all by itself.
So what conclusions can we draw from this? First, let's take a look at some regression analyses to see if there's any correlation between some of these stats. It's a small set, I know, but hopefully we'll find something.
For those unfamiliar, the number we're looking at is the "R square" value near the bottom. Basically, the closer it is to 1.000, the stronger the correlation between the two values. As you can see, the R^2 is very low. Though perhaps that has more to do with comparing a rate stat to a raw number?
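If you want to check an R^2 without a spreadsheet, here is a rough sketch. For a simple regression with one x and one y, R^2 is just the square of the Pearson correlation r, which also means the sign gets lost: you need r itself (or the slope) to tell a positive relationship from a negative one. The function name `pearson_r_and_r2` and the numbers fed into it are made up for illustration and are not values from my sheet.

```python
import math


def pearson_r_and_r2(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (r, R^2) for a simple one-variable linear regression."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables never changes")
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r


# Invented toy data: y falls as x rises, so r is negative
# even though R^2 comes out positive.
r, r2 = pearson_r_and_r2([1.0, 2.0, 3.0, 4.0], [9.0, 7.0, 6.0, 2.0])
print(f"r = {r:.3f}, R^2 = {r2:.3f}")
```

With only 14 teams, a single outlier can swing these numbers quite a bit, so treat any one R^2 as a rough hint rather than a verdict.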
A better R^2 to be sure, but a negative correlation. This has something to do with the fact that pancakes are more common on pass attempts, so teams with a higher proportion of rush attempts will necessarily have fewer pancakes.
Now this is where my concern is. A slight positive correlation, but an R^2 of only .15; pretty close to the R^2 of line yards and pancakes, which we could well explain with a different set of stats. There are a few theories I have as to why LY/A and OL Unit TPE don't correlate very well.
1. Lack of opponent adjustment. 14 teams each playing 16 games should give a good sample size, but some teams simply don't play each other. If an average team faces a good run defense more often, it makes their stats look worse unless we account for their schedule.
2. Bad TPE data. As I said before, these are TPE values from this season instead of last season. Maybe the gap between them is a lot bigger than I initially thought.
3. Line yards is a bad stat. Maybe there's a good reason line yards isn't used. Maybe it just doesn't do a good job of describing an o-line unit's contribution to a team's run game.
4. RB TPE data. This is the most likely culprit in my opinion. I didn't go get running back TPE because I know a lot has changed on that front, but the skill of the RB probably does contribute to line yards more than the metric would have us believe (so roll #3 in as part of this). To show my point, I have one more regression analysis.
The most correlated data by far. I have no data to back this up, but I would guess the teams above the line of best fit would be the teams with great backfields. The bottom line is that I don't think I've solved the problem of not knowing how valuable the offensive line positions are.
You may be asking, "Zodiac, you just wasted an entire evening combing through data that you couldn't even use to fulfill your stated goal. Aren't you upset?" Of course not! Well, a little. A lot, actually. BUT! I did learn a few things by doing this whole project, like how different teams used their players against different opponents and in different game situations (and sometimes how the sim made terrible, awful decisions in those situations). I got to take a closer look at every game that happened in the S45 regular season. I'm also hoping that having all of this data might be useful to somebody else wanting to look into something using individual rushing attempts (I'll put the link to the spreadsheet at the end of the article). In short, yeah, I'm kind of upset my theory didn't hold water, but I don't regret the time I spent working on this...
...or maybe that's just the sunk cost fallacy working its magic. Either way, thanks for reading!
Link to the rushing yards spreadsheet, make a copy if you're interested in messing with the data :)