International Simulation Football League
Predicting Running Backs' YPC with TPE - Printable Version




Predicting Running Backs' YPC with TPE - PigSnout - 07-24-2017

How much does a running back’s TPE affect his YPC? I was curious because the Yeti had the two highest-TPE running backs in the NSFL last season but the second-lowest yards per carry in the league. I decided to graph each running back’s TPE after the final update of Season 1 against his Season 1 YPC to see if there was a relationship. (I only used running backs with 50 or more carries.)
[Image: OS9PdKh.png]
If there were a strong relationship between YPC and TPE, we would expect the points to form a straight, upward-sloping line. There is a slight upward trend, but the shape of the points looks more like a blob than a line, which suggests a weak relationship between YPC and TPE. Luckily, we do not have to rely on the eye test. We can use a correlation coefficient, a number that measures the strength of the linear relationship between two variables. Since the data is trending upward, we will get a number between 0 and 1, with 0 meaning there is no linear relationship between YPC and TPE and 1 meaning the points fall exactly on an upward-sloping line. It is very rare to get a perfect 0 or 1, but a higher value suggests a stronger relationship while a lower value suggests a weaker one. I calculated a correlation coefficient with this data and the result was r = 0.4909. Given that this is less than 0.5, it reaffirms my early thoughts that there is not a very strong correlation between a running back’s TPE and his YPC. So what else would impact a running back’s YPC? My first thought was the strength of his offensive line.
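For anyone who wants to reproduce this kind of check, here is a minimal sketch of a Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample numbers are made up for illustration; they are not the league data behind the graph, and the post does not say which tool was actually used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Placeholder numbers only, NOT the Season 1 data from the post.
example_tpe = [100, 150, 200, 250]
example_ypc = [3.5, 4.1, 3.9, 4.6]
print(round(pearson_r(example_tpe, example_ypc), 3))
```

Feeding in the real TPE and YPC columns in place of the placeholder lists would give the r value for that pairing.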

To determine the strength of each offensive line, I used TPE so that the lines are measured in the same unit I used to evaluate running backs. I determined offensive line strength by taking the average TPE of each team’s 5 best offensive linemen at the end of Season 1. Bot players do not have TPE, so I counted each one as 40 TPE: they have the lowest attributes for their position, which makes them weaker than a human player with 50 TPE.
[Image: 41EKV0T.png]
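The line-strength calculation described above can be sketched roughly as follows. The function name `line_strength` and the convention of marking a bot with `None` are assumptions for this example, and the roster values are invented rather than taken from the table.

```python
BOT_TPE = 40  # value the post assigns to bot linemen

def line_strength(linemen_tpe, starters=5, bot_tpe=BOT_TPE):
    """Average TPE of a team's best `starters` offensive linemen.

    `linemen_tpe` holds one entry per lineman; None marks a bot player
    (a convention assumed for this sketch).
    """
    values = [bot_tpe if tpe is None else tpe for tpe in linemen_tpe]
    if len(values) < starters:
        raise ValueError("need at least %d linemen" % starters)
    # Keep only the highest-TPE linemen, then average them
    best = sorted(values, reverse=True)[:starters]
    return sum(best) / starters

# Invented roster: four human linemen and two bots.
print(line_strength([120, 95, 80, None, None, 60]))
```

Here the best five are 120, 95, 80, 60 and one bot at 40, so the second bot is dropped before averaging.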
I then graphed each player’s offensive line’s average TPE against that player’s YPC.
[Image: PmXvb9z.png]
This looks a little more linear, but it still looks rather blob-like. I calculated a correlation coefficient for this data and got r = 0.5596. As it appeared, average offensive line TPE was a better predictor of YPC than the running back’s own TPE. However, 0.5596 still does not suggest a very strong relationship. The correlation coefficient would probably need to be over 0.7 before it could really be considered strong. This method is also flawed because it neglects running back talent: it implies that Bubba Nuck and Jack Stats are equally talented because they run behind the same line. We need a way to assess both the strength of the offensive line and the talent of the running back. To do this, I came up with a new stat called Adjusted TPE.

Adjusted TPE takes a running back’s TPE and adjusts it based on the average TPE of his offensive line. If a running back is running behind a poor offensive line, his TPE is adjusted downward to reflect the fact that he will likely perform below his talent level. Likewise, a running back with a strong offensive line will have his TPE scaled upward to show that he will perform better. To get these values, I needed a multiplier for each team’s offensive line. I first averaged the teams’ average offensive line TPEs, which came out to 88.3. To determine each multiplier, I simply divided each team’s average offensive line TPE by 88.3.
[Image: I726nPg.png]
After that, I multiplied each player’s TPE by their team’s multiplier to calculate their Adjusted TPE. I then graphed Adjusted TPE against their YPC to judge the relationship.
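As a rough sketch of the two steps above (the multiplier, then the scaling), assuming the league mean is the plain average of the team averages: the team names and TPE values below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def line_multipliers(line_avg_tpe):
    """Map each team to its line average divided by the league mean of those averages."""
    league_mean = sum(line_avg_tpe.values()) / len(line_avg_tpe)
    return {team: avg / league_mean for team, avg in line_avg_tpe.items()}

def adjusted_tpe(rb_tpe, multiplier):
    """Scale a running back's TPE by his team's offensive line multiplier."""
    return rb_tpe * multiplier

# Hypothetical teams and values, not the ones in the table.
lines = {"Team A": 110.0, "Team B": 90.0, "Team C": 70.0}
multipliers = line_multipliers(lines)  # league mean here is 90.0

# A 200 TPE back is scaled up behind the strong line and down behind the weak one.
print(round(adjusted_tpe(200, multipliers["Team A"]), 2))
print(round(adjusted_tpe(200, multipliers["Team C"]), 2))
```

A team whose line is exactly league average gets a multiplier of 1, so its running backs keep their raw TPE.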
[Image: JpigOeJ.png]
This time there looks to be an actual linear pattern. The correlation coefficient between YPC and Adjusted TPE is r = 0.8028. This is much stronger than either of the previous relationships and this time there is actually a strong relationship between the two variables. Since there is a strong relationship between Adjusted TPE and YPC, I decided to see how well Adjusted TPE can predict YPC. To do this, I used the Adjusted TPE and YPC values from Season 1 and generated an equation for a regression line, which is the line closest to the actual pattern of the data. The equation was YPC = 2.95420164 + 0.00710051X where X is equal to the player’s Adjusted TPE. I used this equation to generate a predicted YPC for each player.
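The regression equation above is easy to turn into a small helper. The intercept and slope below are the ones quoted in the post; `fit_line` is an ordinary least-squares fit added as an assumption about how such a line is usually obtained, since the post does not name the tool that produced it.

```python
INTERCEPT = 2.95420164  # from the post
SLOPE = 0.00710051      # from the post

def predict_ypc(adjusted_tpe, intercept=INTERCEPT, slope=SLOPE):
    """Predicted yards per carry for a given Adjusted TPE."""
    return intercept + slope * adjusted_tpe

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# 218.19 is the highest Season 1 Adjusted TPE mentioned later in the post.
print(round(predict_ypc(218.19), 2))
```

Running `fit_line` on the Season 1 Adjusted TPE and YPC columns should give coefficients close to the ones quoted, if the original line was a least-squares fit.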
[Image: kPQU1zZ.png]
The predictions were not perfect, but they did a good job of predicting major trends. Certain factors make it impossible for Adjusted TPE to perfectly predict YPC. For example, all of the Yeti players performed below expectations. The Yeti picked up Brokk Lee on waivers late in the season, so he only played a few games, yet the prediction assumes he played the entire season, which is why it overestimated the YPC of the Yeti players. The model cannot account for changes during the season, so it will never be perfect. Also, the model does not account for how TPE is distributed, so certain offensive lines may invest more in pass blocking and might not be as good at run blocking as expected. However, the average absolute value of the differences was only 0.228443, which is pretty good: on average, the prediction was within about 0.23 yards per carry of the actual value. This is interesting, but it is not very useful to predict stats for a season that has already happened. The big question is whether this method can predict stats for Season 2. Let’s find out. The first step was to calculate the average TPE of each offensive line for Season 2.
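The "average absolute value of the differences" is what is usually called mean absolute error. A minimal sketch, with placeholder numbers rather than the actual and predicted YPC values from the table:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Placeholder values only, NOT the Season 1 results.
actual_ypc = [4.0, 3.5, 4.4]
predicted_ypc = [4.2, 3.6, 4.1]
print(round(mean_absolute_error(actual_ypc, predicted_ypc), 3))
```

Because the differences are taken in absolute value, overestimates and underestimates cannot cancel each other out.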
[Image: 9FOPDPh.png]
The average of these averages was 108.8 so I used this value to calculate new multipliers.
[Image: 9FOPDPh.png]
Before plugging in the numbers, I needed to rescale the Season 2 Adjusted TPE values. With increases to TPE, the Adjusted TPE values would naturally be higher than the Season 1 values and the predictions would be too high. I obtained a multiplier for the Adjusted TPE by dividing the average Season 1 Adjusted TPE by the average Season 2 Adjusted TPE. This is where I ran into an issue. The multiplier was 0.981690944, which only produced a minuscule change. Despite having more high values of Adjusted TPE, Season 2 also has more low values of Adjusted TPE due to newly joining rookies, so the average values are very similar. Right now, I can’t really adjust the Season 2 values effectively. Here are the results I received:
[Image: qZihbDm.png]
(I apologize if I made any mistakes, since I am not familiar with each team’s running back situation. I included the players from this article, except that I replaced Sabonis with Parry because Sabonis retired.)
The obvious issue arises when looking at Leroy Jenkins’s predicted YPC. 5.6 yards per carry is a full yard more per carry than any running back managed to average in Season 1. That number does not sound reasonable or realistic. So what went wrong? The issue is extrapolation. 218.19 was the highest Adjusted TPE value in the Season 1 data used to build the regression model. Jenkins’s Adjusted TPE value of 375.13 is far outside of the model’s range, so the model cannot accurately predict the YPC for that value. Although the YPC values are not accurate, we can at least analyze these predictions for trends. For example, it is clear that Leroy Jenkins is the favorite to have the highest YPC in Season 2. Bubba Nuck and Reg Mackworthy are similarly expected to have high averages again this season. Omar Wright, who tied for the highest YPC in Season 1, could see a big drop-off this year when he goes from running behind one of the league’s best lines to running behind one of the league’s worst. This method may also yield more accurate predictions later: with the preseason starting soon, I might try to come up with a new regression model using preseason stats to predict YPCs for the regular season. Even that model will not be perfect, though. The data will be less meaningful since the preseason is such a small sample, and teams will likely change their strategies for the regular season, so we may see changes. Also, the model will not be able to account for the fact that certain offensive lines will grow a lot during the season while other offensive lines are mostly inactive players. However, I still think it is fun to try to predict the future as much as possible. After the preseason, I can also look at each player’s carries per game in order to try to predict their season rushing total. Anyway, this isn’t anything too serious; I just decided to play around with numbers a little bit and try to make some predictions.
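The extrapolation problem described in the article can be flagged before trusting a prediction. This is only a sketch: the helper name `is_extrapolating` is made up, and it checks the upper end of the range only, because the post gives the highest Season 1 Adjusted TPE (218.19) but not the lowest.

```python
SEASON_1_MAX_ADJUSTED_TPE = 218.19  # highest value in the fitted data, per the post

def is_extrapolating(adjusted_tpe, fitted_max=SEASON_1_MAX_ADJUSTED_TPE):
    """True when the input lies above the largest value the line was fitted on."""
    return adjusted_tpe > fitted_max

jenkins_adjusted_tpe = 375.13  # value quoted in the post
print(is_extrapolating(jenkins_adjusted_tpe))
# How far beyond the fitted range the input is, as a ratio
print(round(jenkins_adjusted_tpe / SEASON_1_MAX_ADJUSTED_TPE, 2))
```

A prediction flagged this way is better read as a ranking hint than as a literal YPC value.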

Quote:1552 Words
(This is technically a statistical article but I posted it as a regular article since the payout is slightly higher)


Graded



Predicting Running Backs' YPC with TPE - timeconsumer - 07-24-2017

7 decimal places?


Predicting Running Backs' YPC with TPE - Bzerkap - 07-24-2017

Nice information. Perhaps you should exclude pass blocking TPE from offensive line TPE. I think it would be safe, as there is no way the pass blocking stat should affect the run game. Great article nonetheless.


Predicting Running Backs' YPC with TPE - RavensFanFromOntario - 07-24-2017

Cobb is ostensibly mia