Using a Dense Neural Network to Predict Wins Based on QB Stats - Printable Version

International Simulation Football League (https://forums.sim-football.com)
Forum: Community > Media > Graded Statistical Analysis
Thread: Using a Dense Neural Network to Predict Wins Based on QB Stats (/showthread.php?tid=26779)
Using a Dense Neural Network to Predict Wins Based on QB Stats - wonderful_art - 10-14-2020

Using a Dense Neural Network to Predict Wins Based on QB Stats

Hey there, it's wonderful_art, and today I'm presenting another data science project for all those nerds like me in the ISFL who love this type of stuff. The focus of this article (and, most likely, the series moving forward) is deploying a machine learning model, specifically a dense neural network of perceptrons, on ISFL stats to glean further insight into how those stats reflect reality, and what they say about the player overall. This particular project is the start of a longer series on predicting wins. I'm hoping to build larger, more complex models on top of this initial one, but I'm staying agile and iterating through MVPs as I go. In other words, this is just the beginning of some fun stuff I'm hoping to do with stats and the league resources.

Methodology

The goal is to reliably gather QB stats by web-scraping the indexes, normalize the statistics, and feed them into a dense neural network, in the hope of building a model sensitive enough to accurately predict wins based solely on QB statistics. Using a Google Colab notebook, I've been successful, building a model with a root mean squared error of 0.8852 – essentially off by only about 0.9 wins on average when predicting. Pretty good, I'd say, although there's certainly room for improvement. You can check out the Google Colab notebook here.

What's included? The model looks at every quarterback's stats from S5 through S24, specifically completions, attempts, yards, completion percentage, longest throw, touchdowns, and interceptions. Each stat is normalized against that season's passing stats, meaning each season is scaled within its own field – which also lets us predict a quarterback's final season win total before the end of the season (a model trained on raw numbers, rather than normalized ones, wouldn't have this ability).
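The per-season normalization step can be sketched as a z-score computed within each season. This is a minimal illustration, not the notebook's actual code – the column names (`season`, `yards`) and the choice of z-score over, say, min-max scaling are my assumptions:

```python
import pandas as pd

# Toy QB data: two seasons with different passing environments.
df = pd.DataFrame({
    "season": ["S23", "S23", "S24", "S24"],
    "yards":  [3200, 4100, 3600, 4500],
})

# Z-score each stat within its own season so every season sits on the
# same scale regardless of how pass-heavy that season's sim was.
def per_season_zscore(group):
    return (group - group.mean()) / group.std(ddof=0)

df["yards_norm"] = df.groupby("season")["yards"].transform(per_season_zscore)
print(df)
```

Because the scaling only uses stats from within the season being predicted, the same transform can be applied mid-season, which is what allows predictions before the season ends.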
What does the dense neural network look like?

It takes in the features (all the stats listed above) and feeds them through three dense layers of perceptrons: one with 64 nodes, a second with 32 nodes, and a final single-node output layer that produces a prediction for each row of data – hopefully the correct win total for that quarterback in that season. Each epoch, the model learns what weights to assign its nodes and connections to push the final layer's output toward the correct answer. In total, there are 2,625 parameters the model can adjust to reach the correct final number: the win total implied by that quarterback's stats. Here's a plot of the model:

It runs for 100 epochs, continually optimizing a loss function called MSE, mean squared error. MSE takes the mean of the squared differences between the target variable and the predicted value – the target being a QB's wins that season, and the predicted value the model's output. This loss function works well because squaring emphasizes drastic errors, which then become the focus of optimization. The model makes continual adjustments to its parameters to minimize this mean squared error. For communicating the error, it's much more useful to take the square root, which gives a number on the same scale as the target values. In this case, roughly 0.79 wins of error. It initially started at about 7 wins of error and, over 100 epochs for each of the 19 seasons (~1,900 total epochs), reduced to 0.79 error on wins predicted vs actual. Here's the model summary:

Code: Model: "sequential_56"

S25 Quarterback Predicted End of Season Wins

Now that I've outlined the model and its training, here are the results for the current S25 group. Don't forget: wins are predicted for the entirety of the season, based on current stats.

Code: Predicted Wins Team Name

How does this compare to the eye test?
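The architecture above can be verified with a small sketch. This is not the notebook's code (which presumably uses Keras); it's a NumPy stand-in with 7 input features, randomly initialized weights, and ReLU hidden activations (the activation choice is my assumption). It confirms the 2,625-parameter count: (7×64 + 64) + (64×32 + 32) + (32×1 + 1) = 2,625.

```python
import numpy as np

rng = np.random.default_rng(0)

# 7 input features per the article: completions, attempts, yards,
# completion pct, longest throw, touchdowns, interceptions.
layer_sizes = [7, 64, 32, 1]

# Randomly initialized weight matrices and zero biases per dense layer.
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Forward pass: ReLU on the hidden layers, linear single-node output."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)
    return x @ weights[-1] + biases[-1]

# Total trainable parameters: weights plus biases across all three layers.
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)  # 2625

# One "prediction" per row of input – here, 5 hypothetical QB stat lines.
pred = forward(rng.normal(size=(5, 7)))
print(pred.shape)  # (5, 1)
```

Training (the 100 epochs of MSE minimization) would adjust `weights` and `biases` via backpropagation; reporting the square root of the final MSE gives the RMSE figure quoted above.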
Well, there are some interesting predictions here based solely on QB stats. Awarding Jack the highest predicted wins of any QB, by a fairly decent margin too, is not well explained by the available data at first glance. What follows is my explanation of the model's behavior, but I should note this is effectively a black box: the model relies on those ~2,500 parameters, finely tuned to this task, to predict results, so my explanation may be flawed.

So why award Jack 11 predicted wins? He doesn't have the highest yards, attempts, completions, completion percentage, or touchdowns. The key is that the model highly values quarterbacks with a strong all-around game, and it doesn't care how often they throw. Jack has put up the second-highest touchdowns, second-lowest interceptions, and second-best completion percentage.

But then why is Pheonix so low, and not second (or first, for that matter), with a league-leading 10 touchdowns? While the model clearly values touchdowns, it really loves quarterbacks who are efficient in the passing game and have a high completion percentage. So a sub-55% completion rate seems to drag down the model's predicted output, based on league history.

Why is Fujiwara not higher? Despite the most yards, the model seems to penalize quarterbacks who are heavily used and relied on for an air-raid style of play. Fewer attempts most likely mean fewer opportunities for a devastating pick. The model predicts that quarterbacks who are used neither too much nor too little, and who play efficiently without too many interceptions when called upon, will provide their team more wins than a stud QB who's always called on to dish the ball downfield. Between heavy usage and high interceptions, Fujiwara is not ranked highly by the model's predicted wins.

Let's look into some data analytics and see how Predicted Wins correlates with the major statistics for each quarterback.
Note that Passer Rating was not included in the model, yet it has a high correlation with the output – meaning the sim's passer rating formula likely resembles the relationship this dense neural network converged on, and that QBR is a decent metric for evaluating quarterbacks via the index.

S24 Quarterback Predicted Wins vs Actual Wins

We can also check how my explanations line up with last season's S24 quarterback results:

Code: Predicted Wins Team Actual Wins Name

This result appears to line up with my explanation: McDummy had few interceptions, high efficiency, and not a crazy number of attempts, whereas the model viewed Banks as used far too much, with too much risk inherent in that type of play – high yards, high TDs, but also high interceptions. The model's take is that while it worked for Banks last season, across its 14 seasons of passing data, more often than not, that style costs wins.

Expected Wins vs Actual Wins (WinContribution) for Historic ISFL Quarterbacks

Using the model, we can also investigate the historic data for who contributed the most and least to their team's victories – at least from the model's perspective. This is essentially asking who was the most underrated quarterback in the league: who put up stats the model likes but didn't receive the wins those statistics would predict. Here are the results:

Code: UniqName WinContribution_Sum WinContribution_Mean

There are two stats above: WinContribution_Sum, the sum across all seasons of the differential between the model's predicted wins and the quarterback's actual wins; and WinContribution_Mean, the average of that differential over the quarterback's career. Note that it's just pure addition behind the WinContribution_Sum metric, meaning players with long careers and many seasons of over-performing relative to team win totals can quickly accumulate high numbers over the course of their careers.
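The WinContribution bookkeeping just described can be sketched in a few lines. The column names and the toy numbers are mine (loosely inspired by the Cole seasons discussed below, not the actual model output):

```python
import pandas as pd

# Toy per-season results: model predictions vs actual team wins.
seasons = pd.DataFrame({
    "name":      ["Cole", "Cole", "Banks"],
    "season":    ["S16", "S18", "S24"],
    "predicted": [5.0, 11.0, 7.5],
    "actual":    [0, 5, 10],
})

# WinContribution for a season: predicted wins minus actual wins.
# Positive means the QB's stats "deserved" more wins than the team got.
seasons["win_contribution"] = seasons["predicted"] - seasons["actual"]

# Career aggregates: pure addition for the Sum, career average for the Mean.
career = seasons.groupby("name")["win_contribution"].agg(
    WinContribution_Sum="sum", WinContribution_Mean="mean"
)
print(career)
```

Because the Sum is pure addition, a long career of modest over-performance and one spectacular outlier season can produce the same total, which is why the Mean is reported alongside it.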
So who's the most underrated according to the model? There are a number of candidates, but it settled on the Austin Copperheads' Easton Cole, with just over 22 predicted wins above his actual career win total (by the model's estimate he should have roughly 69 career wins versus his actual 47). This was especially pronounced in S16, his rookie season, when the Copperheads won 0 games but the model predicted 5, and in S18, when the Copperheads won a respectable 5 games but Cole's play was valued at 11 wins. Even as the team around him has caught up to his play, Cole has consistently performed at a rate higher than his win totals would reflect.

What about the most underrated season? Here are the outliers:

Top 5 Highest and Lowest WinContribution QB seasons:

Code: Name Team Season Actual Wins Predicted Wins WinContribution

Code: Name Team Season Actual Wins Predicted Wins WinContribution

You'll note that the lowest 5 WinContribution QB seasons all come from successful team seasons. The model generally undervalues these quarterbacks, likely because pulling its predictions toward a mean of around 9 wins reduces its error on average.

Team Analysis

And finally, we can look at how each team's quarterbacks have fared via a swarm plot:
What can we take away? For a team that has been historically very good, like the Otters, most quarterback seasons fall right around 0 WinContribution – most likely because the model predicted them to have success, and they did. The WinContribution metric is better for identifying teams that have had a wide distribution of quarterback performance relative to their final standings: San Jose's QBs have spanned the entire distribution.
Clearly the model doesn't have the full picture, and a better win-prediction model would need to capture more than just the quarterback's passing stats. Future projects I'm planning in this space include incorporating more player statistics, in the hope of providing a fuller team picture, and potentially running fun hypotheticals on team changes. I'd also like to get more time-series data and build a more predictive distribution based on player modelling and TPE – the latter isn't even included in this model and would likely play a key role in any predictive analytics for the league. Stay tuned for more updates and work on this angle as I continue diving into it.

Quote: 2093 words and research and data work

RE: Using a Dense Neural Network to Predict Wins Based on QB Stats - Starboy - 10-14-2020

Even neural networks hate Chika! Fake News! This is amazing work though. Love it and all the work put into it

RE: Using a Dense Neural Network to Predict Wins Based on QB Stats - White Cornerback - 10-14-2020

fantastic work