International Simulation Football League
*Sim-Batcher: A Python-based tool for large-scale sim testing - Printable Version



*Sim-Batcher: A Python-based tool for large-scale sim testing - slate - 09-17-2020

[2,744 words by my count. Ready for grading. Please give 5% of the payout each to @r0tzbua and @katarn22 for their help with debugging!]

For those of you who don't like reading, just click here and try the "Quick Setup Guide": https://github.com/slate15/sim-batcher. If you do that, please don't message me right away if something doesn't work though. Wink

Hi everyone. For those of you who don't know me, I am a S25 rookie CB for the Kansas City Coyotes. After being drafted to the Coyotes and getting help from experienced users in the locker room to start sim testing, I quickly recognized two opportunities to improve our sim testing process: increasing the sample sizes we used, and automating more than just pressing the "Sim Games" button and exporting the files. After developing and implementing a tool to address both of these over the course of the season, I am proud to present sim-batcher, which I will describe in this article; its open source code is available at the GitHub link above. Using this tool, I was able to run upwards of 150,000 tests overnight (~10 hours away from my computer), greatly increasing the number of tests we could run.

The structure of this article will be as follows:

1. Introduction and Motivation
2. How to Use
3. Possible Extensions
4. Conclusion

Please forward this to any ISFL GMs you might know. Big Grin

Introduction and Motivation

As I said previously, I had identified two ways to improve our sim testing procedure: increase sample size, and automate more. Luckily these two goals would turn out to be quite aligned.

To start off, let me quickly discuss sample size. It's well known that the sim has some kind of memory leak, so it can only handle a certain number of simmed games before it crashes; for me, that is around 750 games. With 750 simmed games where each game is either a win or a loss, the win percentage of that sample is never going to be exactly the true win percentage. The standard error associated with a win percentage W in a sample of n games is given by:

Code:
sqrt(W * (1-W) / n)

For 750 games and assuming a moderate scenario of W=0.7 (70%), we get a standard error of 0.0167, or just about 1.7%. Because about 90% of outcomes fall within 1.645 standard deviations of the mean, the true win percentage has a 90% chance of falling anywhere between 67.25% and 72.75%. Given that strategies often differ by only a couple of percentage points, this is a very large range of possibilities and can easily lead to incorrect decisions.

If we instead increase the sample size to something like 3000 games, we get a standard error of 0.84% and a 90% confidence interval of (68.6%, 71.4%). Still somewhat wide, but definitely an improvement. At 6000 games, the 90% confidence interval is less than 1 percentage point on either side.
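For anyone who wants to check these numbers or try other scenarios, a few lines of Python reproduce them (1.645 is the two-sided 90% normal quantile):

Code:
import math

def win_pct_ci(w, n, z=1.645):
    """Standard error and two-sided 90% normal interval for a win percentage."""
    se = math.sqrt(w * (1 - w) / n)
    return se, (w - z * se, w + z * se)

for n in (750, 3000, 6000):
    se, (lo, hi) = win_pct_ci(0.7, n)
    print(f"n={n}: SE={se:.4f}, 90% CI=({lo:.1%}, {hi:.1%})")
# n=750:  SE=0.0167, 90% CI=(67.2%, 72.8%)
# n=3000: SE=0.0084, 90% CI=(68.6%, 71.4%)
# n=6000: SE=0.0059, 90% CI=(69.0%, 71.0%)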

So clearly, if we can find a way to increase the number of games tested, we can do a much better job of identifying which strategies are better than others. The memory leak makes this difficult, but there is a workaround: after exporting the sim files and reloading the league file, more sims can be run without the game crashing.

Taking as a starting point what was, as far as I'm aware, the best publicly available sim testing automation code (provided by Maglubiyet here), this first change wasn't too difficult to make.

The second goal, automating more, would prove more difficult. My first thought was to work with the strategy implementation (playbooks, run/pass ratios, and blitz ratios). That menu is a bunch of dropdowns and text fields that you can tab between, so it lent itself well to AutoHotkey scripting. There were minor difficulties in figuring out how many tabs were needed to reach each field, but it didn't take me long to have a working script that took a hard-coded playbook, implemented it in the game, and saved the settings. Figuring out how to define that playbook dynamically was looking to be far more trouble, but then I found out someone has made an open source Python wrapper for AutoHotkey, found here.

Being able to interface between Python and autohotkey is what really kicked this project into high gear. I was able to define classes to represent Strategy objects and complex methods to handle automation for repeatedly implementing strategies, simulating games, exporting and storing output, and reloading the game. Over the course of the season I've been able to continue improving the code by adding recovery fallbacks in case the game crashes, better ways to aggregate the Games.csv output, etc.
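To give a feel for the overall structure, here is a rough sketch of that loop. The method names on sim are hypothetical stand-ins for the real automation code in the repo, not its actual API:

Code:
def run_batch(strategies, num_tests, max_iters, sim):
    """Sketch of the core loop; `sim` wraps the AHK automation, and the
    method names here are hypothetical stand-ins for the repo's code."""
    for strategy in strategies:
        sim.apply_strategy(strategy)      # tab through the menus, set fields, save
        remaining = num_tests
        while remaining > 0:
            batch = min(remaining, max_iters)
            sim.sim_games(batch)          # press "Sim Games" for this batch
            sim.export_games()            # export the sim files
            sim.append_results(strategy)  # append Games.csv rows to the output
            sim.reload_league()           # reload to dodge the memory leak
            remaining -= batch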

How to Use

(The following is a copy-paste of the README on GitHub. You can skip this; I just want the $$$ since I already wrote all of that.)

Quick Setup Guide
Requires:

* Python 3
* DDSPF 2016
* AutoHotkey

1. Download this code as a .ZIP file and extract it to a directory
2. Open the Command Prompt (cmd.exe) as administrator and navigate to that directory ("cd <directory>")
3. Run pip install -r requirements.txt --user
4. Open up the settings.py file in the text editor / IDE of your choice and change the filepaths as needed
5. Open DDSPF and load the league you want to use (DSFL only at present)
6. Launch the utils/MousePosWatch.ank.ahk script, press Ctrl-J to start it, and use it to identify the mouse coordinates needed for settings.py (press Esc to exit the MousePosWatch script)
7. Open the Configuration menu in DDSPF and count how many tabs it takes you to reach the "Enable Personalities" box. Put this number in ENABLE_PERSONALITIES_TABS in settings.py.
8. Press "Alt-F" to open the File menu dropdown, then count how many times you need to press the "Left" button to get to the "Export" menu dropdown. Put this number in LEFT_PRESSES_TO_EXPORT in settings.py.
9. In the Command Prompt, run python batchtest.py input/exampleStrats.py --home KCC --away MIN -N 50 (or you can use any other team codes you want for home/away)
10. Message me on Discord if you have any issues to this point. Otherwise start defining your own strategies and running your own sim tests!

Introduction

sim-batcher is a Python 3 package that allows for large-scale batch testing of multiple strategies in DDSPF 2016. It works through a combination of the Python ahk package and custom Python code that represents strategies as Python objects and interfaces with the simulator.

To run a batch of tests, the main command that is used is:

Code:
python batchtest.py [STRATEGY_FILEPATH] -H [HOME_TEAM_CODE] -A [AWAY_TEAM_CODE] -N [NUM_TESTS] -O [OUTPUT_FILEPATH]

What this does is:

1. Read in the Strategies defined at STRATEGY_FILEPATH and convert them into Python objects
2. Simulate a number of games equal to NUM_TESTS between the specified teams
3. Record the data to OUTPUT_FILEPATH (this is done as tests are ongoing, so if the sim tests are interrupted for some reason the output for the completed tests will still be saved there)
4. Display summary statistics for each strategy upon completion of all the tests

Note: The -N and -O arguments are optional. The default number of tests per strategy is 500 and the default output filepath is defined in settings.py (starts as output/Results.csv).
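So a minimal invocation relying on those defaults looks like this (500 games per strategy, results written to output/Results.csv):

Code:
python batchtest.py input/exampleStrats.py -H KCC -A MIN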

Configuration

This code is written for Python 3 and requires a copy of DDSPF 2016 (so it can simulate the games). It also requires the Python packages ahk, numpy, and pandas. You can install these by running the following in the base directory:

Code:
pip install -r requirements.txt --user

Settings

The settings.py file contains most of the parameters that should remain the same across runs. These include:

1. GAME_APPLICATION_PATH, the filepath of the DDSPF .exe application file.
2. GAME_OUTPUT_FILE, the filepath of the Games.csv file for the league. IMPORTANT: The league must be listed first alphabetically in the game's dropdown menu so it can be reloaded upon hitting the maximum number of games simulated in one batch of tests.
3. RESULTS_FILE, the target .csv filepath to save the game output to. (Can optionally be specified here, but is overridden by the -O flag in the command line execution)
4. MAX_ITERS, an upper bound on the number of games that can be run on one league load without the game crashing. If the number of sim runs (-N flag) exceeds MAX_ITERS, the code runs multiple smaller batches to reach the total target number of simulated games for each strategy.
5. ENABLE_PERSONALITIES_TABS, the number of tabs needed to reach the "Enable Personalities" checkbox with the Configuration menu open. (Apparently this number can vary depending on whether the FOCUS_COORDINATES click lands on the "League Settings" tab or not; it's usually either 16 or 17.)
6. LEFT_PRESSES_TO_EXPORT, the number of presses of the "Left" button it takes to reach the Export dropdown menu from the File dropdown menu. This number also varies and we have no idea why. The known values so far are 3 and 7.
7. Some coordinates for specific buttons in the game window. Since not everything can be accessed via keyboard shortcuts, several mouse clicks are needed throughout the simulation loop.

These coordinates may change depending on screen size, resolution, etc. The provided AutoHotkey script MousePosWatch.ank.ahk in the utils/ folder can be used to find the x- and y-values of each button for your specific setup. With AutoHotkey installed, simply launch the script by double-clicking the file, press Ctrl-J to start the mouse tooltip, record the coordinates specified in the comments, and press Esc to exit the script once you're done.

For the FOCUS_COORDINATES, it seems to work most consistently if you ensure that the coordinates are above the "League Settings" tab shown below on the Configuration screen. From here the default 16 tabs to reach Enable Personalities should work consistently.

[Image: focusCoordinatesTarget.png]

8. Delay settings. The DDSPF window sometimes needs time to finish what it's doing; these settings specify how long the code waits for those tasks to complete.
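Putting that together, a filled-in settings.py might look something like the following. The names are the real settings described above, but every value is a placeholder for one hypothetical setup, so use your own paths, counts, and coordinates:

Code:
# settings.py -- example values only; substitute your own setup
GAME_APPLICATION_PATH = r"C:\DDSPF2016\DDSPF2016.exe"     # placeholder path
GAME_OUTPUT_FILE = r"C:\DDSPF2016\DSFL\Games.csv"         # placeholder path
RESULTS_FILE = "output/Results.csv"   # default; overridden by the -O flag
MAX_ITERS = 750                       # games per league load before reloading
ENABLE_PERSONALITIES_TABS = 16        # usually 16 or 17 (see item 5 above)
LEFT_PRESSES_TO_EXPORT = 3            # known values so far: 3 and 7
FOCUS_COORDINATES = (400, 300)        # find with utils/MousePosWatch.ank.ahk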

Strategies

An example strategy definition can be found in input/exampleStrats.py. The input is formatted as a list of Python dictionaries, where each dictionary contains a home strategy and/or an away strategy, plus a title for that test. The strategy is itself a Python dictionary meant to resemble how they are formatted in the game:

Code:
{
DOWN_AND_DISTANCE: (OFFENSIVE_PLAYBOOK, RUN_PASS_RATIO, DEFENSIVE_PLAYBOOK, BLITZ_RATIO), 
... 
}

The settings for the run/pass ratio and blitz ratio should be integers. The rest can use pre-defined constants found in backend/strategyConstants.py. Hopefully they are fairly intuitive to use (FIRST_AND_TEN, VERTICAL_PLAYBOOK, THREE_THREE_FIVE_PLAYBOOK, etc.).

Either home strategies, away strategies, or both can be defined for all tests, but be consistent: if either home or away strategies are not defined for a test, whatever is currently in the league file will be used. So if you, for example, define only a home strategy for the 1st element of the list and only an away strategy for the 2nd, the 2nd test will silently run against the 1st test's leftover home strategy.
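For illustration, a strategy file following this format might look like the example below. The outer keys ("title", "home", "away") are my guesses at the schema here, so check input/exampleStrats.py for the actual field names:

Code:
from backend.strategyConstants import (
    FIRST_AND_TEN, VERTICAL_PLAYBOOK, THREE_THREE_FIVE_PLAYBOOK,
)

# Hypothetical example; see input/exampleStrats.py for the real schema.
strategies = [
    {
        "title": "Vertical 60 on 1st and 10",
        "home": {
            # down/distance: (off. playbook, run/pass, def. playbook, blitz)
            FIRST_AND_TEN: (VERTICAL_PLAYBOOK, 60, THREE_THREE_FIVE_PLAYBOOK, 70),
        },
        # no "away" entry: the away team keeps what's in the league file
    },
]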

Data

The output saved to OUTPUT_FILEPATH (default output/Results.csv) is essentially a concatenation of all of the Games.csv files created by DDSPF during the runs. Data is appended to this file after each batch of sim tests, so if the program terminates in the middle of a run, the output from the completed tests won't be lost.
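Since it's a plain .csv, the output is easy to slice further with pandas. A minimal sketch, assuming there are home/away score columns and a per-row test title; the column names below are guesses, so adjust them to match the actual Games.csv schema:

Code:
import pandas as pd

# Column names are guesses at the Games.csv schema; adjust to match yours.
df = pd.read_csv("output/Results.csv")
df["home_win"] = df["HomeScore"] > df["AwayScore"]
print(df.groupby("Title")["home_win"].agg(["mean", "count"]))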

Team Codes

Currently only the DSFL team codes are implemented in the code. They are:

DAL - Dallas Birddogs 
KCC - Kansas City Coyotes 
LDN - London Royals 
MBB - Myrtle Beach Buccaneers 
MIN - Minnesota Grey Ducks 
NOR - Norfolk Seawolves 
POR - Portland Pythons 
TIJ - Tijuana Luchadores

Miscellaneous

Once again, the league that you use must be first alphabetically in the dropdown menu.

If you enter a number of iterations per strategy greater than MAX_ITERS (using the -N flag), the code will split that large number into multiple smaller batches of the same sim, which together sum to the target number of iterations. In this case, the league file will be saved and overwritten after each strategy change, so it's advisable to keep a backup copy of the league file (named something alphabetically later so it appears below the original in the dropdown).
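The splitting itself amounts to something like this (a sketch, not the repo's actual code):

Code:
def split_batches(n, max_iters):
    """Split a target of n games into per-load batches of at most max_iters."""
    full, rem = divmod(n, max_iters)
    return [max_iters] * full + ([rem] if rem else [])

print(split_batches(3000, 750))  # [750, 750, 750, 750]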

There are built-in failsafes for a few failure modes I've found so far, which allow the program to recover, reload the game, and run more tests if the game crashes for some reason.

Possible Extensions

Going forward, I see a lot of ways in which this can be improved to be even more powerful. There are also a few improvements that I have already implemented but that I'm not open sourcing with the rest of the code. (I can't let all of my secrets out, can I?)

To start with the things I've already done: the main way I've used this framework this season has been to enumerate almost all possible strategies by brute force and observe how well each one performs. For example, I will start with the Goal Line strategy (the theory being that Goal Line has the least effect on the other downs) and test all possible offensive strategies at 30, 40, 50, 60, and 70 run/pass ratios. We can then look at the resulting win percentage for each strategy, as well as other performance metrics like points scored and point differential, and choose the best strategy from among them. With the Goal Line strategy decided, we can move on to 3rd down, then 2nd down, then 1st down, doing the same thing each time.
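Generating that grid of tests is straightforward. Here's a sketch using the guessed input schema from earlier; goal_line_key stands in for whichever constant backend/strategyConstants.py uses for the goal line situation:

Code:
from itertools import product

def strategy_grid(goal_line_key, offensive_playbooks, defense, blitz):
    """Every offensive playbook at each run/pass ratio for one situation."""
    ratios = (30, 40, 50, 60, 70)
    return [
        {"title": f"{playbook} {ratio}",
         "home": {goal_line_key: (playbook, ratio, defense, blitz)}}
        for playbook, ratio in product(offensive_playbooks, ratios)
    ]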

Below is an example of the graphical output we can generate using this method. Using it, we identified several potentially good strategies that deserved further testing, including the surprising Vertical 30 at the goal line. The results can be messy due to the inherent variance of testing, even with upwards of 2,000 tests per strategy as done here, so using multiple performance metrics is helpful for understanding what's going on.

[Image: Ka7YORe.png]
[Image: 6SXJ5jO.png]

From here, I can see the rough outline of an optimization framework: strategies are tested with an increasing number of iterations if they perform well, but discarded if they seem to perform worse than current strategies. This could reduce the number of iterations needed to the point where multiple passes can be made through each down/distance to identify the best strategy. I haven't yet had the chance to work on that.
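One simple way to realize that idea would be successive halving. Nothing like this is in the repo yet; evaluate is a hypothetical hook that runs n sims for a strategy and returns its win percentage:

Code:
def successive_halving(strategies, evaluate, start_n=250, rounds=3):
    """Keep the better half of the field each round while doubling the
    sample size, so weak strategies get few sims and strong ones get many."""
    survivors, n = list(strategies), start_n
    for _ in range(rounds):
        scored = sorted(survivors, key=lambda s: evaluate(s, n), reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        n *= 2
    return survivors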

Another worthwhile addition would be individual player stat collation. Currently the output is a collection of the Games.csv output from all the games simulated. Including the PlayerGames.csv output would be useful for understanding what happens to individual players under different strategies, and would help when win% is not the only thing that matters (e.g. chasing an individual award on an eliminated team).

I believe this tool could also be really useful for understanding more about how the game works under the hood. It should make it easier to run very large sample-size tests to validate hypotheses and methodically test things like which defensive playbooks are better against which offensive playbooks, whether the maximum blitz rate is always correct, or, in conjunction with the next point, how to optimally allocate TPE.

Finally, additional strategy parameters could be included. Right now only the playbooks and ratios are definable in the input files, so things like Tempo, Primary Receiver, and Depth Charts still have to be set manually. With autohotkey and keyboard shortcuts I believe it should be possible to include scripting to automate those parameters as well, which could greatly increase the power of this tool.

Conclusion

Congratulations and thank you so much for reading this far! I hope this tool is useful to everyone and that people can continue to iterate and improve upon it. While I understand there are always competitive advantages to be gained from developing these sorts of things on your own, I would hope that teams and people in the league can find a balance between winning and contributing to collaborative open source projects like this one. If you do continue to develop this tool and would be alright with sacrificing whatever 0.5% improvement you get from keeping that development secretive, I would love for people to submit pull requests so we can continue building this together.


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - DeadlyPlayer - 09-17-2020

delete this now


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - slate - 09-17-2020

(09-17-2020, 10:21 PM)DeadlyPlayer Wrote: delete this now

No


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - Beefstu409 - 09-17-2020

Second round here I come!


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - r0tzbua - 09-17-2020

It's amazing! Draft this man!


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - katarn22 - 09-17-2020

Extremely nice tool, especially the ability to define and test multiple strategies within one iteration of the code. Fantastic work on this slate. Sharing it with the whole league is commendable as well.


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - GlimsTC - 09-17-2020

Hi, Coyotes GM here, if you plan on drafting any player, make sure that man is named "Slate" and his player is "Peter Patterson." Slate's been an absolute amazing person to have in the war room, and a god with the sim. If you get him, you're getting an immediate and crucial upgrade to your ISFL team.


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - Beefstu409 - 09-17-2020

What is the size of the output file that's being saved after you leave it running for a long time (like overnight)?


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - .simo - 09-17-2020

What Glims said.


RE: Sim-Batcher: A Python-based tool for large-scale sim testing - .Laser - 09-17-2020

jesus