match2kd
aims to overcome the "Callof" API rate limits by estimating a match's difficulty ("lobby k/d") from its features (players' metrics in that match).
Instead of fetching every player's match history (hundreds of calls) to calculate the score, a model predicts the difficulty from a single API call (the metrics of a given match).
The model is one of the three components of a broader personal side project centered on COD API / metrics intelligence:
- wzkd : a Streamlit-based dashboard that collects, aggregates and visualizes players' stats from Call of Duty Warzone (1), where we also deploy the model.
- wzlight (also on PyPI) : a light, asynchronous Python wrapper for the 'Callof' API that was (also) used to build our dataset and power the dashboard.
Edit - June 2023. While our approach is still valid, Activision discontinued its API with the launch of Warzone 2 and apparently has no plans to bring it back.
- Calculating an accurate "lobby kd" (players' average kills/deaths ratio), the main metric players use to estimate a match's difficulty, isn't possible without some form of special access / partnership with Activision (like the cod tracker / wzranked stats tracker websites) and/or continuous retrieval, caching and storage of matches and player profile data.
- One of the (undocumented) rate limits of the COD API is said to be around 200 calls per 30 minutes.
- Our target, "Lobby KD" or more accurately "avg players' kills/deaths ratio", is calculated as: the mean, over the n players of a match, of each player's kills/deaths ratio over their x last matches.
- This metric is not directly provided by the API and is only retrievable by querying and aggregating the API n (players) times x (recent matches per player).
- E.g. for a single match of 40 players with, say, a history of 30 matches per player, we would need at minimum 40 (players) * 30 (matches) = 1,200 calls to calculate that metric.
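To make the call count concrete, here is a minimal sketch of the naive computation. The fetch functions are hypothetical placeholders standing in for the real (rate-limited) COD API calls, and the dummy values are illustrative only:

```python
# Naive "lobby kd" computation, illustrating why it is too API-expensive.
# fetch_match_players and fetch_recent_kd are hypothetical placeholders,
# not actual COD API endpoints.
from statistics import mean

def fetch_match_players(match_id):
    # 1 call in reality -> returns the ~40 player ids of a match
    return [f"player_{i}" for i in range(40)]

def fetch_recent_kd(player_id, last_n=30):
    # 1 call per match in the player's history; dummy k/d values here
    return [1.0] * last_n

def naive_lobby_kd(match_id, last_n=30):
    players = fetch_match_players(match_id)
    per_player_kd = [mean(fetch_recent_kd(p, last_n)) for p in players]
    calls = len(players) * last_n  # 40 * 30 = 1200 history calls
    return mean(per_player_kd), calls

kd, calls = naive_lobby_kd("some-match-id")
```

At ~200 calls per 30 minutes, those 1,200 calls make the naive approach impractical, hence predicting from a single match payload instead.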
- As players we can often "feel" the difficulty of a match, and the API provides post-match metrics (one call with a given match ID) including each player's performance in that match, which could be used to model game difficulty without querying every player's profile and recent-match history.
- Two modules were used to retrieve data:
  - wzranked.py : a custom class / wrapper to crawl and parse a quite accurate "lobby kd" from the wzranked.com (unofficial, GraphQL-based) API.
  - collect_matches_details.py : an async wrapper to collect detailed match stats, based on my side project wzlight.
- Final custom dataset (match2kd/dataset_warzone_kd_bigger.parquet.gzip) with 55k rows (1,170 unique matches of ~40 players each), 152 features (kills, deaths, pct time moving...) and the associated target (game difficulty / "lobby kd"). The data was collected around Sept 2022 from random COD Warzone players/matches (Rebirth mode only, solos to quads).
- Final XGB model available (match2kd/xgb_model_lobby_kd_2.json), currently deployed in the Streamlit app
- Notebook with the workflow to train our model (match2kd/model_v3.ipynb)
  - Includes my findings to improve model accuracy: feature selection & engineering, as well as the final aggregations made to retain enough information.
- See the notebook for more details
- Feature selection: 20+ features retained out of ~150
- Feature creation:
  - At dataset level (all rows):
    - binning on match time: morning (1), noon (2), afternoon (3), evening (4), late evening (5)
    - normalization by time played (x / time) of players' kills, deaths, damage done, damage taken
    - normalization by kills (x / kills) of players' damage and headshots
  - At match level, aggregation of players' stats per match:
    - mean, std, median for a fixed set of features
    - pct of players with 0, 5, 10 kills
    - pct of players with kill streaks, double kills, headshots
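The match-level compression step (many player rows in, one feature row out) could look like the following minimal sketch. Field names and thresholds are illustrative, not the dataset's actual columns:

```python
# Sketch: compress one match (a list of player rows) into a single
# feature row. Field names are made up for illustration.
from statistics import mean, median, pstdev

players = [  # toy match with 3 players instead of ~40
    {"kills": 3, "deaths": 2, "time_played_s": 900},
    {"kills": 0, "deaths": 4, "time_played_s": 600},
    {"kills": 7, "deaths": 1, "time_played_s": 1100},
]

def match_features(players):
    # normalization by time played (x / time), as in the feature creation step
    kills_per_min = [p["kills"] / (p["time_played_s"] / 60) for p in players]
    n = len(players)
    return {
        # mean / std / median aggregations
        "kills_per_min_mean": mean(kills_per_min),
        "kills_per_min_std": pstdev(kills_per_min),
        "kills_per_min_median": median(kills_per_min),
        # pct_players-style features
        "pct_players_0_kills": sum(p["kills"] == 0 for p in players) / n,
        "pct_players_5plus_kills": sum(p["kills"] >= 5 for p in players) / n,
    }

row = match_features(players)  # one row per match -> model input
```

The same idea, applied over a fixed set of ~20 selected features, yields the 150+ aggregated columns fed to the model.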
- Custom dataset: the usual tedious work (scraper, API wrapper) to gather our precious data.
- Skewed continuous target + 4 "types" of matches (teams of 1, 2, 3 or 4 players), even if XGB is allegedly fine with that.
- Multi-level data: we want to predict a "lobby kd" per match. Picture every match as a table of ~40 rows (players) with 150+ features (player metrics) and a single, shared target (lobby kd). To train our model we need to compress that data (one row of 150+ aggregated features per match -> our target). Will the model retain enough information?
- Resources and examples are quite scarce when dealing with multi-level data. Specific models also exist, but we won't expand on the matter :-p
- Our model has a mean RMSE of ~0.1; for reference, "lobby k/d" usually ranges between 0.6 (rare) and 1.5 (rare).
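To give a feel for that error scale, here is a toy RMSE computation; the numbers below are made up for illustration, not actual model outputs:

```python
# Toy illustration of what an RMSE around 0.1 means on the lobby-kd
# scale. Values are invented, not taken from the real model.
from math import sqrt

y_true = [0.85, 1.00, 1.10, 1.30, 0.95]  # hypothetical true lobby kds
y_pred = [0.95, 0.98, 1.02, 1.18, 1.01]  # hypothetical predictions

def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

err = rmse(y_true, y_pred)  # ~0.083 on this toy sample
```

On a target that mostly lives between 0.8 and 1.2, an error of ~0.1 is the difference players can "feel" between a 0.9 and a 1.1 lobby.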
- From post-production personal tests (far from exhaustive), it captures lobby kd variations quite well, using only compressed players-in-a-match features.
- The overall feeling (would need further proofing) is that the lowest and highest lobby kds could be predicted more accurately. E.g. true [0.7 - 0.9] k/d or > 1.1 k/d matches seem less finely predicted than those revolving around 1.
- We would have liked our RMSE to be below the 0.1 mark more often, as a lot of lobby kds revolve around the "1" threshold and we can "feel" the difference between a 0.9 kd match and a 1 or 1.1 kd one.
- A lot of work was put into back-and-forth tweaking/testing of features; some missing features (not provided/updated by the COD API) would have been useful... or more engineering... ;)
- Playing around with "ranking" models or the "rank" features (player/team placement in a match).
- Augment and/or rebalance our dataset, given the inherently skewed nature of Warzone's matchmaking system (a lot of matches revolve around the 1 kd mark).
- Cf. new techniques around synthetic data generation with DL for tabular data.
- Benchmark vs. other models. As of Nov 2022 and after some searching, we didn't bother to test DL methods (TabNet & co.) for our tabular exercise, despite an interesting (small) revival in the domain.
- Is further gain in prediction power limited by the nature of the available data and the degree of randomness of players' performances in a battle royale match?
- 4, 8 or 2 models (solos, duos, trios, quads; Resurgence) instead of 1? Though squad size and map type (Fortune's Keep vs Rebirth Island) did not seem that important in our model.