This repo serves as playground for working with F1 related data. In particular I focus on answering some interesting problems that benefit from the use of SQL and machine learning.
Please check out the Jupyter notebooks guess_track_from_telem and guess_driver_from_telem. In these notebooks I utlize keras and tensorflow to identify Formula 1 circuits and drivers based on speed v.s. time race telemetry.
At present the identification of tracks based on telemetry is extremely accurate. Speed traces are given to the model, which selects among 22 possible tracks to predict the correct source of the given lap. Below I show example laps which were processed by the model for the Austrian Grand Prix, with the second figure (below) showing similar data for the Japanese Grand Prix. The Japanese Grand Prix is the only track currently containing laps incorrectly identified, which is caused by a small subset of the laps which were likely incorrectly identified as green flag (normal) running, and likely should have been ignored as being behind a safety car. The regular accelerating and braking without achieving top speeds is indicative of a driver warming their tires and brakes, and should not be present in a normal racing lap. These incorrectly identified laps are shown in red below. Another interesting study that could be conducted with these traces would be to identify weather or track conditions based on telemetry.
Driver identification is much more difficult than track identification, as the subtle differences in driver styles is far smaller than the absolute differences in track layouts. Because of this the current achieved accuracy for drivers based on traces is only ~90%. Though this is not perfect, it is remarkable considering the fact that there are 20 drivers accounted for in the model, all driving at the top tier of their sport in relatively similar hardware. Certainly I would not be able to do this well by eye. Below is a plot showing some laps that were correctly identified as their driver (each driver is represented by a distinct color), compared to laps that were incorrectly identified. This plotted sample consists of 20 drivers, and shows how small differences are.
I have also been playing with the fastf1 tool to scrape other forms of F1 race data and store this data into SQL databases. Please take a look at f1_djsouthall/sql/make_tables.py to see the generation of these SQL databases. In f1_djsouthall/sql/example_sql_analysis.py I perform a few small analysis examples using those tables.
I also thought it would be fun to play with generating new tracks by randomly combined sectors from existing tracks. You can see some of that below:
The foundation of this work is the data I have extracted using the brilliant FastF1 API. It really is a great tool, so please check it out if interested!