Where to find customers is the most important question for taxi drivers and ride-hailing networks. If demand for taxis can be reliably predicted in real time, taxi companies can dispatch drivers in a timely manner and drivers can optimize their route decisions to maximize their earnings in a given day. Consequently, customers will likely receive more reliable service with shorter wait times. This project uses rich trip-level data from the NYC Taxi and Limousine Commission to construct ride time series for 63 taxi zones in Manhattan and forecast demand for rides. We explore deep learning models for time series, including Multilayer Perceptrons (MLP), LSTMs, and Temporal Graph-based Neural Networks, and compare them with a baseline statistical model, ARIMAX.
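As a rough illustration of how trip-level records can be turned into per-zone time series, the sketch below counts rides per zone and hour using only the Python standard library. The record layout (a timestamp string plus a zone ID), the hourly bucket size, and the name `hourly_counts` are assumptions made for this example; the project's own processing notebooks may work differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (timestamp, taxi zone ID).
# The real TLC files have many more columns; only these two are assumed here.
trips = [
    ("2022-01-01 00:05:00", 4),
    ("2022-01-01 00:40:00", 4),
    ("2022-01-01 00:59:59", 12),
    ("2022-01-01 01:10:00", 4),
]


def hourly_counts(records):
    """Count rides per (zone, hour) bucket."""
    counts = Counter()
    for timestamp, zone in records:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour to define the time bucket.
        hour = moment.replace(minute=0, second=0)
        counts[(zone, hour)] += 1
    return counts


counts = hourly_counts(trips)
for (zone, hour), n in sorted(counts.items()):
    print(zone, hour.isoformat(), n)
```

Each zone then has one count per time bucket, which is the shape a forecasting model needs as input.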
Create the environment from the environment.yml file by running the following command in the main repo directory:
conda env create
The installation works for conda==22.9.0. This will install all packages needed to run the data processing code and the ARIMAX fitting notebooks with jupyter or Binder.
The model training notebooks were built using Google Colaboratory. The MLP, RNN, and LSTM models are built using pytorch=2.3.1 (i.e., the most recent version of pytorch on Google Colaboratory when we started this project). Therefore, the notebooks training these models should work out of the box if you open them on Colab.
On the other hand, our graph neural networks were built using the torch-geometric-temporal package. This package takes a long time to install and requires some patching due to incompatibility with our version of pytorch. We show how to install a permanent environment in Google Drive in this Colab Notebook. To install the package without a permanent environment, see this Colab Notebook (not recommended).
- 00_a_data_summary.ipynb: Summary of dataset and processing
- 00_b_basic_ts_model.ipynb: Fitting a basic statistical time series model (ARIMAX) to test data
- 01_a_final_dataset.ipynb: Notebook generating the dataset we use to train/validate our models (with an 80-20 train-test split)
- 01_b_arimax.ipynb: Notebook training the ARIMAX model on the data
- 02_MLP_for_taxi_dropoff_time_series.ipynb: Notebook training an MLP model on the data
- 03_a_rnn_lstm_single_series.ipynb: Notebook training an LSTM model on each taxi zone's time series separately
- 03_b_rnn_lstm_multi_series.ipynb: Notebook training an LSTM model on all the taxi zones simultaneously
- 03_c_rnn_lstm_multi_series_multivar.ipynb: Notebook training an LSTM model on all the taxi zones simultaneously, using additional features from the taxi data
- 03_d_rnn_lstm_validation.ipynb: Contains classes for systematically training and validating baseline, RNN, and LSTM models for final results. Also sets up a model that uses month, hour, and day-of-week embedding layers.
- 04_gnn_fits.ipynb: Notebook training a graph neural network model on the data
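The 80-20 train-test split mentioned for 01_a_final_dataset.ipynb can be sketched as follows. This is a minimal example that assumes the split is chronological (earliest 80% for training, latest 20% for testing), which is common for time series but is not confirmed here; the function name `chronological_split` and the ride counts are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into train and test parts without shuffling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    # Everything before the cutoff index is training data.
    cutoff = int(len(series) * train_fraction)
    return series[:cutoff], series[cutoff:]


# Made-up ride counts for one zone, ordered in time.
rides = [12, 15, 9, 20, 18, 25, 30, 22, 17, 14]
train, test = chronological_split(rides)
print(len(train), len(test))
```

Keeping the test set strictly later than the training set avoids leaking future demand into the model during training.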
- assets: Additional assets unrelated to taxi data
- data: Taxi data directory
- data_processing: Notebooks for processing the taxi data
- notebooks: Notebook files summarizing the data, performing fits, and generating main results
- utils: Custom modules or files
- scratch: For unclean files used to develop code
Each directory contains an individual README.md file with more details of directory contents.