Forecast NYC taxi activity with deep learning. We compare the performances of models based on MLPs, RNNs, LSTMs, GNNs, and ARIMAX. Additionally, our code provides users with an easy-to-use pipeline for producing custom time series datasets of taxi activity from publicly available NYC TLC data.

edwarddramirez/taxi-forecast

Binder License: MIT Python Repo Size

taxi-forecast

Introduction

Where to find customers is the most important question for taxi drivers and ride-hailing networks. If taxi demand can be reliably predicted in real time, taxi companies can dispatch drivers promptly and drivers can choose routes that maximize their daily earnings. Customers, in turn, are likely to receive more reliable service with shorter wait times. This project uses trip-level data from the NYC Taxi and Limousine Commission (TLC) to construct time series of taxi rides for 63 taxi zones in Manhattan and to forecast ride demand. We explore deep learning models for time series, including Multilayer Perceptrons (MLPs), RNNs, LSTMs, and temporal graph neural networks (GNNs), and compare them with a baseline statistical model, ARIMAX.
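As a rough illustration of what "constructing time series from trip-level data" means, the sketch below bins individual trips into per-zone counts. It is not the repository's pipeline: the `(zone_id, timestamp)` record layout, the hourly resolution, and the function name `hourly_counts` are all assumptions made for this example, and the records are toy values rather than real TLC data.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(trips):
    """Count trips per (zone_id, hour) bin.

    `trips` is an iterable of (zone_id, timestamp_string) pairs. Both this
    layout and the hourly resolution are assumptions for illustration.
    """
    counts = Counter()
    for zone_id, stamp in trips:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour so trips in the same hour share a key.
        hour = ts.replace(minute=0, second=0)
        counts[(zone_id, hour)] += 1
    return counts


# Toy records (hypothetical zone IDs and times), not real TLC data.
trips = [
    (4, "2023-01-01 00:05:00"),
    (4, "2023-01-01 00:40:00"),
    (4, "2023-01-01 01:10:00"),
    (12, "2023-01-01 00:59:59"),
]
counts = hourly_counts(trips)
print(counts[(4, datetime(2023, 1, 1, 0))])  # 2
```

Each `(zone_id, hour)` key then corresponds to one point in that zone's time series. For full-size data a dataframe library would be the more natural tool. The standard library is used here only to keep the sketch dependency-free.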

Installation

Base Environment

Create the conda environment from the environment.yml file by running the following command in the repository's root directory:

conda env create

The installation is known to work with conda==22.9.0. It installs all packages needed to run the data processing code and the ARIMAX fitting notebooks, either locally with jupyter or on Binder.

GPU Environments

The model training notebooks were built using Google Colaboratory. The MLP, RNN, and LSTM models are built using pytorch=2.3.1 (i.e., the latest version of pytorch available on Google Colaboratory when we started this project). Therefore, the notebooks that train these models should work out of the box if you open them on Colab.
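If you run the notebooks outside Colab, or Colab has since moved to a newer pytorch, a quick version check can help explain unexpected behavior. The snippet below is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the prefix comparison assumes local build suffixes such as `+cu121` may follow the version number.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


expected = "2.3.1"  # pytorch version stated in this README
found = installed_version("torch")
if found is None:
    print("torch is not installed in this environment")
elif not found.startswith(expected):
    # Prefix match so that build suffixes after the version number still match.
    print(f"torch {found} found; the notebooks were built with {expected}")
else:
    print(f"torch {found} matches the version used for the notebooks")
```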

On the other hand, our graph neural networks were built using the torch-geometric-temporal package. This package takes a long time to install and requires some patching due to incompatibility with our version of pytorch. We show how to install a permanent environment in Google Drive in this Colab Notebook. To install the package without a permanent environment, see this Colab Notebook (not recommended).

Notebooks

  1. 00_a_data_summary.ipynb: Summary of the dataset and its processing
  2. 00_b_basic_ts_model.ipynb: Notebook fitting a basic statistical time series model (ARIMAX) to test data
  3. 01_a_final_dataset.ipynb: Notebook generating the dataset we use to train/validate our models (with an 80-20 train-test split)
  4. 01_b_arimax.ipynb: Notebook training the ARIMAX model on the data
  5. 02_MLP_for_taxi_dropoff_time_series.ipynb: Notebook training an MLP model on the data
  6. 03_a_rnn_lstm_single_series.ipynb: Notebook training an LSTM model on each taxi zone's time series separately
  7. 03_b_rnn_lstm_multi_series.ipynb: Notebook training an LSTM model on all the taxi zones simultaneously
  8. 03_c_rnn_lstm_multi_series_multivar.ipynb: Notebook training an LSTM model on all the taxi zones simultaneously, using additional features from the taxi data
  9. 03_d_rnn_lstm_validation.ipynb: Contains classes for systematically training and validating baseline, RNN, and LSTM models for the final results. Also sets up a model that uses embedding layers for month, hour, and day of week
  10. 04_gnn_fits.ipynb: Notebook training a graph neural network (GNN) model on the data
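Two steps recur across these notebooks: splitting a series 80-20 and turning it into input/target pairs for a neural network. The sketch below shows one common way to do both. It is an illustration under assumptions, not the notebooks' code: the README states the 80-20 ratio but not how the split is made, so the chronological cut is an assumption, and the function names, `lookback`, and `horizon` are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a series in time order: earliest values train, latest values test.

    Assumes a chronological cut; the README only states the 80-20 ratio.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def sliding_windows(series, lookback, horizon=1):
    """Build (input, target) pairs: `lookback` past values predict the next `horizon`."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        x = series[start:start + lookback]
        y = series[start + lookback:start + lookback + horizon]
        pairs.append((x, y))
    return pairs


# Toy series standing in for one zone's ride counts.
series = list(range(10))
train, test = chronological_split(series)
print(len(train), len(test))  # 8 2

windows = sliding_windows(train, lookback=3)
print(windows[0])    # ([0, 1, 2], [3])
print(len(windows))  # 5
```

Cutting in time order keeps future observations out of the training set, which a random shuffle would not guarantee for time series.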

Directory Structure

  • assets: Additional assets unrelated to taxi data
  • data: Taxi data directory
  • data_processing: Notebooks for processing the taxi data
  • notebooks: Notebook files summarizing the data, performing fits, and generating main results
  • utils: Custom modules or files
  • scratch: For unclean files used to develop code

Each directory contains its own README.md file with more details about its contents.

Contributors
