
Identify the relevant features and use them to predict taxi trip duration.


exelero565/Project_5



Project Title

"NYC Taxi Trip Duration Prediction Using Machine Learning"

Project Description

This project aims to accurately predict taxi trip durations in New York City using a variety of machine learning techniques. By analyzing a comprehensive dataset of taxi trips, we develop models that consider factors such as pickup and dropoff locations, trip distances, time of day, and traffic conditions. Our goal is to enhance ride-sharing efficiency and improve urban mobility planning.

Data Sources

  • NYC Taxi Trip Records: Detailed trip data including pickup and dropoff coordinates, trip distances, and durations.
  • OpenStreetMap via OSRM (Open Source Routing Machine): Road network data used to calculate route distances and expected travel times.
  • NYC Weather Data: Historical weather information to examine its impact on trip durations.
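
Route distance and expected travel time can be pulled from an OSRM server's route service. The sketch below assumes a reachable OSRM instance at a hypothetical `base_url` serving the standard `/route/v1/driving/` endpoint; the helper names `build_osrm_route_url`, `parse_osrm_route`, and `fetch_route` are illustrative and not part of this repository. OSRM takes coordinates as `longitude,latitude` and reports distance in meters and duration in seconds.

```python
import json
from urllib.request import urlopen


def build_osrm_route_url(base_url, pickup, dropoff, profile="driving"):
    """Build an OSRM route-service URL.

    pickup and dropoff are (latitude, longitude) tuples; OSRM expects lon,lat order.
    """
    (plat, plon), (dlat, dlon) = pickup, dropoff
    coords = f"{plon},{plat};{dlon},{dlat}"
    # overview=false skips the route geometry, since only totals are needed here.
    return f"{base_url.rstrip('/')}/route/v1/{profile}/{coords}?overview=false"


def parse_osrm_route(payload):
    """Return (distance_m, duration_s) for the first route in an OSRM JSON response."""
    if payload.get("code") != "Ok" or not payload.get("routes"):
        raise ValueError(f"OSRM returned no route: {payload.get('code')}")
    route = payload["routes"][0]
    return route["distance"], route["duration"]


def fetch_route(base_url, pickup, dropoff, timeout=10):
    """Query a (hypothetical) OSRM server and return (distance_m, duration_s)."""
    url = build_osrm_route_url(base_url, pickup, dropoff)
    with urlopen(url, timeout=timeout) as resp:
        return parse_osrm_route(json.load(resp))
```

With a local server (assumed here to be running on port 5000), a call would look like `fetch_route("http://localhost:5000", (40.77, -73.98), (40.76, -73.96))`. Keeping URL building and response parsing separate lets the parsing logic be checked without network access.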

Methodology

  1. Data Preprocessing: Cleaning, feature extraction, and normalization of taxi trip and external datasets.
  2. Feature Engineering: Creating new features like trip distance from coordinates, time of day, day of the week, and weather conditions.
  3. Exploratory Data Analysis (EDA): Analyzing the datasets to uncover patterns and relationships that inform our modeling strategy.
  4. Model Development: Training and evaluating several models, including Decision Trees, Random Forest, Gradient Boosting, and XGBoost.
  5. Model Tuning: Hyperparameter optimization to improve model performance.
  6. Evaluation: Using Root Mean Squared Logarithmic Error (RMSLE) to assess model accuracy.
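
The feature-engineering and evaluation steps can be sketched with the standard library alone. This assumes each trip has pickup and dropoff coordinates in degrees and a pickup timestamp formatted as `YYYY-MM-DD HH:MM:SS`. The haversine great-circle distance (with an assumed Earth radius of 6371 km) is one common way to derive a trip distance from coordinates, though the project may compute it differently. RMSLE compares predictions p and actual values a on a log scale: sqrt((1/n) * sum((log(p_i + 1) - log(a_i + 1))^2)).

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not a project setting


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def time_features(pickup_datetime):
    """Hour of day, day of week (Monday=0) and month from a pickup timestamp string."""
    ts = datetime.strptime(pickup_datetime, "%Y-%m-%d %H:%M:%S")  # assumed format
    return {"hour": ts.hour, "weekday": ts.weekday(), "month": ts.month}


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error.

    sqrt(mean((log(1 + pred) - log(1 + true)) ** 2)); inputs must be >= 0.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))
```

Because RMSLE works on `log(1 + x)`, a 60-second error on a 5-minute trip is penalized far more than the same error on a 1-hour trip, which suits trip durations that span several orders of magnitude.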

Technologies Used

  • Python: Main programming language for data processing and modeling.
  • Pandas & NumPy: For data manipulation and numerical calculations.
  • Scikit-learn: For machine learning model implementation and evaluation.
  • XGBoost: For gradient-boosted tree models.
  • Matplotlib & Seaborn: For data visualization.

Results

Discussion of the best-performing models and their practical implications for taxi companies and city transportation planning.

Installation

Instructions on setting up the project environment, including required libraries and how to run the scripts.

Usage

Examples of how to execute the modeling pipeline, from data preprocessing to making predictions.
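
As a hedged illustration of such a pipeline, the sketch below uses scikit-learn to fit a `RandomForestRegressor` on `log1p(duration)` and maps predictions back to seconds with `expm1`. Training with squared error on the log-transformed target matches the squared term inside RMSLE. The three features and the synthetic dataset are placeholders invented for this example; they are not the project's data, pipeline, or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: NOT real NYC taxi trips.
rng = np.random.default_rng(0)
n = 500
distance_km = rng.uniform(0.5, 20.0, n)  # hypothetical trip distance feature
hour = rng.integers(0, 24, n)            # hypothetical pickup hour feature
weekday = rng.integers(0, 7, n)          # hypothetical day-of-week feature
X = np.column_stack([distance_km, hour, weekday])
# Invented relationship: 150 s per km plus 120 s overhead and clipped noise, in seconds.
duration_s = distance_km * 150 + rng.normal(0, 60, n).clip(-100, 100) + 120

X_train, X_test, y_train, y_test = train_test_split(
    X, duration_s, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, np.log1p(y_train))   # train on the log-transformed target
pred = np.expm1(model.predict(X_test))  # convert predictions back to seconds

rmsle = np.sqrt(mean_squared_log_error(y_test, pred))
print(f"RMSLE on held-out synthetic data: {rmsle:.4f}")
```

Swapping `RandomForestRegressor` for another regressor (for example a decision tree or gradient boosting model) only changes the `model = ...` line; the log transform and evaluation stay the same.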

Contributing

Guidelines for contributing to the project, including how to propose improvements and submit pull requests.

License

The project is distributed under the MIT license. You may use, modify, and distribute this code for personal and commercial purposes, provided the original copyright notice and license text are included.

Acknowledgments

Credits to data providers, contributors, and any references used in the development of this project.

Author
