NYC Taxi Demand Predictor

A machine learning solution for predicting taxi demand in New York City, designed to optimize fleet operations by matching driver supply with passenger demand.

Overview

This project implements an end-to-end machine learning pipeline to forecast taxi demand patterns. By analyzing historical trip data, the system helps operations teams maintain optimal fleet utilization, reducing both driver idle time and passenger wait times.

Key Features

Automated data ingestion from NYC TLC Trip Record Data
Time series data transformation and feature engineering
Multiple ML models: Baseline, XGBoost, and LightGBM
Comprehensive data validation and preprocessing pipeline
Interactive visualization and exploratory data analysis

Project Structure

taxi_demand_predictor/
├── data/
│   ├── raw/                    # Raw parquet files from NYC TLC
│   └── transformed/            # Processed time series data
├── notebooks/                  # Jupyter notebooks for analysis
│   ├── 01_load_and_validate_raw_data.ipynb
│   ├── 02_transform_raw_data_into_ts_data.ipynb
│   ├── 03_transform_ts_data_into_features_and_target.ipynb
│   ├── 04_transform_raw_data_into_features_and_targets.ipynb
│   ├── 05_visualize_training_data.ipynb
│   ├── 06_baseline_model.ipynb
│   ├── 07_xgboost_model.ipynb
│   ├── 08_lightgbm_model.ipynb
│   └── 09_lightgbm_model_with_feature_engineering.ipynb
├── src/                        # Source code modules
│   ├── data.py                 # Data loading and validation
│   ├── data_split.py           # Train/test splitting utilities
│   ├── paths.py                # Path configurations
│   └── plot.py                 # Visualization utilities
├── pyproject.toml              # Poetry dependencies
└── README.md
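
The tree above lists src/data_split.py for train/test splitting. For time series data, a split is usually made at a cutoff date rather than at random, so the model is never evaluated on hours that precede its training data. The following is a minimal sketch of that idea, not the repository's actual implementation; the function name and the pickup_hour column are assumptions made for illustration.

```python
import pandas as pd


def train_test_split_by_date(
    df: pd.DataFrame, cutoff: str, time_col: str = "pickup_hour"
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows into (train, test) at a cutoff timestamp.

    Hypothetical helper: rows strictly before `cutoff` go to train,
    rows at or after `cutoff` go to test.
    """
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[df[time_col] < cutoff_ts].reset_index(drop=True)
    test = df[df[time_col] >= cutoff_ts].reset_index(drop=True)
    return train, test
```

Splitting on time keeps the evaluation honest: a random split would leak future demand into the training set.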

Installation

Prerequisites

Python 3.10 or higher
Poetry (for dependency management)

Setup

Clone the repository:

git clone https://github.com/Hichemchir/taxi_demand_predictor.git
cd taxi_demand_predictor

Install dependencies using Poetry:

poetry install

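Activate the virtual environment (Poetry 2.0 and later no longer include the shell command by default; there, install the poetry-plugin-shell plugin or prefix commands with poetry run):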

poetry shell

Usage

The project follows a sequential workflow through Jupyter notebooks:

1. Data Loading (notebook 01): Load and validate raw data from NYC TLC
2. Time Series Transformation (notebook 02): Convert raw trip records into time series format
3. Feature Engineering (notebooks 03-04): Generate features and targets for ML models
4. Visualization (notebook 05): Explore and visualize training data patterns
5. Modeling (notebooks 06-09): Train and evaluate baseline, XGBoost, and LightGBM models

To run the notebooks:

jupyter notebook

Navigate to the notebooks/ directory and execute them in numerical order.
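
The time series transformation and feature engineering steps in the workflow can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code from the notebooks: it assumes the raw trips have already been reduced to two columns named pickup_datetime and pickup_location_id (the TLC yellow taxi files call these tpep_pickup_datetime and PULocationID), and the function names and lag-based features are hypothetical.

```python
import pandas as pd


def trips_to_hourly_counts(trips: pd.DataFrame) -> pd.DataFrame:
    """Count pickups per (hour, location); hours with no trips become explicit zeros."""
    trips = trips.assign(pickup_hour=trips["pickup_datetime"].dt.floor("h"))
    counts = (
        trips.groupby(["pickup_hour", "pickup_location_id"])
        .size()
        .rename("rides")
        .reset_index()
    )
    # Build the full hour x location grid so that missing slots are filled with 0.
    hours = pd.date_range(
        counts["pickup_hour"].min(), counts["pickup_hour"].max(), freq="h"
    )
    locations = sorted(counts["pickup_location_id"].unique())
    grid = pd.MultiIndex.from_product(
        [hours, locations], names=["pickup_hour", "pickup_location_id"]
    )
    return (
        counts.set_index(["pickup_hour", "pickup_location_id"])
        .reindex(grid, fill_value=0)
        .reset_index()
    )


def add_lag_features(ts: pd.DataFrame, n_lags: int = 3) -> pd.DataFrame:
    """Add rides from the previous n_lags hours as features, per location."""
    ts = ts.sort_values(["pickup_location_id", "pickup_hour"]).copy()
    for lag in range(1, n_lags + 1):
        ts[f"rides_lag_{lag}h"] = ts.groupby("pickup_location_id")["rides"].shift(lag)
    # The first n_lags hours of each location have no complete history; drop them.
    return ts.dropna().reset_index(drop=True)
```

In this sketch the rides column is the target and the rides_lag_* columns are the features; filling empty hours with zero matters because an hour without trips is a real observation of zero demand, not missing data.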

Data Source

Historical taxi trip data comes from the NYC Taxi and Limousine Commission (TLC) Trip Record Data and is stored as raw parquet files under data/raw/.
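
As a rough illustration of the ingestion step, the sketch below downloads one month of trip records into data/raw/. The base URL, the yellow_tripdata file naming pattern, and the function names are assumptions made for this example and are not taken from the repository; check them against the TLC Trip Record Data page before relying on them.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: public location and naming pattern of the monthly TLC parquet files.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"
RAW_DATA_DIR = Path("data") / "raw"


def raw_file_name(year: int, month: int) -> str:
    """Build the (assumed) file name for one month of yellow taxi trips."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"yellow_tripdata_{year}-{month:02d}.parquet"


def raw_file_url(year: int, month: int) -> str:
    """Build the (assumed) download URL for one month of trips."""
    return f"{BASE_URL}/{raw_file_name(year, month)}"


def download_raw_file(year: int, month: int, dest_dir: Path = RAW_DATA_DIR) -> Path:
    """Download one monthly file unless it is already present locally."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    path = dest_dir / raw_file_name(year, month)
    if not path.exists():
        urlretrieve(raw_file_url(year, month), path)
    return path
```

Skipping files that already exist keeps repeated notebook runs from downloading the same month twice.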

Models

The project implements and compares three modeling approaches:

Baseline Model: Simple statistical baseline for benchmarking
XGBoost: Gradient boosting with optimized hyperparameters
LightGBM: Efficient gradient boosting, trained both without and with additional feature engineering (notebooks 08 and 09)
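
To make the idea of a simple statistical baseline concrete, here is one common choice: a seasonal naive forecast that predicts each hour's demand as the demand observed one week (168 hours) earlier, scored with mean absolute error. This is a sketch of the general technique; the README does not specify which baseline or metric the notebooks use, and the function names here are hypothetical.

```python
import numpy as np


def seasonal_naive_forecast(rides, season: int = 168) -> np.ndarray:
    """Predict rides[t] as rides[t - season] for every t >= season."""
    rides = np.asarray(rides, dtype=float)
    if len(rides) <= season:
        raise ValueError("need more observations than one season")
    return rides[:-season]


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def evaluate_seasonal_naive(rides, season: int = 168) -> float:
    """MAE of the seasonal naive forecast on the hours it can predict."""
    rides = np.asarray(rides, dtype=float)
    predictions = seasonal_naive_forecast(rides, season)
    actuals = rides[season:]  # the first `season` hours have no prediction
    return mean_absolute_error(actuals, predictions)
```

A baseline like this sets the bar that XGBoost and LightGBM need to beat; if a boosted model cannot improve on "same hour last week", its added complexity is hard to justify.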

Dependencies

Core dependencies include:

pandas & numpy: Data manipulation
scikit-learn: ML utilities
xgboost & lightgbm: Gradient boosting models
plotly: Interactive visualizations
optuna: Hyperparameter optimization

See pyproject.toml for the complete dependency list.

License

This project is available for educational purposes.
