Shuo-Wang-UCBerkeley/FlightDelay

Final Project: Flight Delay Prediction

UC Berkeley MIDS Program

Shuo Wang, A Adam Saleh, Liang Li, Qian Qiao

Summer 2023

Project Overview

Problem: US flight delays affect consumers nationwide and raise operational costs for airlines and airports.

Objective: Develop a predictive model that can anticipate flight delays (>15 mins).
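
As a minimal sketch of the target definition above, the binary label can be derived from a delay value in minutes. The function name, the constant, and the handling of missing values are illustrative assumptions, not the project's actual code; the strict "greater than 15" comparison follows the wording of the objective.

```python
# Hypothetical helper: turn a delay in minutes into the binary target.
# 1 = delayed, 0 = not delayed.
DELAY_THRESHOLD_MINUTES = 15  # threshold taken from the stated objective


def is_delayed(delay_minutes):
    """Return 1 if the delay exceeds the threshold, 0 otherwise.

    Missing delays (None) are passed through as None so the caller can
    decide whether to drop or impute them (an assumption of this sketch).
    """
    if delay_minutes is None:
        return None
    return 1 if delay_minutes > DELAY_THRESHOLD_MINUTES else 0


labels = [is_delayed(m) for m in [0, 15, 16, 120, None]]
print(labels)
```

A flight that is exactly 15 minutes late is labeled 0 under this reading; if the underlying data uses an inclusive threshold, the comparison would need to change accordingly.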

Dataset

This project uses four datasets.

  • (1) The first dataset (Flights Data) covers flights departing from and arriving at US domestic airports from 2015 to 2021.
  • (2) The second dataset (Weather Data) is for weather information from 2015 to 2021.
  • (3) The third dataset (Station Data) is for the weather stations, which contains location and proximity information to help identify the closest weather station to any airport in the first dataset.
  • (4) The fourth dataset (Airport Code Data) is for IATA airport code, a three-letter code that is used in passenger reservation, ticketing, and baggage-handling systems.
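
Matching each airport to its closest weather station can be sketched with a great-circle (haversine) distance. This is an illustration only: the station records, field names (`id`, `lat`, `lon`), and coordinates below are made up, and the Station Data may already supply a distance column that makes this computation unnecessary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_station(airport_lat, airport_lon, stations):
    """Return the station dict with the smallest haversine distance to the airport."""
    return min(
        stations,
        key=lambda s: haversine_km(airport_lat, airport_lon, s["lat"], s["lon"]),
    )


# Hypothetical station records, for illustration only.
stations = [
    {"id": "STATION_A", "lat": 37.7, "lon": -122.2},
    {"id": "STATION_B", "lat": 40.6, "lon": -73.8},
]
nearest = closest_station(37.6, -122.4, stations)
print(nearest["id"])
```

A linear scan like this is fine for a few hundred airports; at larger scale a spatial index or a distributed join would be the more natural choice.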

Joined tables (OTPW) combine data from all of the tables above (flight, weather, station, and IATA airport code). OTPW tables are provided for several time periods: 3 months, 6 months, 1 year, and the complete dataset (2015-2019). The three-month OTPW dataset is 1,500,620,247 bytes, with 1,401,363 rows and 216 columns. The 2015-2019 OTPW dataset is 6,525,616,408 bytes, with 31,673,119 rows and 214 columns.

Data dictionary: https://www.transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ

EDA and Feature Engineering

[Figure: EDA and feature engineering]

Modeling Approach and Aspects

Model Pipeline

[Figure: Model pipeline]

Cross-Fold Validation

[Figure: Cross-fold validation]

Preprocess Stage

[Figure: Preprocess stage]

Evaluation Metrics

  • F2 - F Beta Score (Beta = 2.0)
    • Weighted F2
    • F2 - Label 1 (yes delay)
    • F2 - Label 0 (no delay)
  • F1 Score
  • Accuracy
  • Precision
  • Recall
  • ROC-AUC
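
For reference, the F-beta score can be computed directly from confusion-matrix counts as F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP); with beta = 2, recall is weighted more heavily than precision. The sketch below is illustrative only: the function names and the example counts are assumptions, and the support-weighted average is one common definition of "weighted" F2 that the project may or may not have used.

```python
def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives, and false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


def f2_report(tp, fp, fn, tn):
    """Per-label and support-weighted F2 for a binary confusion matrix.

    Label 1 (delay) is the positive class. For label 0 (no delay) the roles
    swap: true negatives act as hits, FN as false alarms, FP as misses.
    """
    f2_label1 = fbeta_score(tp, fp, fn)
    f2_label0 = fbeta_score(tn, fn, fp)
    support1 = tp + fn  # actual delayed flights
    support0 = tn + fp  # actual on-time flights
    weighted = (f2_label1 * support1 + f2_label0 * support0) / (support1 + support0)
    return {"label_1": f2_label1, "label_0": f2_label0, "weighted": weighted}


# Made-up counts, for illustration only.
report = f2_report(tp=8, fp=2, fn=2, tn=88)
print(report)
```

Because on-time flights typically outnumber delayed ones, the weighted F2 tends to be dominated by label 0, which is one reason to report the label 1 score separately.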

Results and Discussion

Experiments

[Figure: Experiments]

Model Performance

[Figure: Model performance]

Feature Importance

[Figure: Feature importance]

Future Work

  1. Reduce execution time
  2. Neural network implementation

Helpful Information

Environment

Databricks Notebook
