Datasets featuring global, high-level flight schedules extracted from worldwide aircraft ADS-B position transmissions.
Published per quarter, starting with 2024. Covers flights worldwide, as long as they are within the coverage of the ADSBlol initiative.
- This project uses the ADS-B data from the ADSBlol initiative. Consider supporting their great project.
- This project uses validation data from vradarserver/Andrew Whewell to check extracted routes with additional route data (based on aircraft callsign). Again, consider supporting this initiative.
Each day, ADSBlol publishes ADS-B data in two versions: prod-0 and staging-0. The largest file (by file size) is selected for each day.
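The selection rule above can be sketched as follows. This is a minimal illustration, not the code used in this repository: the function name `pick_largest_per_day`, the `(day, version, size_bytes)` tuples and the file sizes are all assumptions made for the example.

```python
def pick_largest_per_day(files):
    """Return, for each day, the (version, size_bytes) of the largest file.

    files: iterable of (day, version, size_bytes) tuples (hypothetical layout).
    """
    best = {}
    for day, version, size_bytes in files:
        # Keep this entry only if it is larger than what was seen for this day.
        if day not in best or size_bytes > best[day][1]:
            best[day] = (version, size_bytes)
    return best


# Made-up listing, for illustration only.
listing = [
    ("2024-01-01", "prod-0", 2_100_000_000),
    ("2024-01-01", "staging-0", 2_400_000_000),
    ("2024-01-02", "prod-0", 2_300_000_000),
    ("2024-01-02", "staging-0", 1_900_000_000),
]
selected = pick_largest_per_day(listing)
```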
After extraction, ADS-B transmissions are retained for all aircraft, but only about 1 in every 4 messages per aircraft is kept: the detailed ones, not the basic intermediate transmissions. This primarily affects the accuracy of the enroute phase, which is of less relevance for extracting the arrival and departure data of a flight anyway. At the same time, it keeps processing the cumulative data for each quarter feasible.
See the Releases section of this repository for a parquet file with the flights per aircraft, per quarter of a year.
The parquet file type was chosen to keep the flights data manageable in terms of size and processing/loading times. Each quarter features approx. 10-13+ million flights and ~500,000 aircraft, which in CSV format would total approx. 3 GB, whereas the parquet file stays far below 1 GB. Loading a parquet file is straightforward with Python:
import pandas
df = pandas.read_parquet('2024_Q1.parquet')
To inspect the parquet dataset without Python, you can use a tool such as ParquetViewer, which offers a graphical user interface and can be installed on Windows as an .exe.
The data is published per quarter. The 4 quarters of each year overlap slightly to ensure flights are complete (not cut in half at the quarter boundaries). As a result, when the quarterly files are combined into a yearly schedule, some flights appear in two files.
Do not use 'df.drop_duplicates()'. Instead, identify overlap flights by checking whether the entry in the df column 'Track_Origin_DateTime_UTC' falls within the months of the quarter that the parquet file belongs to:
df['Month_UTC'] = pandas.to_datetime(df['Track_Origin_DateTime_UTC'], errors = 'coerce').dt.month
if str(file).endswith('Q1.parquet'):
    df['Inside_Season'] = df['Month_UTC'].isin([1, 2, 3])
elif ... (for the other quarters)
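A fuller version of the snippet above could look like the sketch below. It assumes the file names end in `Q1.parquet` to `Q4.parquet` as in the example, and that the quarters are the calendar quarters. The helper names `flag_inside_season` and `combine_quarters` are made up for illustration.

```python
import pandas

# Calendar months belonging to each quarter (assumption: calendar quarters).
QUARTER_MONTHS = {"Q1": [1, 2, 3], "Q2": [4, 5, 6], "Q3": [7, 8, 9], "Q4": [10, 11, 12]}


def flag_inside_season(df, filename):
    """Add 'Month_UTC' and 'Inside_Season' based on the quarter in the file name."""
    # Assumes a name such as '2024_Q1.parquet': the quarter is the last two
    # characters before the extension.
    quarter = str(filename).rsplit(".", 1)[0][-2:]
    out = df.copy()
    out["Month_UTC"] = pandas.to_datetime(
        out["Track_Origin_DateTime_UTC"], errors="coerce"
    ).dt.month
    out["Inside_Season"] = out["Month_UTC"].isin(QUARTER_MONTHS[quarter])
    return out


def combine_quarters(frames_by_file):
    """Concatenate quarterly frames, keeping only flights that start inside their own quarter."""
    parts = []
    for filename, df in frames_by_file.items():
        flagged = flag_inside_season(df, filename)
        parts.append(flagged[flagged["Inside_Season"]])
    return pandas.concat(parts, ignore_index=True)
```

Typical use would be `combine_quarters({name: pandas.read_parquet(name) for name in files})`, where `files` is your own list of quarterly file names. Rows whose timestamp cannot be parsed get `Inside_Season = False` and are dropped, which may or may not be what you want.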
Status Q2 2024
Number of receivers/antennas of ADSBlol initiative (image above)
Aircraft coverage of ADSBlol initiative. Time of day ~13:00 UTC to have reasonable ops in all continents - no midnight situation in major markets (image above)
Given potentially limited ADS-B reception coverage of the ADSBlol initiative in certain continents, some aircraft tracks start after the airport of origin or end before the airport of destination. For those cases, the flights data has been enhanced by looking up the aircraft flight callsign and matching it with the open-source aircraft callsign vs route dataset of vradarserver/Andrew Whewell.
ADS-B transmissions simply broadcast the position the aircraft reports, so wrong positions caused by GPS spoofing are transmitted as well. Here too, the added column with the callsign vs route lookup makes it possible to filter out flights where the aircraft emitted wrong position data.
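One way to use the callsign lookup for such filtering is sketched below. The column names (`Track_Origin`, `Track_Destination`, `Lookup_Origin`, `Lookup_Destination`) and the function `flag_route_mismatch` are hypothetical and do not reflect the actual column names in the parquet files; check the released files for the real ones.

```python
import pandas


def flag_route_mismatch(df, track_cols, lookup_cols):
    """Return a boolean Series: True where a callsign lookup exists and disagrees with the track.

    track_cols / lookup_cols: (origin_column, destination_column) pairs (hypothetical names).
    """
    track_origin, track_dest = track_cols
    lookup_origin, lookup_dest = lookup_cols
    has_lookup = df[lookup_origin].notna() & df[lookup_dest].notna()
    agrees = (df[track_origin] == df[lookup_origin]) & (df[track_dest] == df[lookup_dest])
    return has_lookup & ~agrees


# Made-up rows for illustration.
flights = pandas.DataFrame({
    "Track_Origin": ["EDDF", "EDDF", "EDDF"],
    "Track_Destination": ["OMDB", "LTFM", "OMDB"],
    "Lookup_Origin": ["EDDF", "EDDF", None],
    "Lookup_Destination": ["OMDB", "OMDB", None],
})
mismatch = flag_route_mismatch(
    flights,
    ("Track_Origin", "Track_Destination"),
    ("Lookup_Origin", "Lookup_Destination"),
)
```

A mismatch only marks a flight for review. It can stem from an incomplete track or an outdated callsign route as well as from spoofed positions.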
Please use in line with the license defined in this repository. No guarantee, no liability, no warranty. All open-source.
For questions, please refer to my LinkedIn profile Sebastiaan Menger - LinkedIn
Contact me for enhanced datasets featuring:
- Filtered airport-specific flight schedules
- Aircraft seats
- Probable RWY used
- Corrected RWY times in case of incomplete tracks
- RWY times in local timezone/daylight saving time
- Plausible airport in case of incomplete tracks
- Ancillary data such as airline type, alliance, etc
- Calculation of rolling hour traffic/seats data
This concerns high-level/approximated RWY times in UTC: lift-off time for departures and touchdown time for arrivals. These generally refer to the first 'ground' entry for arrivals and the last 'ground' entry for departures. However, in areas with more limited ADS-B coverage, the track may not start or stop at the airport.
For those cases, the beginning/end of the track has been selected as the time of the flight. For further implications, see section below.
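The rule described above can be illustrated with a short sketch. It is not the repository's implementation: the function `approx_runway_times` and the `(timestamp, on_ground)` track layout are assumptions for the example.

```python
from datetime import datetime


def approx_runway_times(track):
    """Approximate (departure_time, arrival_time) for one flight.

    track: list of (timestamp, on_ground) tuples, sorted by time (hypothetical layout).
    """
    airborne = [i for i, (_, on_ground) in enumerate(track) if not on_ground]
    if not airborne:
        return None, None  # the aircraft was never airborne in this track
    first_air, last_air = airborne[0], airborne[-1]
    # Departure: last 'ground' entry before becoming airborne,
    # or the start of the track if it begins in the air.
    departure = track[first_air - 1][0] if first_air > 0 else track[0][0]
    # Arrival: first 'ground' entry after the last airborne report,
    # or the end of the track if it stops in the air.
    arrival = track[last_air + 1][0] if last_air + 1 < len(track) else track[-1][0]
    return departure, arrival


# Made-up track: two ground reports, two airborne reports, two ground reports.
t = [datetime(2024, 1, 1, 10, minute) for minute in range(6)]
full_track = [(t[0], True), (t[1], True), (t[2], False), (t[3], False), (t[4], True), (t[5], True)]
departure, arrival = approx_runway_times(full_track)
```

Because the arrival is taken after the last airborne report, a track with several ground contacts yields only the final touchdown.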
Similar to the section above, for those cases where the track does not start or stop at the airport, multiple airports in the vicinity of the first/last position of the ADS-B track have been listed as options. To nevertheless determine the plausible airport of origin/destination, validation data from vradarserver/Andrew Whewell has been included to match the aircraft flight callsign with external route data.
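A simple way to combine the candidate airports with the callsign lookup is sketched below. The function `plausible_airport`, its labels and the choice to prefer the callsign route when it disagrees with the candidates are assumptions for illustration, not the logic used to build the datasets.

```python
def plausible_airport(candidates, callsign_route_airport=None):
    """Pick a plausible airport and state what the choice is based on.

    candidates: airport codes near the first/last track position, nearest first.
    callsign_route_airport: airport from the callsign route lookup, or None.
    """
    if callsign_route_airport is not None and callsign_route_airport in candidates:
        return callsign_route_airport, "track and callsign agree"
    if callsign_route_airport is not None:
        # Assumption: for an incomplete track, trust the callsign route.
        return callsign_route_airport, "callsign only"
    if candidates:
        return candidates[0], "nearest candidate"
    return None, "unknown"


choice = plausible_airport(["EDLW", "EDDL"], "EDDL")
```

Keeping the basis of the choice next to the airport code makes it easy to filter later on how much the assignment can be trusted.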
Occasionally, for flight tracks with large time gaps (1), potential GPS spoofing (2) or detours due to, e.g., thunderstorms (3), the flight linking algorithm could introduce a limited number of duplicate flights. An example is the Frankfurt (FRA) to Dubai (DXB) route, where the ADS-B position reports feature large time gaps (hours), as well as potentially some GPS spoofing over Turkey / the Black Sea. This can make it difficult to determine which parts of the route belong together, or whether a landing occurred in between (in an area without coverage).
In the 2024 datasets, on the mentioned route, this could in the worst case result in approx. 10% duplicate flights (a concentration of track transmissions in the Dubai area and a concentration over Turkey/Europe occasionally being considered separate flights). From the 2025-Q2 datasets onwards, the algorithm has been tweaked to also check callsign matches of the transmission reports during the enroute phase, to reduce duplicates (on a route like DXB - FRA this resulted in 95%+ accurate flights):
The algorithm to extract a flight route from ADS-B position reports is primarily focussed on commercial flights. The applied airport lookup list (to assign an airport to the start/end of a flight) is focussed on airports for commercial operations. As a result, several small regional airfields for general aviation (GA) are not included. This implies that the accuracy of the assigned airport of origin/destination can be lower for GA flights. In the example below, a flight actually departing from EDKS (which is not in the airport lookup list) would be assigned 'EDLW', the closest airport featured in the lookup list:
This only concerns some GA flights, which is not the main aim of the dataset given its focus on commercial flights. From 2025 onwards, for each flight the origin lat/lon as well as destination lat/lon has been included, so that manual assignment of a regional GA airfield becomes possible.
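Manual assignment from the origin/destination lat/lon can be done with a great-circle distance lookup against your own airfield list. The sketch below is one possible approach. The airfield codes `AAAA` and `BBBB` and their coordinates are placeholders, and the function names are made up for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (spherical Earth, R = 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_airfield(lat, lon, airfields):
    """Return the code of the airfield closest to (lat, lon).

    airfields: dict mapping airfield code -> (lat, lon), supplied by the user.
    """
    return min(airfields, key=lambda code: haversine_km(lat, lon, *airfields[code]))


# Placeholder airfields with made-up coordinates.
my_airfields = {"AAAA": (51.0, 7.0), "BBBB": (51.5, 7.6)}
closest = nearest_airfield(51.05, 7.02, my_airfields)
```

For a whole quarter with millions of flights, a vectorised or spatially indexed lookup would be faster than calling this per row.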
In case of go-arounds/touch-and-go/balked landings, only the final touchdown is counted as touchdown time of the flight - again with commercial flights in mind.
The more ADS-B receivers are added to the adsb.lol initiative through ADSB.im software, the more accurate the derived flight schedules in this repository also become (accuracy of airport of origin/destination and pertaining RWY times).