
Correcting the miscalculation of the median_vehicle_position_age #3520

Merged
fsalemi merged 2 commits into main from vehicle_positions_no_distinct
Oct 30, 2024


@fsalemi (Contributor) commented Oct 29, 2024

Description

Applying DISTINCT to vehicle_message_age caused the median vehicle position message age per agency to be abnormally elevated. This PR corrects the issue by removing DISTINCT from that calculation. Removing it is necessary for median_vehicle_message_age, but median_header_message_age still needs DISTINCT to deduplicate the headers to the overall message level, so DISTINCT remains in place for the calculations involving the feed header.
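The effect can be reproduced outside the warehouse. This is a minimal Python illustration, not the dbt SQL from this PR. It assumes that each feed message carries several vehicle entities, that all entities in a message share one header age, and that vehicle ages are small whole-second values that repeat often. The message ids and ages are made up. Deduplicating the age values collapses the many repeated low ages into one occurrence each, so the rare large ages carry more weight and the median climbs. Deduplicating header ages by message, by contrast, simply counts each message once (the actual SQL may key this differently).

```python
from statistics import median

# Hypothetical per-entity rows: (message_id, header_age_s, vehicle_age_s).
# Several vehicle entities share one feed message, so they repeat its header age.
rows = [
    ("m1", 5, 3), ("m1", 5, 3), ("m1", 5, 4),
    ("m2", 6, 3), ("m2", 6, 4), ("m2", 6, 90),
    ("m3", 7, 3), ("m3", 7, 120),
]

vehicle_ages = [vehicle_age for _, _, vehicle_age in rows]

# Correct: every vehicle position observation counts once.
median_all = median(vehicle_ages)

# Buggy: DISTINCT on the age value keeps one copy of each repeated age,
# so the frequent small ages lose weight and the median is pushed upward.
median_distinct = median(set(vehicle_ages))

# Header age belongs to the message, not the entity, so deduplicate on the
# (hypothetical) message key before taking the median.
header_age_by_message = {msg_id: header_age for msg_id, header_age, _ in rows}
median_header = median(list(header_age_by_message.values()))

print(median_all, median_distinct, median_header)
```

With these rows the all-rows median is 3.5 s, the DISTINCT median is 47.0 s, and the per-message header median is 6 s.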

Resolves #3512

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

poetry run dbt run -s +fct_daily_vehicle_positions_latency_statistics

    22:27:16 Running with dbt=1.5.1
    22:27:19 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
    There are 1 unused configuration paths:
    - models.calitp_warehouse.mart.ad_hoc
    22:27:20 Found 424 models, 950 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
    22:27:20
    22:27:25 Concurrency: 8 threads (target='dev')
    22:27:25
    22:27:25 1 of 16 START sql view model farhad_staging.stg_gtfs_rt__vehicle_positions ..... [RUN]
    22:27:25 2 of 16 START sql view model farhad_staging.stg_gtfs_schedule__agency .......... [RUN]
    22:27:25 3 of 16 START sql view model farhad_staging.stg_gtfs_schedule__download_outcomes [RUN]
    22:27:25 4 of 16 START sql view model farhad_staging.stg_gtfs_schedule__file_parse_outcomes [RUN]
    22:27:25 5 of 16 START sql view model farhad_staging.stg_gtfs_schedule__unzip_outcomes .. [RUN]
    22:27:25 6 of 16 START sql view model farhad_staging.stg_transit_database__gtfs_datasets [RUN]
    22:27:26 3 of 16 OK created sql view model farhad_staging.stg_gtfs_schedule__download_outcomes [CREATE VIEW (0 processed) in 1.13s]
    22:27:26 6 of 16 OK created sql view model farhad_staging.stg_transit_database__gtfs_datasets [CREATE VIEW (0 processed) in 1.11s]
    22:27:26 7 of 16 START sql table model farhad_staging.int_transit_database__gtfs_datasets_dim [RUN]
    22:27:26 1 of 16 OK created sql view model farhad_staging.stg_gtfs_rt__vehicle_positions [CREATE VIEW (0 processed) in 1.20s]
    22:27:26 4 of 16 OK created sql view model farhad_staging.stg_gtfs_schedule__file_parse_outcomes [CREATE VIEW (0 processed) in 1.18s]
    22:27:26 8 of 16 START sql view model farhad_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes [RUN]
    22:27:26 5 of 16 OK created sql view model farhad_staging.stg_gtfs_schedule__unzip_outcomes [CREATE VIEW (0 processed) in 1.26s]
    22:27:26 2 of 16 OK created sql view model farhad_staging.stg_gtfs_schedule__agency ..... [CREATE VIEW (0 processed) in 1.30s]
    22:27:27 8 of 16 OK created sql view model farhad_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes [CREATE VIEW (0 processed) in 1.33s]
    22:27:27 9 of 16 START sql view model farhad_staging.int_gtfs_schedule__joined_feed_outcomes [RUN]
    22:27:29 9 of 16 OK created sql view model farhad_staging.int_gtfs_schedule__joined_feed_outcomes [CREATE VIEW (0 processed) in 1.34s]
    22:27:29 10 of 16 START sql table model farhad_mart_gtfs.dim_schedule_feeds ............. [RUN]
    22:27:31 7 of 16 OK created sql table model farhad_staging.int_transit_database__gtfs_datasets_dim [CREATE TABLE (4.9k rows, 5.0 GiB processed) in 4.59s]
    22:27:31 11 of 16 START sql table model farhad_mart_transit_database.bridge_schedule_dataset_for_validation [RUN]
    22:27:31 12 of 16 START sql table model farhad_mart_transit_database.dim_gtfs_datasets .. [RUN]
    22:27:33 12 of 16 OK created sql table model farhad_mart_transit_database.dim_gtfs_datasets [CREATE TABLE (4.9k rows, 1.6 MiB processed) in 2.34s]
    22:27:33 13 of 16 START sql table model farhad_staging.int_transit_database__urls_to_gtfs_datasets [RUN]
    22:27:33 11 of 16 OK created sql table model farhad_mart_transit_database.bridge_schedule_dataset_for_validation [CREATE TABLE (2.8k rows, 511.7 KiB processed) in 2.54s]
    22:27:35 13 of 16 OK created sql table model farhad_staging.int_transit_database__urls_to_gtfs_datasets [CREATE TABLE (4.9k rows, 868.9 KiB processed) in 2.35s]
    22:28:37 10 of 16 OK created sql table model farhad_mart_gtfs.dim_schedule_feeds ........ [CREATE TABLE (14.1k rows, 11.4 GiB processed) in 68.10s]
    22:28:37 14 of 16 START sql table model farhad_mart_gtfs.fct_daily_schedule_feeds ....... [RUN]
    22:28:42 14 of 16 OK created sql table model farhad_mart_gtfs.fct_daily_schedule_feeds .. [CREATE TABLE (280.3k rows, 2.8 MiB processed) in 5.20s]
    22:28:42 15 of 16 START sql view model farhad_mart_gtfs.fct_vehicle_positions_messages .. [RUN]
    22:28:43 15 of 16 OK created sql view model farhad_mart_gtfs.fct_vehicle_positions_messages [CREATE VIEW (0 processed) in 1.25s]
    22:28:43 16 of 16 START sql incremental model farhad_mart_gtfs_quality.fct_daily_vehicle_positions_latency_statistics [RUN]
    22:32:07 16 of 16 OK created sql incremental model farhad_mart_gtfs_quality.fct_daily_vehicle_positions_latency_statistics [CREATE TABLE (989.0 rows, 391.9 GiB processed) in 203.33s]
    22:32:07
    22:32:07 Finished running 9 view models, 6 table models, 1 incremental model in 0 hours 4 minutes and 47.05 seconds (287.05s).
    22:32:07
    22:32:07 Completed successfully
    22:32:07
    22:32:07 Done. PASS=16 WARN=0 ERROR=0 SKIP=0 TOTAL=16

Post-merge follow-ups

The next follow-up step for this PR is to replace the old table, fct_daily_vehicle_positions_message_age_summary, with the new fct_daily_vehicle_positions_latency_statistics table in the Metabase dashboards, so that the dashboards display accurate vehicle position latency data.

  • [ ] No action required
  • Actions required (specified below)

@fsalemi (Contributor, Author) commented Oct 29, 2024

@evansiroky, please review and approve when you get a chance.

github-actions bot commented Oct 29, 2024

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?
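For reference, the two properties this checklist item asks about can be sketched in Python. In a dbt project they are normally declared as the built-in `not_null` and `unique` generic tests on the key column in the model's YAML; this sketch only mirrors what those tests assert. The column name `key` and the rows are hypothetical, since the PR does not show which column is the surrogate key of fct_daily_vehicle_positions_latency_statistics.

```python
from collections import Counter


def check_surrogate_key(rows, key):
    """Return (null_count, duplicate_keys) for the given key column.

    A passing key has null_count == 0 and no duplicate_keys, matching
    what dbt's not_null and unique tests check.
    """
    values = [row.get(key) for row in rows]
    null_count = sum(value is None for value in values)
    # Count only non-null values; nulls are reported separately above.
    counts = Counter(value for value in values if value is not None)
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    return null_count, duplicate_keys


# Hypothetical rows, for illustration only.
rows = [
    {"key": "a1", "service_date": "2024-10-28"},
    {"key": "a2", "service_date": "2024-10-28"},
    {"key": "a2", "service_date": "2024-10-29"},
    {"key": None, "service_date": "2024-10-29"},
]
nulls, dups = check_surrogate_key(rows, "key")
print(nulls, dups)
```

Here the key fails both checks: one row has a null key and `a2` appears twice.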

New models 🌱

calitp_warehouse.mart.gtfs_quality.fct_daily_vehicle_positions_latency_statistics

DAG

Legend (in order of precedence)

Resource type                                    Indicator     Resolution
Large table-materialized model                   Orange        Make the model incremental
Large model without partitioning or clustering   Orange        Add partitioning and/or clustering
View with more than one child                    Yellow        Materialize as a table or incremental
Incremental                                      Light green
Table                                            Green
View                                             White

@evansiroky (Member)

Nice work!

@fsalemi (Contributor, Author) commented Oct 30, 2024

@evansiroky The next step for this PR is to update the Metabase dashboards to use this new table, replacing the old fct_daily_vehicle_positions_message_age_summary table. This change will ensure that the dashboards represent accurate vehicle position latency data. Please let me know if you want me to complete this task.

@fsalemi fsalemi merged commit a3e62e4 into main Oct 30, 2024
4 checks passed
@fsalemi fsalemi deleted the vehicle_positions_no_distinct branch October 30, 2024 19:41
@evansiroky (Member)

Yes, please! Also cc @vevetron regarding the reports website.

Development

Successfully merging this pull request may close these issues.

Bug: Issue with the calculation of Vehicle Position Message Latency statistics.
2 participants