Changes a datatype to accommodate bad service alert data #3503

Merged
vevetron merged 1 commit into main from gtfs_rt_services_bad_data_fix
Oct 16, 2024

Conversation

vevetron
Contributor

Description

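BigQuery can't unnest activePeriod as an integer because the value 18446744011787391616 exceeds the INT64 maximum (9223372036854775807). This change reads the field as a STRING and then applies SAFE_CAST to INT64, so an out-of-range value becomes NULL instead of failing the query. In testing, this appears to parse correctly.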
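To illustrate the behavior the fix relies on, here is a minimal Python sketch that approximates what SAFE_CAST(value AS INT64) does to a string: in-range digits are converted, and anything else yields a null instead of an error. The function name safe_cast_int64 and the sample timestamp are hypothetical and are not part of this PR; the actual change lives in the dbt SQL and YAML files, and BigQuery's own parsing rules may differ in edge cases (for example, hexadecimal input).

```python
import re
from typing import Optional

# Signed 64-bit bounds, matching BigQuery's INT64 range.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Optional sign followed by ASCII digits only.
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def safe_cast_int64(raw: Optional[str]) -> Optional[int]:
    """Approximate SAFE_CAST(raw AS INT64): return None instead of raising.

    Hypothetical helper for illustration only.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not _INT_PATTERN.fullmatch(text):
        return None  # not an integer literal
    value = int(text)
    if INT64_MIN <= value <= INT64_MAX:
        return value
    return None  # out of range for INT64


if __name__ == "__main__":
    bad_value = "18446744011787391616"  # the value reported in this PR
    print(int(bad_value) > INT64_MAX)   # True: too large for INT64
    print(safe_cast_int64(bad_value))   # None
    print(safe_cast_int64("1729036800"))  # a made-up, in-range epoch value
```

The design point is that the string is never forced into an integer column on read; the conversion happens later and degrades to a null for bad rows, so one malformed alert no longer breaks the whole model.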

Resolves #3498

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this been tested?

  1. Inserted the bad data into the testing bucket.
  2. Ran SQL to reproduce the bug.
  3. Updated the SQL and YAML files.
  4. Ran poetry run dbt run -s +fct_service_alerts_messages_unnested+, which failed before this change when the bad data was present:
(poetry_env) VevePro:warehouse vivek$ poetry run dbt run -s +fct_service_alerts_messages_unnested+
22:57:13  Running with dbt=1.5.1
22:57:17  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
22:57:18  Found 420 models, 950 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
22:57:18
22:57:24  Concurrency: 8 threads (target='dev')
22:57:24
22:57:24  1 of 21 START sql view model vb_staging.stg_gtfs_rt__service_alerts ............ [RUN]
22:57:24  2 of 21 START sql view model vb_staging.stg_gtfs_schedule__agency .............. [RUN]
22:57:24  3 of 21 START sql view model vb_staging.stg_gtfs_schedule__download_outcomes ... [RUN]
22:57:24  4 of 21 START sql view model vb_staging.stg_gtfs_schedule__file_parse_outcomes . [RUN]
22:57:24  5 of 21 START sql view model vb_staging.stg_gtfs_schedule__unzip_outcomes ...... [RUN]
22:57:24  6 of 21 START sql view model vb_staging.stg_transit_database__gtfs_datasets .... [RUN]
22:57:26  6 of 21 OK created sql view model vb_staging.stg_transit_database__gtfs_datasets  [CREATE VIEW (0 processed) in 1.57s]
22:57:26  2 of 21 OK created sql view model vb_staging.stg_gtfs_schedule__agency ......... [CREATE VIEW (0 processed) in 1.59s]
22:57:26  7 of 21 START sql table model vb_staging.int_transit_database__gtfs_datasets_dim  [RUN]
22:57:26  1 of 21 OK created sql view model vb_staging.stg_gtfs_rt__service_alerts ....... [CREATE VIEW (0 processed) in 1.60s]
22:57:26  5 of 21 OK created sql view model vb_staging.stg_gtfs_schedule__unzip_outcomes . [CREATE VIEW (0 processed) in 1.64s]
22:57:26  4 of 21 OK created sql view model vb_staging.stg_gtfs_schedule__file_parse_outcomes  [CREATE VIEW (0 processed) in 1.66s]
22:57:26  3 of 21 OK created sql view model vb_staging.stg_gtfs_schedule__download_outcomes  [CREATE VIEW (0 processed) in 1.67s]
22:57:26  8 of 21 START sql view model vb_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [RUN]
22:57:28  8 of 21 OK created sql view model vb_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [CREATE VIEW (0 processed) in 1.83s]
22:57:28  9 of 21 START sql view model vb_staging.int_gtfs_schedule__joined_feed_outcomes  [RUN]
22:57:30  9 of 21 OK created sql view model vb_staging.int_gtfs_schedule__joined_feed_outcomes  [CREATE VIEW (0 processed) in 1.99s]
22:57:30  10 of 21 START sql table model vb_mart_gtfs.dim_schedule_feeds ................. [RUN]
22:57:31  7 of 21 OK created sql table model vb_staging.int_transit_database__gtfs_datasets_dim  [CREATE TABLE (1.6k rows, 426.7 MiB processed) in 5.34s]
22:57:31  11 of 21 START sql table model vb_mart_transit_database.bridge_schedule_dataset_for_validation  [RUN]
22:57:31  12 of 21 START sql table model vb_mart_transit_database.dim_gtfs_datasets ...... [RUN]
22:57:34  11 of 21 OK created sql table model vb_mart_transit_database.bridge_schedule_dataset_for_validation  [CREATE TABLE (962.0 rows, 166.7 KiB processed) in 2.63s]
22:57:34  12 of 21 OK created sql table model vb_mart_transit_database.dim_gtfs_datasets . [CREATE TABLE (1.6k rows, 624.6 KiB processed) in 2.63s]
22:57:34  13 of 21 START sql table model vb_staging.int_transit_database__urls_to_gtfs_datasets  [RUN]
22:57:37  13 of 21 OK created sql table model vb_staging.int_transit_database__urls_to_gtfs_datasets  [CREATE TABLE (1.6k rows, 287.7 KiB processed) in 2.89s]
22:57:38  10 of 21 OK created sql table model vb_mart_gtfs.dim_schedule_feeds ............ [CREATE TABLE (1.1k rows, 388.2 MiB processed) in 8.12s]
22:57:38  14 of 21 START sql table model vb_mart_gtfs.fct_daily_schedule_feeds ........... [RUN]
22:57:42  14 of 21 OK created sql table model vb_mart_gtfs.fct_daily_schedule_feeds ...... [CREATE TABLE (130.3k rows, 416.3 KiB processed) in 4.26s]
22:57:42  15 of 21 START sql view model vb_mart_gtfs.fct_service_alerts_messages ......... [RUN]
22:57:43  15 of 21 OK created sql view model vb_mart_gtfs.fct_service_alerts_messages .... [CREATE VIEW (0 processed) in 1.32s]
22:57:43  16 of 21 START sql view model vb_staging.int_gtfs_rt__service_alerts_fully_unnested  [RUN]
22:57:45  16 of 21 OK created sql view model vb_staging.int_gtfs_rt__service_alerts_fully_unnested  [CREATE VIEW (0 processed) in 1.71s]
22:57:45  17 of 21 START sql incremental model vb_mart_gtfs.fct_service_alerts_messages_unnested  [RUN]
22:57:57  17 of 21 OK created sql incremental model vb_mart_gtfs.fct_service_alerts_messages_unnested  [SCRIPT (199.7 MiB processed) in 11.98s]
22:57:57  18 of 21 START sql incremental model vb_staging.int_gtfs_rt__service_alerts_day_map_grouping  [RUN]
22:57:57  19 of 21 START sql incremental model vb_staging.int_gtfs_rt__service_alerts_trip_day_map_grouping  [RUN]
22:58:04  19 of 21 OK created sql incremental model vb_staging.int_gtfs_rt__service_alerts_trip_day_map_grouping  [SCRIPT (48.4 MiB processed) in 6.44s]
22:58:04  20 of 21 START sql table model vb_mart_gtfs.fct_service_alerts_trip_summaries .. [RUN]
22:58:05  18 of 21 OK created sql incremental model vb_staging.int_gtfs_rt__service_alerts_day_map_grouping  [SCRIPT (47.4 MiB processed) in 7.85s]
22:58:05  21 of 21 START sql table model vb_mart_gtfs.fct_daily_service_alerts ........... [RUN]
22:58:08  20 of 21 OK created sql table model vb_mart_gtfs.fct_service_alerts_trip_summaries  [CREATE TABLE (241.0 rows, 1.2 MiB processed) in 4.11s]
22:58:21  21 of 21 OK created sql table model vb_mart_gtfs.fct_daily_service_alerts ...... [CREATE TABLE (40.6k rows, 2.6 GiB processed) in 16.46s]
22:58:21
22:58:21  Finished running 10 view models, 8 table models, 3 incremental models in 0 hours 1 minutes and 3.32 seconds (63.32s).
22:58:22
22:58:22  Completed successfully
22:58:22
22:58:22  Done. PASS=21 WARN=0 ERROR=0 SKIP=0 TOTAL=21

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)
    Make sure the dbt tests are okay!

github-actions bot commented Oct 15, 2024

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For modified incremental models (or incremental models whose parents are modified), does the PR description identify whether a full refresh is needed for these tables?

Changed incremental models 🔀

calitp_warehouse.mart.gtfs.fct_service_alerts_messages_unnested

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__service_alerts_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__service_alerts_trip_day_map_grouping

DAG

Legend (in order of precedence)

Resource type | Indicator | Resolution
Large table-materialized model | Orange | Make the model incremental
Large model without partitioning or clustering | Orange | Add partitioning and/or clustering
View with more than one child | Yellow | Materialize as a table or incremental
Incremental | Light green |
Table | Green |
View | White |

vevetron force-pushed the gtfs_rt_services_bad_data_fix branch from 96421ae to cecafc4 on October 16, 2024 01:45
vevetron merged commit 79a41be into main on Oct 16, 2024
4 checks passed
vevetron deleted the gtfs_rt_services_bad_data_fix branch on October 16, 2024 01:48
Development

Successfully merging this pull request may close these issues.

Bug: Data Pipeline failing on invalid service alerts value
2 participants