erikamov (Contributor) commented Oct 25, 2024

Description

This PR fixes several GTFS RT validation errors that were found while investigating issue #2780:

  • It resolves a race condition in which all parse_and_validate calls shared the same temporary directory
  • Because of that contention, processes overwrote each other's GTFS schedule files, which had the same name
  • This also resulted in an elevated number of skipped protobuf validations
  • The gtfs-realtime-validator skips protobufs with the same MD5 hash
  • The race condition therefore caused more protobufs to share an MD5 hash, so more of them were skipped
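The following small model illustrates why overwritten files show up as skipped validations. It assumes, as the description states, that the validator skips a protobuf whose MD5 it has already seen; the md5_dedupe and simulate functions and the feed.pb file name are hypothetical and are not the gtfs-realtime-validator's actual code. When several callers write a protobuf with the same name into one shared directory, later writes replace earlier ones, so distinct inputs end up as byte-identical files and all but the first are skipped.

```python
import hashlib
import os
import tempfile


def md5_dedupe(paths: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical model of MD5-based skipping; returns (validated, skipped)."""
    seen: set[str] = set()
    validated: list[str] = []
    skipped: list[str] = []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        if digest in seen:
            skipped.append(path)  # identical bytes were already validated
        else:
            seen.add(digest)
            validated.append(path)
    return validated, skipped


def simulate(shared_dir: bool, payloads: list[bytes]) -> int:
    """Stage each payload as feed.pb, then count how many the model skips."""
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i, payload in enumerate(payloads):
            # Shared: every caller writes root/feed.pb. Isolated: root/<i>/feed.pb.
            call_dir = root if shared_dir else os.path.join(root, str(i))
            os.makedirs(call_dir, exist_ok=True)
            path = os.path.join(call_dir, "feed.pb")
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)
        # Validate only after every write, as happens when calls interleave.
        _, skipped = md5_dedupe(paths)
        return len(skipped)


payloads = [b"vehicle positions A", b"vehicle positions B", b"vehicle positions C"]
print(simulate(shared_dir=True, payloads=payloads))   # 2
print(simulate(shared_dir=False, payloads=payloads))  # 0
```

With isolated per-call directories, the three distinct payloads produce three distinct hashes and nothing is skipped.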

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Tested locally with poetry run pytest, which replicates running the validator tool from the command line, e.g.:

GTFS_RT_VALIDATOR_JAR="gtfs-realtime-validator-lib-1.0.0-20220223.003525-2.jar" CALITP_BUCKET__GTFS_SCHEDULE_RAW="gs://test-calitp-gtfs-schedule-raw-v2" CALITP_BUCKET__GTFS_RT_RAW="gs://test-calitp-gtfs-rt-raw-v2" CALITP_BUCKET__GTFS_RT_PARSED="gs://test-calitp-gtfs-rt-parsed" CALITP_BUCKET__GTFS_RT_VALIDATION="gs://test-calitp-gtfs-rt-validation" GTFS_RT_VALIDATOR_VERSION="v1.0.0" poetry run python3 gtfs_rt_parser.py validate vehicle_positions 2024-10-17T00:00:00  --verbose

Post-merge follow-ups

  • No action required
  • Actions required (specified below)

Monitor the next DAG runs and query cal-itp-data-infra.staging.int_gtfs_quality__rt_validation_outcomes to confirm that fewer errors occur.

erikamov force-pushed the mov/2780-gtfs-rt-validation branch from abf1c4c to a2bf5eb on October 25, 2024 00:32
ohrite force-pushed the mov/2780-gtfs-rt-validation branch from b8b06c4 to 21263b2 on October 26, 2024 01:41
vevetron (Contributor) commented:

Should the test be failing for this?

erikamov force-pushed the mov/2780-gtfs-rt-validation branch 2 times, most recently from 0d2a924 to 6edb9a0 on November 15, 2024 20:36
ohrite and others added 7 commits November 18, 2024 11:10
* This simplifies the flow of control so that every command runs the same code

Signed-off-by: Doc Ritezel <[email protected]>
* This commit resolves a race condition where all parse_and_validate calls shared the same temporary directory
* That contention meant that processes would overwrite the existing GTFS schedule with the same name
* This also resulted in an elevated number of skipped protobuf validations
* The gtfs-realtime-validator skips protobufs with the same MD5
* The race condition caused elevated MD5 collisions for protobufs

Signed-off-by: Doc Ritezel <[email protected]>
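The fix this commit describes can be sketched as giving every call its own temporary directory. This is a minimal sketch under assumptions: parse_and_validate is named in the PR, but the signature used here, the schedule.zip file name, and the read-back that stands in for running the validator JAR are hypothetical. The sketch uses threads for brevity; the same isolation holds across separate processes, because each TemporaryDirectory is created with a unique path.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parse_and_validate(feed_name: str, schedule_bytes: bytes) -> bytes:
    """Hypothetical stand-in for the real function: stage inputs, then validate.

    Each call gets its own TemporaryDirectory, so two calls that use the
    same file name ("schedule.zip") can never overwrite each other.
    """
    # tmp_dir is removed automatically when the with-block exits, even on error.
    with tempfile.TemporaryDirectory(prefix="gtfs-rt-") as tmp_dir:
        schedule_path = os.path.join(tmp_dir, "schedule.zip")
        with open(schedule_path, "wb") as f:
            f.write(schedule_bytes)
        # The real code would run the validator against tmp_dir here;
        # this sketch just reads the staged file back to show it is intact.
        with open(schedule_path, "rb") as f:
            return f.read()


# Eight concurrent calls, each staging a different schedule under the same name.
feeds = {f"feed{i}": f"schedule {i}".encode() for i in range(8)}
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(parse_and_validate, feeds.keys(), feeds.values()))
print(results == list(feeds.values()))  # True
```

Because tempfile.TemporaryDirectory deletes the directory when the with block exits, per-call isolation also avoids leaving stale schedule files behind for later runs.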
erikamov force-pushed the mov/2780-gtfs-rt-validation branch from 9be1c30 to 88ed9ad on November 18, 2024 19:13
erikamov (Contributor, Author) replied, quoting "Should the test be failing for this?":

We added a tag to skip these tests until we can use GCS on GitHub Actions.
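The comment does not say how the skip tag is implemented. As one possible sketch, GCS-dependent tests can be gated with the standard-library unittest.skipUnless decorator, which pytest also honors on unittest.TestCase classes. Using the GOOGLE_APPLICATION_CREDENTIALS environment variable as the gate, and the test class and method names, are assumptions for illustration only.

```python
import os
import unittest

# Assumption: GCS access is available only when credentials are configured.
GCS_AVAILABLE = bool(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))


@unittest.skipUnless(GCS_AVAILABLE, "requires GCS access, not yet available on GitHub Actions")
class TestValidateAgainstGCS(unittest.TestCase):
    """Hypothetical test class; the real tests live in the PR's test suite."""

    def test_validate_vehicle_positions(self):
        # A real test would read from the test-calitp-* buckets and run the validator.
        self.assertTrue(GCS_AVAILABLE)
```

On a runner without credentials every test in the class is reported as skipped rather than failed, which matches the intent of skipping until GCS is usable in CI.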

erikamov requested a review from ohrite November 18, 2024 19:19

ohrite (Contributor) left a comment:

🍐 Pairing w/ Erika

erikamov merged commit d205f58 into main on Nov 18, 2024 (4 checks passed)
erikamov deleted the mov/2780-gtfs-rt-validation branch November 18, 2024 19:22
3 participants