Fix errors on GTFS RT validation #3513

Open
wants to merge 9 commits into
base: main

Contributor

@erikamov erikamov commented Oct 25, 2024

Description

This PR fixes several GTFS RT validation errors that were found while investigating issue #2780:

  • It resolves a race condition where all parse_and_validate calls shared the same temporary directory
  • That contention meant that processes would overwrite the existing GTFS schedule with the same name
  • This also resulted in an elevated number of skipped protobuf validations
  • The gtfs-realtime-validator skips protobufs with the same MD5
  • The race condition caused elevated MD5 collisions for protobufs
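The fix described in the list above amounts to giving each call its own scratch directory. A minimal sketch of that idea, not the PR's actual code: `parse_and_validate` is named in the description, but the signature, the `gtfs.zip` filename, and the thread pool used here are assumptions made for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_and_validate(feed_name: str, schedule_bytes: bytes) -> tuple:
    """Hypothetical stand-in: stage a schedule file, then read it back."""
    # Each call creates its own directory, so two calls that stage a file
    # with the same name can no longer overwrite each other.
    with tempfile.TemporaryDirectory() as tmp_dir:
        schedule_path = Path(tmp_dir) / "gtfs.zip"  # assumed filename
        schedule_path.write_bytes(schedule_bytes)
        # ... the validator would be run against tmp_dir here ...
        return feed_name, schedule_path.read_bytes()


# Every feed stages a file with the same name but different contents.
feeds = {f"feed-{i}": f"schedule-{i}".encode() for i in range(8)}

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(parse_and_validate, feeds.keys(), feeds.values()))
```

With a single shared directory, concurrent calls could read back another feed's `gtfs.zip`; with one directory per call, each call sees only the file it wrote.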

The error message described in the issue, [Errno 2] No such file or directory, is caused by the skipped files mentioned above. It could be reported as an informational message instead of an error.
[Screenshot 2024-10-25 at 10:26:29 AM]
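One way to report a skipped file as a message rather than an error could look like the sketch below. It assumes a missing output file means the validator skipped the protobuf. The function name, logger name, and `*.results.json` filenames are hypothetical and are not taken from the PR.

```python
import hashlib
import logging
import tempfile
from pathlib import Path
from typing import Optional

logger = logging.getLogger("rt_validation_sketch")  # hypothetical logger name


def read_validation_output(path: Path) -> Optional[bytes]:
    """Return the validator output, or None if the file was never written."""
    try:
        return path.read_bytes()
    except FileNotFoundError:
        # Assumption: no output file means the protobuf was skipped
        # (for example, same MD5 as an earlier one), so log and move on.
        logger.info("No validation output at %s; treating as skipped", path)
        return None


# Identical payloads produce identical MD5 digests, which is the condition
# under which the description says the validator skips a protobuf.
first_digest = hashlib.md5(b"same feed bytes").hexdigest()
second_digest = hashlib.md5(b"same feed bytes").hexdigest()

with tempfile.TemporaryDirectory() as tmp_dir:
    written = Path(tmp_dir) / "a.results.json"
    written.write_bytes(b"[]")
    present = read_validation_output(written)
    missing = read_validation_output(Path(tmp_dir) / "b.results.json")
```

Returning `None` keeps the skip visible to callers, who can count it as an outcome without conflating it with a real validation failure.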

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Tested locally using poetry run pytest, which replicates running the validator tool on the command line, e.g.:

GTFS_RT_VALIDATOR_JAR="gtfs-realtime-validator-lib-1.0.0-20220223.003525-2.jar" CALITP_BUCKET__GTFS_SCHEDULE_RAW="gs://test-calitp-gtfs-schedule-raw-v2" CALITP_BUCKET__GTFS_RT_RAW="gs://test-calitp-gtfs-rt-raw-v2" CALITP_BUCKET__GTFS_RT_PARSED="gs://test-calitp-gtfs-rt-parsed" CALITP_BUCKET__GTFS_RT_VALIDATION="gs://test-calitp-gtfs-rt-validation" GTFS_RT_VALIDATOR_VERSION="v1.0.0" poetry run python3 gtfs_rt_parser.py validate vehicle_positions 2024-10-17T00:00:00  --verbose

Post-merge follow-ups

  • No action required
  • Actions required (specified below)

Monitor the next DAG runs and query cal-itp-data-infra.staging.int_gtfs_quality__rt_validation_outcomes to confirm that fewer errors occur.

@vevetron
Contributor

Should the test be failing for this?

ohrite and others added 6 commits November 15, 2024 11:15
* This simplifies the flow of control so that every command runs the same code

Signed-off-by: Doc Ritezel <[email protected]>
* This commit resolves a race condition where all parse_and_validate calls shared the same temporary directory
* That contention meant that processes would overwrite the existing GTFS schedule with the same name
* This also resulted in an elevated number of skipped protobuf validations
* The gtfs-realtime-validator skips protobufs with the same MD5
* The race condition caused elevated MD5 collisions for protobufs

Signed-off-by: Doc Ritezel <[email protected]>