Have a way to suppress draft/non-public GTFS data from being published to public-facing websites #2834

Closed
evansiroky opened this issue Jul 21, 2023 · 4 comments

Comments

@evansiroky
Member

User story / feature request

As a Transit Data Quality Analyst, I would like to be able to use all of the analysis tools available within our data warehouse to analyze GTFS data that is not authorized for public consumption by a transit agency, so that I can work with transit agencies to help them improve data quality on non-public feeds.

As a Transit Database Maintainer, I would like a way to enter data into Airtable about non-public GTFS feeds, so that I know that the contents of the non-public GTFS feeds will not be made public.

Acceptance Criteria

Each product that publishes data publicly checks whether a feed is meant to be public and does not publish data from non-public feeds.

Notes

I am currently trying to help a transit agency with some data quality issues in a feed that they do not want marked as public until they are confident in its quality. Other use cases in the future might also benefit from the ability to mark a GTFS dataset as private.

@lauriemerrell
Contributor

lauriemerrell commented Jul 21, 2023

In Slack there were some questions about what this would involve -- answering here for future reference.

  • We'd need to ingest a new configuration field from Airtable. Here's an example PR that does that and brings the new field into the warehouse / Metabase.
  • We'd need to incorporate that flag in all relevant places. Here's a PR that adds new fields from Airtable (except the external table step, which was done separately) and integrates the new fields with existing logic (not publishing related but just gives some idea of similar work).
  • For publishing specifically:
    • The relevant script that actually does the publishing is warehouse/publish.py.
    • That script runs once a week.
    • The data to be published is determined by the exposure here
    • To suppress data, we could change which rows are present in the published table (for example dim_gtfs_datasets_latest.sql), change the publish script/workflow to look for rows to drop, or both. The first option is technically easier, but we may want to rename the tables if we go that way, because they would no longer contain all of the latest data. If the data also needs to be suppressed from the GTFS schedule data itself (rather than just from the list of datasets), we may need additional lookups to get that setting onto the tables in question.
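
As a rough illustration of the "look for rows to drop" option, here is a minimal Python sketch. It is not taken from warehouse/publish.py; the column names `is_public` and `gtfs_dataset_key`, both helper functions, and the sample rows are assumptions made up for the example. It also assumes a missing flag should be treated as non-public, which is a policy choice the team would need to confirm.

```python
from typing import Iterable

# Hypothetical column names: the real Airtable field and warehouse
# columns are not specified in this issue.
PUBLIC_FLAG = "is_public"
DATASET_KEY = "gtfs_dataset_key"


def publishable_dataset_keys(datasets: Iterable[dict]) -> set:
    """Return the keys of datasets explicitly marked public.

    Assumption: a missing or null flag counts as non-public, so a feed
    that has not been configured yet is never published by accident.
    """
    return {row["key"] for row in datasets if row.get(PUBLIC_FLAG) is True}


def drop_non_public(rows: Iterable[dict], public_keys: set, key_column: str = "key") -> list:
    """Keep only the rows whose dataset key belongs to a public dataset."""
    return [row for row in rows if row.get(key_column) in public_keys]


# Made-up sample rows standing in for the list of datasets and for a
# GTFS schedule table that references a dataset by key.
datasets = [
    {"key": "a1", "name": "Agency A Schedule", "is_public": True},
    {"key": "b2", "name": "Agency B Draft Schedule", "is_public": False},
    {"key": "c3", "name": "Agency C Schedule"},  # flag not entered yet
]
stops = [
    {"gtfs_dataset_key": "a1", "stop_id": "100"},
    {"gtfs_dataset_key": "b2", "stop_id": "200"},
]

public_keys = publishable_dataset_keys(datasets)
published_datasets = drop_non_public(datasets, public_keys)
# The "additional lookup": schedule rows carry no flag of their own,
# so they are filtered through the key of their parent dataset.
published_stops = drop_non_public(stops, public_keys, key_column=DATASET_KEY)
```

The same filter could be expressed as a join in the dbt models instead; doing it in the publish step keeps the warehouse tables complete for internal analysis, at the cost of every other public-facing product needing its own equivalent check.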

@evansiroky
Member Author

I think this may go beyond just the open data portal. It would also need to apply to other products, such as the speed maps and high quality transit corridors.

@lauriemerrell
Contributor

Per guidance from contract managers, this ticket is not in scope for Jarvus data services under our current contract; removing from our issue pipeline.

@evansiroky
Member Author

Fixed in #3430
