Persistent URI strategy is broken #1

Open
pietercolpaert opened this issue Jul 24, 2016 · 0 comments

pietercolpaert commented Jul 24, 2016

Right now, the persistent identifier contains a UTC timestamp. It should use local time in the Europe/Brussels time zone instead. With moment.js, we should be able to get the right behavior, though. If we decide to go ahead with this, we should also change it in the repo https://github.com/iRail/gtfsrt2lc.
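To make the difference concrete, here is a minimal sketch that formats the same instant once in UTC and once as Europe/Brussels wall-clock time. It uses the built-in `Intl.DateTimeFormat` rather than moment.js so that it runs without dependencies; the function names and the compact `YYYYMMDDTHHmm` layout are assumptions for illustration, not the format the generator actually emits.

```javascript
// Format an instant as local wall-clock time in Europe/Brussels,
// using only the built-in Intl API (not moment.js).
function toBrusselsLocal(date) {
  const formatter = new Intl.DateTimeFormat('en-GB', {
    timeZone: 'Europe/Brussels',
    hourCycle: 'h23', // hours run from 00 to 23
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit',
  });
  const parts = {};
  for (const { type, value } of formatter.formatToParts(date)) {
    parts[type] = value;
  }
  return `${parts.year}${parts.month}${parts.day}T${parts.hour}${parts.minute}`;
}

// The same instant, formatted in UTC.
function toUtc(date) {
  return date.toISOString().slice(0, 16).replace(/[-:]/g, '');
}

const lateEvening = new Date('2016-07-24T22:30:00Z');
console.log(toUtc(lateEvening));           // 20160724T2230
console.log(toBrusselsLocal(lateEvening)); // 20160725T0030
```

For a late-evening departure, the UTC identifier and the local one do not even agree on the date, which matters as soon as the date becomes part of the URI.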

Perhaps we should, however, get rid of the time altogether and only use a date. The problem is bigger than this, though:

Problem

The GTFS ZIP files are updated every 3 months and each covers a time span of 4 months. This means that every 3 months there is a 1-month overlap in which the schedules may have changed slightly. E.g., the planned departure time of a train may have changed by 1 or more minutes (this actually happens from time to time). With our current identifier strategy, we will get a lot of duplicate connections.
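The effect can be sketched as follows. The URI template, the field names and the sample values are all hypothetical; the only point is that any identifier containing the departure time changes whenever the planned time does.

```javascript
// Hypothetical URI template with the departure time in the identifier.
function uriWithTime(c) {
  return `http://example.org/connections/${c.departureStop}/${c.departureTime}/${c.routeId}`;
}

// The same train as published in two overlapping GTFS versions;
// the planned departure moved by one minute in the newer feed.
const fromOldFeed = { departureStop: 'stopA', departureTime: '20160801T0815', routeId: 'route1' };
const fromNewFeed = { ...fromOldFeed, departureTime: '20160801T0816' };

const archive = new Set([uriWithTime(fromOldFeed), uriWithTime(fromNewFeed)]);
console.log(archive.size); // 2: one train, two URIs in the archive
```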

Possible solutions

1. We use the departure date (year, month, day) of the trip's departure for all connections

Then it becomes hard to generate a URI automatically for a connection whose trip started on a previous day. Could we find a solution for this somehow?

Mind that in this case we will still have duplicate entries when a trip's start date changes to the next day. However, the probability that this happens is pretty low.
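A minimal sketch of option 1, assuming every connection carries the date on which its trip started (here a hypothetical `tripStartDate` field; how to obtain it reliably is exactly the open question above). The URI template is hypothetical as well.

```javascript
// Option 1: identify a connection by the start date of its trip.
// Assumption: tripStartDate is already known for every connection.
function uriByTripStartDate(c) {
  return `http://example.org/connections/${c.tripStartDate}/${c.tripId}/${c.departureStop}`;
}

// Two connections of one trip that runs past midnight.
const beforeMidnight = { tripId: 'trip42', tripStartDate: '20160801', departureStop: 'stopA', departureTime: '2016-08-01T23:50' };
const afterMidnight = { tripId: 'trip42', tripStartDate: '20160801', departureStop: 'stopB', departureTime: '2016-08-02T00:10' };

// The second connection as it appears in a newer feed, one minute later.
const afterMidnightRetimed = { ...afterMidnight, departureTime: '2016-08-02T00:11' };

console.log(uriByTripStartDate(beforeMidnight)); // .../20160801/trip42/stopA
console.log(uriByTripStartDate(afterMidnight));  // .../20160801/trip42/stopB
console.log(uriByTripStartDate(afterMidnight) === uriByTripStartDate(afterMidnightRetimed)); // true
```

Both connections keep the trip's start date in their URI, and a one-minute change no longer produces a new identifier.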

2. We use the departure date (year, month, day) of the connection itself

Then we still have the problem that a connection can move from one day to another. The chance that this happens is small, though, and the consequences for our archive are probably small as well?
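A minimal sketch of option 2, assuming `departureTime` is already a local (Europe/Brussels) timestamp of the form `YYYY-MM-DDTHH:mm`. The field names and the URI template are hypothetical.

```javascript
// Option 2: identify a connection by its own local departure date.
function uriByConnectionDate(c) {
  const date = c.departureTime.slice(0, 10).replace(/-/g, '');
  return `http://example.org/connections/${date}/${c.tripId}/${c.departureStop}`;
}

const planned = { tripId: 'trip42', departureStop: 'stopA', departureTime: '2016-08-01T23:58' };
const oneMinuteLater = { ...planned, departureTime: '2016-08-01T23:59' };
const pastMidnight = { ...planned, departureTime: '2016-08-02T00:00' };

// A change within the same day keeps the URI stable.
console.log(uriByConnectionDate(planned) === uriByConnectionDate(oneMinuteLater)); // true
// A change that crosses midnight still yields a new URI.
console.log(uriByConnectionDate(planned) === uriByConnectionDate(pastMidnight));   // false
```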

Earlier tests showed that this resulted in a lot of duplicate URIs. I should investigate why this happens, as a route id should only occur once a day...
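One way to start that investigation is to group the generated connections by URI and look at the groups with more than one member. The helper below is a sketch; `findDuplicateUris`, the field names and the sample data are made up for illustration and say nothing about the actual cause.

```javascript
// Group connections by their generated URI and keep only the collisions.
function findDuplicateUris(connections, toUri) {
  const byUri = new Map();
  for (const c of connections) {
    const uri = toUri(c);
    if (!byUri.has(uri)) byUri.set(uri, []);
    byUri.get(uri).push(c);
  }
  return new Map([...byUri].filter(([, group]) => group.length > 1));
}

// Hypothetical date-based template keyed on route id and departure stop.
const toUri = (c) => `http://example.org/connections/${c.date}/${c.routeId}/${c.departureStop}`;

const sample = [
  { date: '20160801', routeId: 'route1', departureStop: 'stopA', departureTime: '08:15' },
  { date: '20160801', routeId: 'route1', departureStop: 'stopA', departureTime: '09:15' },
  { date: '20160801', routeId: 'route2', departureStop: 'stopA', departureTime: '08:20' },
];

const duplicates = findDuplicateUris(sample, toUri);
console.log(duplicates.size); // 1
for (const [uri, group] of duplicates) {
  console.log(uri, group.map((c) => c.departureTime)); // the colliding departures
}
```

Printing the colliding connections side by side should show which field distinguishes them and is therefore missing from the identifier.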

pietercolpaert changed the title from "Uses UTC time in URI: should be local time" to "Persistent URI strategy is broken" on Jul 24, 2016