Skip to content

Clean up timezone/milliseconds mess for harvest API #459

Open
@LukasKalbertodt

Description

@LukasKalbertodt

Currently, there are a few subtle bugs in the code that gets executed by the harvest API. It's all very terrible and mostly due to badly designed libs IMO. I could identify these four problems:

  • By default, GSON uses the local time zone to format Date values. If sender and receiver are in different time zones, stuff breaks.
  • By default, GSON serializes Date values without milliseconds. So we lose milliseconds in a GSON roundtrip.
  • Date with JPAs @Temporal(TemporalType.TIMESTAMP) is turned into a DATETIME in mysql which by default does not store milliseconds. This is different from PostgreSQL where milliseconds are stored (required by the SQL standard for this type, but who cares about that...)
  • Something weird is going on with Date and JPA regarding time zones. Where I'm testing right now, the mysql server runs in UTC, while the Java application runs in UTC+2. This leads to wrong (2 hours too early) times received from the database.

While GSON and JPA is certainly also to blame, the common denominator seems to be Date. And in fact, apparently it's a terrible legacy class that one should not use anymore at all. Apparently with Java 8 a bunch of not-terribly-broken classes were added with java.time. It seems like we should just use those where ever possible.


I stopped working on this now as it was quite a time sink (plus very annoying) and the concrete problem with the Bern deployment was fixed. Somehow. I don't know what changed but it's not a problem anymore.

Some notes from my research into this:

  • JPA 2.2 adds support for java.time classes.
  • JPA is just a specification. Common implementations are hibernate and eclipselink. Opencast uses eclipselink.
  • We likely want to use java.time.Instant or java.time.OffsetDateTime for modified_date and deletion_date.
    • To avoid time zone confusions in the future, it's probably useful to store the time zone in the DB explicitly (always UTC!). This is technically wasted memory, but probably worth it to save dev time and headaches. OffsetDateTime would fit that description more closely.
      • mariadb/mysql don't seem to support timestamp with time zone?!
    • But of course, storing the explicit offset in every database row is not necessary and one can just declare the timestamp to be in UTC. And in Java, using Instant is more appropriate here.
    • Instant should be mapped to TIMESTAMP_WITH_TIMEZONE. For mysql, it results in the type datetime(3).
    • Instant is not mentioned in JPA 2.2 directly and not supported by eclipselink. It is supported by hibernate.
    • OffsetDateTime is supported by both.
    • This answer says Instant is better: https://stackoverflow.com/a/50039491/2408867
  • One can use javax.persistence.AttributeConverter to map the type of your entity during (de)serialization.
  • The standard serialization of GSON for Instant is: { "seconds": 1656425299, "nanos": 763000000 }
  • WIP patch can be found here
    • The GSON serialization is not nice to read, but it's probably fine because the relevant API is ONLY for internal remote-impl use anyway. Or does this serialization appear in any other API?
    • The database stuff also seems to work alright. OffsetDateTime is stored to the DB (which, DB systems that support it, should result in timestamp with time zone), but Instant is used in the Java code. Whether there can be weird effects when talking to DB, I don't know.

Some other links:

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions