-
Notifications
You must be signed in to change notification settings - Fork 18
fix: serialisation of since
datetime
#88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: serialisation of since
datetime
#88
Conversation
* The `since` parameter of many jobs is represented in Python with a datetime.datetime object. This cannot be directly serialised into a str, so a `default=str` was added to `json.dumps`. However, this serialises the date into a non-standard format instead of ISO 8601 which is what job handlers expect to receive. * Added a custom default function to turn any datetime into an ISO 8601 formatted string, without affecting any other job args.
* To ensure the UI appears consistent with the actual job arguments, I generalised the `json.dumps` custom `default` function and used it everywhere a job is serialised. * The uses in `services/results.py` are purely UI-related but it's still important to ensure the timestamps are displayed consistently (as well as any future changes to the custom serialiser)
* Some job implementations downstream assume the `since` argument in jobs is an ISO timestamp *including timezone info*. We store all dates in the DB in UTC but without associated TZ info. * It is safe to add in the TZ info (UTC) when setting the `since` arg as we know it is always in the UTC zone. Job implementors can then more easily (and without ambiguity about the TZ) use the `since` arg.
@@ -50,7 +51,7 @@ def data(self): | |||
job_dict["last_run"] = self._obj.last_run.dump() | |||
job_dict["last_runs"] = self._obj.last_runs | |||
job_dict["default_args"] = json.dumps( | |||
self._obj.default_args, indent=4, sort_keys=True, default=str | |||
self._obj.default_args, default=job_arg_json_dumper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@palkerecsenyi indent and sort_keys removed because are the default values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I could tell, we aren't using the dumped JSON for anything user-facing, it's only stored in the database. As such we don't need the indent
/sort_keys
which (I think) serve purely aesthetic purposes. With the default values instead it returns a more minified JSON string that also saves some space in the DB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm we should be using it when you serialize the job args in a configured job in the administration panel, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def job_arg_json_dumper(obj): | ||
"""Handle non-serializable values such as datetimes when dumping the arguments of a job run.""" | ||
if isinstance(obj, datetime): | ||
return obj.isoformat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, let's ensure here that there is no timezone or Z
char added to the string, so that we avoid manipulating the string in any celery task or UI. Can you please add a test?
[like] Zacharias Zacharodimos reacted to your message:
________________________________
From: Pal Kerecsenyi ***@***.***>
Sent: Thursday, July 10, 2025 12:24:13 PM
To: inveniosoftware/invenio-jobs ***@***.***>
Cc: Zacharias Zacharodimos ***@***.***>; Comment ***@***.***>
Subject: Re: [inveniosoftware/invenio-jobs] fix: serialisation of `since` datetime (PR #88)
@palkerecsenyi commented on this pull request.
________________________________
In invenio_jobs/services/results.py<#88 (comment)>:
@@ -50,7 +51,7 @@ def data(self):
job_dict["last_run"] = self._obj.last_run.dump()
job_dict["last_runs"] = self._obj.last_runs
job_dict["default_args"] = json.dumps(
- self._obj.default_args, indent=4, sort_keys=True, default=str
+ self._obj.default_args, default=job_arg_json_dumper
In the admin panel this is passed into a JS-based UI which prettifies the JSON anyway, so the format it receives the object in doesn't matter in theory
image.png (view on web)<https://github.com/user-attachments/assets/ad0f5c85-91a9-425e-a8b5-e2795a8c09b6>
—
Reply to this email directly, view it on GitHub<#88 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AFMMJX52XWRU3UWQPQNLOXT3HZLO3AVCNFSM6AAAAACA6HKCGSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTAMBVGQ2TCMBVGM>.
You are receiving this because you commented.Message ID: ***@***.***>
|
* Changed the `since` argument so task implementors always receive a timestamp with the `+00:00` syntax instead of `Z` which is unsupported by Python <3.10 * Added an error message shown on the admin dashboard when running a job with an invalid `since` timestamp (i.e. not ISO compliant) * Added unit test for the custom `json.dumps` serialiser that stringifies dates to ISO timestamps * Modified the job unit test to use the `since` argument, and added a test case for an invalid timestamp to ensure the correct error message is returned to the client
Tests are currently failing due to inveniosoftware/pytest-invenio#128 |
i answerd on the pytest-invenio issue |
For reference: the tests are passing locally with the changes in #94, but we won't cherry-pick the commit into this branch to avoid duplication. We can merge this PR, and the tests will pass once #94 is also merged. Local test output
|
Please release this PR at the same time as the changes in #87 (comment).
Closes #87.
The
since
parameter of many jobs is represented in Python with a datetime.datetime object. This cannot be directly serialised into a str, so adefault=str
was added tojson.dumps
. However, this serialises the date into a non-standard format instead of ISO 8601 which is what job handlers expect to receive.The marshmallow schema
PredefinedArgsSchema
indeed assigns theiso
format to thesince
argument but then contains a non-ISO format in the description. This has been changed to be more clear.Added a custom default function to turn any datetime into an ISO 8601 formatted string, without affecting any other job args..