-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
De-dedups bookplate metadata further upstream #1424
Conversation
…ines and moves tests to renamed test_bookplates.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to understand the tests better and the need for the explicit sorting in the tests.
) | ||
|
||
assert len(instances_dict["e6803f0b-ed22-48d7-9895-60bea6826e93"]) == 2 | ||
bookplates = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused here. Should the instances_dict
already be sorted and deduped because it calls the instances_from_po_lines
function? If so why is it being forcibly sorted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we shouldn't need to sort the outcome of the function (that does call a sort to dedup). Let me take out the sorting; it seems unnecessary.
sorted_mock_bookplates = [ | ||
sorted(bookplate.values()) for bookplate in mock_bookplate_metadata | ||
] | ||
for row in sorted_bookplates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being sorted here only in order to assert the comparison for each row, or is this somehow related to the dedup function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sorting was unnecessary and wrong since the _dedup_bookplates function is sorting the bookplate dictionaries by key, value pairs when transforming to a tuple. Here, the mock_bookplate_metadata was sorting only the values. I removed the sorting altogether for the tests.
…plates sorts dict object by key.
…hat is list of lists.
I found a small-ish issue with the trigger_digital_bookplate_979_task as I tested running the DAG with the LINDER fund in dev. The log for the dag run shows I also removed the 979s that were in the records https://folio-test.stanford.edu/inventory/view/ca2c1623-bbe5-40fb-b724-e22aacf8a007 and https://folio-test.stanford.edu/inventory/view/66d4d0a9-f4bc-58de-a4b9-02b0f439ee65 before running the digital_bookplate_instances dag with the LINDER fund config. The digital_bookplate_979 dag runs are still running, but there are currently only 1 979 in each of those records nows. |
Fixes #1390
De-dups the bookplate metadata, starting with the bookplate_funds_polines task, which will return this data structure:
By turning the return of this task into a dict, with the po-line ID as key, we can also avoid extra calls to orders-storage/po-lines when getting the po line data. And since it could be the case where different po lines reference the same instance ID, we end up de-duping the bookplate metadata again in the instances_from_po_lines task (the test_instances_from_po_lines tests this scenario).