Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-dedups bookplate metadata further upstream #1424

Merged
merged 6 commits into from
Nov 12, 2024
Merged

Conversation

shelleydoljack
Copy link
Contributor

Fixes #1390

De-dups the bookplate metadata, starting with the bookplate_funds_polines task, which will return this data structure:

{
      "5513c3d7-7c6b-45ea-a875-09798b368873": {
        "bookplate_metadata": [
          {"fund_name": "...", "druid": "...", "image_filename": "...", "title": "..."},
          {"fund_name": "...", "druid": "...", "image_filename": "...", "title": "..."},
        ]
      },
      ...
 }

By turning the return of this task into a dict, with the po-line ID as key, we can also avoid extra calls to orders-storage/po-lines when getting the po line data. And since it could be the case where different po lines reference the same instance ID, we end up de-duping the bookplate metadata again in the instances_from_po_lines task (the test_instances_from_po_lines tests this scenario).

Copy link
Contributor

@jgreben jgreben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to understand the tests better and the need for the explicit sorting in the tests.

)

assert len(instances_dict["e6803f0b-ed22-48d7-9895-60bea6826e93"]) == 2
bookplates = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused here. Should the instances_dict already be sorted and deduped because it calls the instances_from_po_lines function? If so why is it being forcibly sorted here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, we shouldn't need to sort the outcome of the function (that does call a sort to dedup). Let me take out the sorting; it seems unnecessary.

sorted_mock_bookplates = [
sorted(bookplate.values()) for bookplate in mock_bookplate_metadata
]
for row in sorted_bookplates:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being sorted here only in order to assert the comparison for each row, or is this somehow related to the dedup function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sorting was unnecessary and wrong since the _dedup_bookplates function is sorting the bookplate dictionaries by key, value pairs when transforming to a tuple. Here, the mock_bookplate_metadata was sorting only the values. I removed the sorting altogether for the tests.

@shelleydoljack
Copy link
Contributor Author

I found a small-ish issue with the trigger_digital_bookplate_979_task as I tested running the DAG with the LINDER fund in dev. The log for the dag run shows INFO - Total incoming instances 18 but this is the number of mapped tasks from the previous task. I fixed it in this commit.

I also removed the 979s that were in the records https://folio-test.stanford.edu/inventory/view/ca2c1623-bbe5-40fb-b724-e22aacf8a007 and https://folio-test.stanford.edu/inventory/view/66d4d0a9-f4bc-58de-a4b9-02b0f439ee65 before running the digital_bookplate_instances dag with the LINDER fund config. The digital_bookplate_979 dag runs are still running, but there are currently only 1 979 in each of those records nows.

@jgreben jgreben merged commit 8e39716 into main Nov 12, 2024
4 checks passed
@jgreben jgreben deleted the t1390-duplicate-979s branch November 12, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instance with duplicated 979s sent to digital_bookplate_979 DAG
3 participants