De-dedups bookplate metadata further upstream #1424

shelleydoljack · 2024-11-09T00:09:22Z

De-dups the bookplate metadata, starting with the bookplate_funds_polines task, which will return this data structure:

{
  "5513c3d7-7c6b-45ea-a875-09798b368873": {
    "bookplate_metadata": [
      {"fund_name": "...", "druid": "...", "image_filename": "...", "title": "..."},
      {"fund_name": "...", "druid": "...", "image_filename": "...", "title": "..."}
    ]
  },
  ...
}
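For illustration, the grouping step could look roughly like the sketch below. It assumes the task starts from flat rows, one per po line/fund match, with a hypothetical poline_id column next to the four bookplate fields. The function name group_by_po_line is also hypothetical; this is not the actual bookplate_funds_polines code. Duplicate bookplates within a po line are collapsed by keying on their sorted (key, value) pairs.

```python
from typing import Dict, Iterable

# The four bookplate fields shown in the structure above.
BOOKPLATE_FIELDS = ("fund_name", "druid", "image_filename", "title")


def group_by_po_line(rows: Iterable[dict]) -> Dict[str, dict]:
    """Group bookplate metadata by po line ID, dropping duplicates per po line."""
    # po line ID -> {fingerprint: bookplate}; dicts keep first-seen insertion order.
    grouped: Dict[str, Dict[tuple, dict]] = {}
    for row in rows:
        po_line_id = row["poline_id"]  # hypothetical column name
        bookplate = {field: row[field] for field in BOOKPLATE_FIELDS}
        # Sorted (key, value) pairs are hashable and ignore key order.
        fingerprint = tuple(sorted(bookplate.items()))
        # The inner setdefault keeps the first copy and ignores later duplicates.
        grouped.setdefault(po_line_id, {}).setdefault(fingerprint, bookplate)
    return {
        po_line_id: {"bookplate_metadata": list(by_fingerprint.values())}
        for po_line_id, by_fingerprint in grouped.items()
    }
```

Keying the result by po line ID is what lets later steps look up a po line's bookplates directly instead of re-querying.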

By having this task return a dict keyed by po line ID, we can also avoid extra calls to orders-storage/po-lines when getting the po line data. Because different po lines can reference the same instance ID, we de-dup the bookplate metadata again in the instances_from_po_lines task (test_instances_from_po_lines tests this scenario).
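A rough sketch of that second de-dup, assuming the po line to instance ID mapping has already been looked up (a plain dict stands in for the orders-storage/po-lines data here). group_bookplates_by_instance and dedup_bookplates are hypothetical names, not the PR's instances_from_po_lines or _dedup_bookplates. Bookplates from po lines that share an instance are pooled, then duplicates are dropped using a sorted (key, value) fingerprint, so two dicts with the same contents but a different key order count as one.

```python
from typing import Dict, List

Bookplate = Dict[str, str]


def dedup_bookplates(bookplates: List[Bookplate]) -> List[Bookplate]:
    """Drop duplicate bookplate dicts, keeping the first occurrence of each."""
    seen = set()
    unique: List[Bookplate] = []
    for bookplate in bookplates:
        # Sorted (key, value) pairs: hashable and independent of key order.
        fingerprint = tuple(sorted(bookplate.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(bookplate)
    return unique


def group_bookplates_by_instance(
    po_lines: Dict[str, dict],
    po_line_to_instance: Dict[str, str],
) -> Dict[str, List[Bookplate]]:
    """Pool bookplates from po lines that share an instance ID, then de-dup."""
    pooled: Dict[str, List[Bookplate]] = {}
    for po_line_id, data in po_lines.items():
        instance_id = po_line_to_instance[po_line_id]  # stand-in for the po line lookup
        pooled.setdefault(instance_id, []).extend(data["bookplate_metadata"])
    return {
        instance_id: dedup_bookplates(bookplates)
        for instance_id, bookplates in pooled.items()
    }
```

With two po lines pointing at the same instance and one bookplate shared between them, the instance ends up with two bookplates rather than three.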


jgreben

I'd like to understand the tests better, and in particular why they need explicit sorting.

jgreben · 2024-11-11T16:10:49Z

tests/digital_bookplates/test_bookplates.py

+    )
+
+    assert len(instances_dict["e6803f0b-ed22-48d7-9895-60bea6826e93"]) == 2
+    bookplates = [


I'm confused here. Shouldn't instances_dict already be sorted and de-duped, since it comes from calling the instances_from_po_lines function? If so, why is it being sorted again here?

You're right, we shouldn't need to sort the output of the function, since the function already sorts in order to de-dup. I'll take out the sorting; it's unnecessary.

jgreben · 2024-11-11T16:15:03Z

tests/digital_bookplates/test_bookplates.py

+    sorted_mock_bookplates = [
+        sorted(bookplate.values()) for bookplate in mock_bookplate_metadata
+    ]
+    for row in sorted_bookplates:


Is this sorting here only so that each row can be compared in the assertion, or is it somehow related to the dedup function?

This sorting was unnecessary and also wrong: the _dedup_bookplates function sorts each bookplate dictionary by its (key, value) pairs when converting it to a tuple, whereas this test sorted only the values of mock_bookplate_metadata. I removed the sorting from the tests altogether.
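A small illustration of why the two sorts are not interchangeable (the helper names and sample values are made up for this example): sorting only the values throws away which field each value came from, so two different bookplates can look identical, while sorting the (key, value) pairs keeps that information and still ignores key order.

```python
def values_only_key(bookplate: dict) -> list:
    # What the removed test code did: sort the values, discarding the keys.
    return sorted(bookplate.values())


def items_key(bookplate: dict) -> tuple:
    # De-dup style key: (key, value) pairs sorted by key, as a hashable tuple.
    return tuple(sorted(bookplate.items()))


a = {"fund_name": "A", "title": "B"}
b = {"fund_name": "B", "title": "A"}  # same values, swapped between fields
a_reordered = {"title": "B", "fund_name": "A"}  # same content as a, new key order

print(values_only_key(a) == values_only_key(b))  # True: distinct bookplates collide
print(items_key(a) == items_key(b))  # False
print(items_key(a) == items_key(a_reordered))  # True: key order does not matter
print(len({items_key(bp) for bp in (a, b, a_reordered)}))  # 2
```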



shelleydoljack · 2024-11-11T22:42:13Z

I found a small issue with trigger_digital_bookplate_979_task while testing the DAG with the LINDER fund in dev. The log for the DAG run shows INFO - Total incoming instances 18, but 18 is the number of mapped tasks from the previous task, not the number of incoming instances. I fixed it in this commit.
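A minimal sketch of the kind of fix described, assuming each upstream mapped task returns a list of instance dicts, so the downstream task receives one list per mapped task. Counting the outer sequence gives the number of mapped tasks; flattening it first gives the number of instances. flatten_incoming and the dict shape are hypothetical, not the actual trigger_digital_bookplate_979_task code.

```python
import itertools
import logging
from typing import Iterable, List

logger = logging.getLogger(__name__)


def flatten_incoming(incoming: Iterable[List[dict]]) -> List[dict]:
    """Flatten one-list-per-mapped-task input into a single list of instances."""
    instances = list(itertools.chain.from_iterable(incoming))
    # The outer sequence has one entry per mapped task; log the instance count instead.
    logger.info("Total incoming instances %d", len(instances))
    return instances
```

For example, two mapped tasks returning [{"uuid": "a"}, {"uuid": "b"}] and [{"uuid": "c"}] give an outer length of 2 but three instances after flattening.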

I also removed the 979s from the records https://folio-test.stanford.edu/inventory/view/ca2c1623-bbe5-40fb-b724-e22aacf8a007 and https://folio-test.stanford.edu/inventory/view/66d4d0a9-f4bc-58de-a4b9-02b0f439ee65 before running the digital_bookplate_instances DAG with the LINDER fund config. The digital_bookplate_979 DAG runs are still in progress, but each of those records currently has only one 979.

shelleydoljack added 2 commits November 8, 2024 15:04

Changes bookplate_funds_polines to return a dict grouped by po line id.

7c4c88d

Changes instances_from_po_lines for new data from bookplate_funds_polines and moves tests to renamed test_bookplates.py.

618f0fc

shelleydoljack requested review from jgreben and jermnelson November 9, 2024 00:09

Fixed formatting.

ad01149

jermnelson approved these changes Nov 11, 2024


jgreben reviewed Nov 11, 2024


shelleydoljack added 2 commits November 11, 2024 12:10

Sorts mock_bookplate_funds_polines by bookplate key since _dedup_bookplates sorts dict object by key.

18f85bd

Uses set intersection to compare values in list of dictionaries.

06d2646

shelleydoljack requested a review from jgreben November 11, 2024 21:12

Updates trigger_digital_bookplate_979_task to process incoming data that is list of lists.

c6742fb

shelleydoljack requested a review from jermnelson November 11, 2024 22:42

jgreben approved these changes Nov 12, 2024


jermnelson approved these changes Nov 12, 2024


jgreben merged commit 8e39716 into main Nov 12, 2024
4 checks passed

jgreben deleted the t1390-duplicate-979s branch November 12, 2024 18:33
