
tapdb: remove duplicate assets before adding unique index #980


Merged
merged 1 commit on Jul 2, 2024

Conversation

guggero
Member

@guggero guggero commented Jun 27, 2024

This commit first removes duplicate assets that were mistakenly created due to self transfers in previous versions.
After removing the duplicates, the new unique index should be applied without issues.
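The approach described above can be sketched against a toy SQLite schema. This is a minimal illustration only: the table and column names below are simplified stand-ins, not the real taproot-assets schema. Duplicates are removed by keeping the row with the lowest `asset_id` per identity, after which the unique index applies without conflicts:

```python
import sqlite3

# Toy schema (hypothetical column names): duplicate asset rows must be
# removed before a unique index over the identifying columns can be created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (
    asset_id INTEGER PRIMARY KEY,
    genesis_id INTEGER NOT NULL,
    anchor_utxo_id INTEGER NOT NULL,
    script_key_id INTEGER NOT NULL
);
-- Two rows describing the same asset, created by a self transfer.
INSERT INTO assets VALUES (1, 10, 20, 30);
INSERT INTO assets VALUES (2, 10, 20, 30);
INSERT INTO assets VALUES (3, 11, 21, 31);
""")

# Step 1: delete duplicates, keeping the row with the lowest asset_id
# per group of identifying columns.
conn.execute("""
DELETE FROM assets WHERE asset_id NOT IN (
    SELECT MIN(asset_id) FROM assets
    GROUP BY genesis_id, anchor_utxo_id, script_key_id
)""")

# Step 2: the unique index can now be created without conflicts.
conn.execute("""
CREATE UNIQUE INDEX assets_uniqueness
ON assets (genesis_id, anchor_utxo_id, script_key_id)""")

remaining = [r[0] for r in conn.execute(
    "SELECT asset_id FROM assets ORDER BY asset_id")]
print(remaining)  # [1, 3]
```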

@guggero guggero requested a review from gijswijs June 27, 2024 09:16
Contributor

@gijswijs gijswijs left a comment


There are four tables with a foreign key reference to assets:

  • asset_witnesses
  • asset_proofs
  • passive_assets
  • addr_events

This PR addresses asset_witnesses but ignores the other three tables. Are we sure we can safely do so?

@guggero
Member Author

guggero commented Jun 27, 2024

> There are four tables with a foreign key reference to assets:
>
> * `asset_witnesses`
> * `asset_proofs`
> * `passive_assets`
> * `addr_events`
>
> This PR addresses asset_witnesses but ignores the other three tables. Are we sure we can safely do so?

Yeah, good point. In the example database I was given, there were no entries in the asset_proofs, passive_assets and addr_events tables. So I think due to how the other insert/upsert queries are structured, only the asset with the lowest ID would get an entry in those tables. So keeping the asset with the lowest ID when removing duplicates should work.
But I'll look into some safer queries with the help of LLMs.

@guggero
Member Author

guggero commented Jun 27, 2024

@gijswijs I added some more involved logic to properly account for what to do with the asset_witnesses and asset_proofs tables. Those should now be dealt with correctly; I also added test code for that.

That leaves the passive_assets and addr_events tables. I thought hard about the code that produced the duplicates and am 95% sure that we won't ever have entries in those two tables for the duplicate entries. And if for some reason there were, the migration would fail for those users and we could intervene manually.
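The safety argument above can also be checked mechanically. A minimal sketch, assuming a heavily simplified schema with only addr_events as the child table: count the child rows that reference duplicates slated for deletion; a nonzero count would be exactly the case where the migration fails and manual intervention is needed:

```python
import sqlite3

# Hedged sketch with invented column names: verify that no child rows
# reference the higher-ID duplicates before deleting them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (
    asset_id INTEGER PRIMARY KEY,
    genesis_id INT, anchor_utxo_id INT, script_key_id INT
);
CREATE TABLE addr_events (
    event_id INTEGER PRIMARY KEY,
    asset_id INT REFERENCES assets (asset_id)
);
INSERT INTO assets VALUES (1, 10, 20, 30), (2, 10, 20, 30);
-- As argued above, child rows only ever reference the lowest asset ID.
INSERT INTO addr_events VALUES (100, 1);
""")

# Duplicates that would be deleted: every row that is not the minimum
# asset_id within its group of identifying columns.
doomed = [r[0] for r in conn.execute("""
SELECT asset_id FROM assets WHERE asset_id NOT IN (
    SELECT MIN(asset_id) FROM assets
    GROUP BY genesis_id, anchor_utxo_id, script_key_id)""")]

# Safety check: any addr_events row referencing a doomed duplicate would
# require manual intervention instead of a blind delete.
placeholders = ",".join("?" * len(doomed))
dangling = conn.execute(
    f"SELECT COUNT(*) FROM addr_events WHERE asset_id IN ({placeholders})",
    doomed).fetchone()[0]
print(dangling)  # 0 means the duplicates are safe to delete
```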

@guggero guggero requested a review from ffranr June 27, 2024 15:58
Comment on lines +12 to +15
LEFT JOIN asset_transfer_inputs ati
ON ati.anchor_point = mu.outpoint
WHERE a.spent = false
AND ati.input_id IS NOT NULL);
Contributor


Please correct me if my understanding is wrong: if ati.input_id IS NOT NULL then the asset is spent? And those are the rows that we ensure are set to spent, is that correct?

Member Author


Correct. We LEFT JOIN here, meaning that any field on ati is NULL if there is no matching spending transaction referencing the asset's on-chain outpoint.
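The LEFT JOIN semantics described here can be demonstrated with a toy stand-in schema (table names borrowed from the query above, columns invented for illustration): an asset whose anchor outpoint appears in asset_transfer_inputs gets a non-NULL `ati.input_id` and is therefore flagged as spent.

```python
import sqlite3

# Simplified sketch: ati.input_id IS NOT NULL singles out assets whose
# on-chain outpoint was consumed by a transfer input, i.e. spent assets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE managed_utxos (utxo_id INTEGER PRIMARY KEY, outpoint TEXT);
CREATE TABLE assets (
    asset_id INTEGER PRIMARY KEY, anchor_utxo_id INT, spent BOOL
);
CREATE TABLE asset_transfer_inputs (
    input_id INTEGER PRIMARY KEY, anchor_point TEXT
);
INSERT INTO managed_utxos VALUES (1, 'txid1:0'), (2, 'txid2:0');
INSERT INTO assets VALUES (10, 1, 0), (11, 2, 0);
-- Only the first asset's outpoint was spent by a transfer.
INSERT INTO asset_transfer_inputs VALUES (100, 'txid1:0');
""")

# The LEFT JOIN leaves ati.input_id NULL for asset 11 (no matching
# spending transaction), so only asset 10 survives the filter.
spent_ids = [r[0] for r in conn.execute("""
SELECT a.asset_id
FROM assets a
JOIN managed_utxos mu ON mu.utxo_id = a.anchor_utxo_id
LEFT JOIN asset_transfer_inputs ati ON ati.anchor_point = mu.outpoint
WHERE a.spent = 0 AND ati.input_id IS NOT NULL""")]
print(spent_ids)  # [10]
```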

Contributor

@gijswijs gijswijs left a comment


Very nice!! This should fix a lot of the cases of duplication. I think we might run into one or two edge cases where manual intervention is still needed, but the majority (if not all) of it is now covered. I have some questions, a nit, and one suggestion that I think will improve the test.

@gijswijs gijswijs requested a review from jharveyb July 1, 2024 13:07
@gijswijs gijswijs self-assigned this Jul 1, 2024
Contributor

@jharveyb jharveyb left a comment


Looks good, just some lingering Qs on query specifics and explanatory comments.

UPDATE asset_proofs
SET asset_id = filtered_mapping.new_asset_id
FROM (
SELECT MIN(old_asset_id) AS old_asset_id, new_asset_id
Contributor


What is MIN() doing here? Was a bit confused by the reuse of the names old_asset_id and new_asset_id between the tmp_asset_id_mapping table and this filtered_mapping table.

IIUC this subquery is selecting the older asset ID for asset proof entries that may have the same new_asset_id in the tmp_asset_id_mapping table?

So the end result would be that, for duplicate assets, if the proof was not already using the new_asset_id (which is the smaller asset ID), asset_proofs.asset_id would be updated to use that new_asset_id.

And any proof already referencing that new_asset_id should be unchanged.

Contributor


asset_id in table asset_proofs has a UNIQUE constraint. So if we map old_asset_id a and b both to new_asset_id x, we violate that constraint. By grouping on new_asset_id and taking MIN(old_asset_id), we only map a single old_asset_id to each new_asset_id (a to x in this case). The remaining asset_proof with foreign key b will be deleted in step 8.

The filtered_mapping table does the exact same thing as tmp_asset_id_mapping but taking into account the UNIQUE constraint on asset_id in table asset_proofs. Hence I reused the column names.

> So the end result would be that, for duplicate assets, if the proof was not already using the new_asset_id (which is the smaller asset ID), asset_proofs.asset_id would be updated to use that new_asset_id.

Correct.

> And any proof already referencing that new_asset_id should be unchanged.

It will get updated, but with the same values, so the result is unchanged.
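The filtered_mapping idea discussed here can be reproduced on a toy schema (names are simplified, and a correlated subquery stands in for the migration's `UPDATE ... FROM` form): both duplicates map to the same surviving asset, but only the MIN(old_asset_id) row is re-pointed, so the UNIQUE constraint on asset_proofs.asset_id is never violated.

```python
import sqlite3

# Toy sketch of filtered_mapping: old_asset_id 2 and 3 both map to the
# surviving new_asset_id 1, but asset_proofs.asset_id is UNIQUE, so only
# one proof may be re-pointed; the leftover is deleted in a later step.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp_asset_id_mapping (old_asset_id INT, new_asset_id INT);
CREATE TABLE asset_proofs (proof_id INTEGER PRIMARY KEY, asset_id INT UNIQUE);
INSERT INTO tmp_asset_id_mapping VALUES (2, 1), (3, 1);
INSERT INTO asset_proofs VALUES (100, 2), (101, 3);
""")

# Group by new_asset_id and keep only MIN(old_asset_id): exactly one
# proof ends up referencing each surviving asset.
conn.execute("""
UPDATE asset_proofs
SET asset_id = (
    SELECT fm.new_asset_id FROM (
        SELECT MIN(old_asset_id) AS old_asset_id, new_asset_id
        FROM tmp_asset_id_mapping
        GROUP BY new_asset_id
    ) fm WHERE fm.old_asset_id = asset_proofs.asset_id
)
WHERE asset_id IN (
    SELECT MIN(old_asset_id) FROM tmp_asset_id_mapping
    GROUP BY new_asset_id
)""")

rows = list(conn.execute(
    "SELECT proof_id, asset_id FROM asset_proofs ORDER BY proof_id"))
print(rows)  # proof 100 now points at asset 1; proof 101 awaits deletion
```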

@jharveyb jharveyb self-requested a review July 2, 2024 15:45
Contributor

@jharveyb jharveyb left a comment


LGTM, great fix and should be safe 👍🏽

@Roasbeef Roasbeef enabled auto-merge July 2, 2024 18:43
@Roasbeef Roasbeef added this pull request to the merge queue Jul 2, 2024
Merged via the queue into main with commit 2d8d484 Jul 2, 2024
16 checks passed
@guggero guggero deleted the unique-key-fix branch July 8, 2024 07:48