tapdb: remove duplicate assets before adding unique index #980
Conversation
There are four tables with a foreign key reference to assets:
asset_witnesses
asset_proofs
passive_assets
addr_events
This PR addresses asset_witnesses but ignores the other three tables. Are we sure we can safely do so?
Yeah, good point. In the example database I was given there were no entries in the …

@gijswijs I added some more involved logic to properly account for what to do with the … That leaves the …
LEFT JOIN asset_transfer_inputs ati
    ON ati.anchor_point = mu.outpoint
WHERE a.spent = false
    AND ati.input_id IS NOT NULL);
Please correct me if my understanding is wrong: if ati.input_id IS NOT NULL then the asset is spent? And those are the rows that we ensure are set to spent, is that correct?
Correct. We LEFT JOIN here, meaning that any field on ati is NULL if there is no matching spending transaction referencing the asset's on-chain outpoint.
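A minimal sketch of that NULL-padding behaviour of LEFT JOIN, run against an in-memory SQLite database with a stripped-down two-table schema (the table and column names mirror the query above, but the columns kept and the rows inserted are invented for illustration; the real tapdb tables have more columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE assets (asset_id INTEGER, outpoint TEXT, spent BOOLEAN);
CREATE TABLE asset_transfer_inputs (input_id INTEGER, anchor_point TEXT);
INSERT INTO assets VALUES (1, 'op1', 0), (2, 'op2', 0);
-- Only the asset anchored at outpoint op1 has a spending transfer input.
INSERT INTO asset_transfer_inputs VALUES (10, 'op1');
""")

# With a LEFT JOIN, every column of ati is NULL for assets that have
# no matching row in asset_transfer_inputs.
rows = con.execute("""
    SELECT a.asset_id, ati.input_id
    FROM assets a
    LEFT JOIN asset_transfer_inputs ati ON ati.anchor_point = a.outpoint
    ORDER BY a.asset_id
""").fetchall()
print(rows)  # [(1, 10), (2, None)]

# Filtering on ati.input_id IS NOT NULL therefore keeps only the
# assets that actually have a spending input.
spent = con.execute("""
    SELECT a.asset_id
    FROM assets a
    LEFT JOIN asset_transfer_inputs ati ON ati.anchor_point = a.outpoint
    WHERE a.spent = 0 AND ati.input_id IS NOT NULL
""").fetchall()
print(spent)  # [(1,)]
```

An INNER JOIN would silently drop the unmatched assets instead of NULL-padding them, which is why the IS NOT NULL filter is what turns the LEFT JOIN into a "has a spending input" predicate.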
Very nice!! This should fix a lot of the cases of duplication. I think we might run into one or two edge cases where manual intervention is still needed, but the majority (if not all) of it is now covered. I have some questions, a nit and one suggestion that I think will improve the test.
Looks good, just some lingering Qs on query specifics and explanatory comments.
UPDATE asset_proofs
SET asset_id = filtered_mapping.new_asset_id
FROM (
    SELECT MIN(old_asset_id) AS old_asset_id, new_asset_id
What is MIN() doing here? I was a bit confused by the reuse of the names old_asset_id and new_asset_id between the tmp_asset_id_mapping table and this filtered_mapping table.

IIUC this subquery is selecting the older asset ID for asset proof entries that may have the same new_asset_id in the tmp_asset_id_mapping table?

So the end result would be that, for duplicate assets, if the proof was not already using the new_asset_id (which is the smaller asset ID), asset_proofs.asset_id would be updated to use that new_asset_id. And any proof already referencing that new_asset_id should be unchanged.
asset_id in table asset_proofs has a UNIQUE constraint. So if we map old_asset_id a and b both to new_asset_id x, we violate that constraint. By grouping on new_asset_id and taking MIN(old_asset_id), we only map a single old_asset_id to the new_asset_id (a to x in this case). The remaining asset_proofs row with foreign key b will be deleted in step 8.

The filtered_mapping table does the exact same thing as tmp_asset_id_mapping, but takes into account the UNIQUE constraint on asset_id in table asset_proofs. Hence I reused the column names.
So the end result would be that, for duplicate assets, if the proof was not already using the new_asset_id (which is the smaller asset ID), asset_proofs.asset_id would be updated to use that new_asset_id.
Correct.
And any proof already referencing that new_asset_id should be unchanged.
It will get updated, but with the same values, so the result is unchanged.
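The constraint argument can be made concrete with an in-memory SQLite sketch (the schema is heavily simplified and the IDs are invented: two duplicate proofs with old asset IDs 2 and 3 both mapping to surviving asset ID 1; the filtered mapping is built inline rather than via the migration's UPDATE ... FROM form):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE asset_proofs (asset_id INTEGER UNIQUE, proof BLOB);
CREATE TABLE tmp_asset_id_mapping (old_asset_id INTEGER, new_asset_id INTEGER);
-- Two proofs for duplicate assets a=2 and b=3.
INSERT INTO asset_proofs VALUES (2, x'aa'), (3, x'bb');
-- Both duplicates map to the same surviving asset x=1.
INSERT INTO tmp_asset_id_mapping VALUES (2, 1), (3, 1);
""")

# Remapping both proofs to asset 1 directly violates UNIQUE(asset_id);
# SQLite aborts and rolls back the whole statement.
try:
    con.execute("UPDATE asset_proofs SET asset_id = 1")
except sqlite3.IntegrityError:
    print("naive update violates the UNIQUE constraint")

# Grouping on new_asset_id and taking MIN(old_asset_id) keeps exactly
# one mapping per surviving asset: only (old=2, new=1) remains.
filtered = con.execute("""
    SELECT MIN(old_asset_id) AS old_asset_id, new_asset_id
    FROM tmp_asset_id_mapping
    GROUP BY new_asset_id
""").fetchall()
print(filtered)  # [(2, 1)]

# Applying only the filtered mapping succeeds; the proof with old
# asset_id 3 is left untouched, to be deleted in a later step.
for old_id, new_id in filtered:
    con.execute("UPDATE asset_proofs SET asset_id = ? WHERE asset_id = ?",
                (new_id, old_id))
final = con.execute(
    "SELECT asset_id FROM asset_proofs ORDER BY asset_id").fetchall()
print(final)  # [(1,), (3,)]
```

The failed naive UPDATE shows why the GROUP BY / MIN() filtering is needed before the remap, rather than being an optimization.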
LGTM, great fix and should be safe 👍🏽
This commit first removes duplicate assets that were mistakenly created due to self-transfers in previous versions. After removing the duplicates, the new unique index should apply without issues.