
tapdb: remove duplicate assets before adding unique index #980


Merged
merged 1 commit on Jul 2, 2024

Conversation

guggero
Member

@guggero guggero commented Jun 27, 2024

This commit first removes duplicate assets that were mistakenly created due to self transfers in previous versions.
After removing the duplicates, the new unique index should be applied without issues.
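The approach described above can be sketched against a toy SQLite schema. This is a minimal illustration only: the table and column names below are simplified stand-ins, not the real taproot-assets schema. Duplicates are removed by keeping the row with the lowest `asset_id` per identity, after which the unique index applies without conflicts:

```python
import sqlite3

# Toy schema (hypothetical column names): duplicate asset rows must be
# removed before a unique index over the identifying columns can be created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (
    asset_id INTEGER PRIMARY KEY,
    genesis_id INTEGER NOT NULL,
    anchor_utxo_id INTEGER NOT NULL,
    script_key_id INTEGER NOT NULL
);
-- Two rows describing the same asset, created by a self transfer.
INSERT INTO assets VALUES (1, 10, 20, 30);
INSERT INTO assets VALUES (2, 10, 20, 30);
INSERT INTO assets VALUES (3, 11, 21, 31);
""")

# Step 1: delete duplicates, keeping the row with the lowest asset_id
# per group of identifying columns.
conn.execute("""
DELETE FROM assets WHERE asset_id NOT IN (
    SELECT MIN(asset_id) FROM assets
    GROUP BY genesis_id, anchor_utxo_id, script_key_id
)""")

# Step 2: the unique index can now be created without conflicts.
conn.execute("""
CREATE UNIQUE INDEX assets_uniqueness
ON assets (genesis_id, anchor_utxo_id, script_key_id)""")

remaining = [r[0] for r in conn.execute(
    "SELECT asset_id FROM assets ORDER BY asset_id")]
print(remaining)  # [1, 3]
```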

@guggero guggero requested a review from gijswijs June 27, 2024 09:16
Contributor

@gijswijs gijswijs left a comment


There are four tables with a foreign key reference to assets:

  • asset_witnesses
  • asset_proofs
  • passive_assets
  • addr_events

This PR addresses asset_witnesses but ignores the other three tables. Are we sure we can safely do so?

@guggero
Member Author

guggero commented Jun 27, 2024

> There are four tables with a foreign key reference to assets:
>
> * `asset_witnesses`
> * `asset_proofs`
> * `passive_assets`
> * `addr_events`
>
> This PR addresses asset_witnesses but ignores the other three tables. Are we sure we can safely do so?

Yeah, good point. In the example database I was given, there were no entries in the asset_proofs, passive_assets and addr_events tables. So I think due to how the other insert/upsert queries are structured, only the asset with the lowest ID would get an entry in those tables. So keeping the asset with the lowest ID when removing duplicates should work.
But I'll look into some safer queries with the help of LLMs.

@guggero
Member Author

guggero commented Jun 27, 2024

@gijswijs I added some more involved logic to properly account for what to do with the asset_witnesses and asset_proofs tables. Those should now be dealt with correctly; I also added test code for that.

That leaves the passive_assets and addr_events tables. I thought hard about the code that produced the duplicates and am 95% sure that we won't ever have entries in those two tables for the duplicate entries. And if for some reason there were, the migration would fail for those users and we could intervene manually.
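The safety argument above can also be checked mechanically. A minimal sketch, assuming a heavily simplified schema with only addr_events as the child table: count the child rows that reference duplicates slated for deletion; a nonzero count would be exactly the case where the migration fails and manual intervention is needed:

```python
import sqlite3

# Hedged sketch with invented column names: verify that no child rows
# reference the higher-ID duplicates before deleting them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (
    asset_id INTEGER PRIMARY KEY,
    genesis_id INT, anchor_utxo_id INT, script_key_id INT
);
CREATE TABLE addr_events (
    event_id INTEGER PRIMARY KEY,
    asset_id INT REFERENCES assets (asset_id)
);
INSERT INTO assets VALUES (1, 10, 20, 30), (2, 10, 20, 30);
-- As argued above, child rows only ever reference the lowest asset ID.
INSERT INTO addr_events VALUES (100, 1);
""")

# Duplicates that would be deleted: every row that is not the minimum
# asset_id within its group of identifying columns.
doomed = [r[0] for r in conn.execute("""
SELECT asset_id FROM assets WHERE asset_id NOT IN (
    SELECT MIN(asset_id) FROM assets
    GROUP BY genesis_id, anchor_utxo_id, script_key_id)""")]

# Safety check: any addr_events row referencing a doomed duplicate would
# require manual intervention instead of a blind delete.
placeholders = ",".join("?" * len(doomed))
dangling = conn.execute(
    f"SELECT COUNT(*) FROM addr_events WHERE asset_id IN ({placeholders})",
    doomed).fetchone()[0]
print(dangling)  # 0 means the duplicates are safe to delete
```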

@guggero guggero requested a review from ffranr June 27, 2024 15:58
Comment on lines +12 to +15
LEFT JOIN asset_transfer_inputs ati
ON ati.anchor_point = mu.outpoint
WHERE a.spent = false
AND ati.input_id IS NOT NULL);
Contributor


Please correct me if my understanding is wrong: if ati.input_id IS NOT NULL then the asset is spent? And those are the rows that we ensure are set to spent, is that correct?

Member Author


Correct. We LEFT JOIN here, meaning that any field on ati is NULL if there is no matching spending transaction referencing the asset's on-chain outpoint.
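The LEFT JOIN semantics described here can be demonstrated with a toy stand-in schema (table names borrowed from the query above, columns invented for illustration): an asset whose anchor outpoint appears in asset_transfer_inputs gets a non-NULL `ati.input_id` and is therefore flagged as spent.

```python
import sqlite3

# Simplified sketch: ati.input_id IS NOT NULL singles out assets whose
# on-chain outpoint was consumed by a transfer input, i.e. spent assets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE managed_utxos (utxo_id INTEGER PRIMARY KEY, outpoint TEXT);
CREATE TABLE assets (
    asset_id INTEGER PRIMARY KEY, anchor_utxo_id INT, spent BOOL
);
CREATE TABLE asset_transfer_inputs (
    input_id INTEGER PRIMARY KEY, anchor_point TEXT
);
INSERT INTO managed_utxos VALUES (1, 'txid1:0'), (2, 'txid2:0');
INSERT INTO assets VALUES (10, 1, 0), (11, 2, 0);
-- Only the first asset's outpoint was spent by a transfer.
INSERT INTO asset_transfer_inputs VALUES (100, 'txid1:0');
""")

# The LEFT JOIN leaves ati.input_id NULL for asset 11 (no matching
# spending transaction), so only asset 10 survives the filter.
spent_ids = [r[0] for r in conn.execute("""
SELECT a.asset_id
FROM assets a
JOIN managed_utxos mu ON mu.utxo_id = a.anchor_utxo_id
LEFT JOIN asset_transfer_inputs ati ON ati.anchor_point = mu.outpoint
WHERE a.spent = 0 AND ati.input_id IS NOT NULL""")]
print(spent_ids)  # [10]
```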

Contributor

@gijswijs gijswijs left a comment


Very nice!! This should fix a lot of the cases of duplication. I think we might run into one or two edge cases where manual intervention is still needed, but the majority (if not all) of it is now covered. I have some questions, a nit, and one suggestion that I think will improve the test.

@gijswijs gijswijs requested a review from jharveyb July 1, 2024 13:07
@gijswijs gijswijs self-assigned this Jul 1, 2024
Contributor

@jharveyb jharveyb left a comment


Looks good, just some lingering Qs on query specifics and explanatory comments.

UPDATE asset_proofs
SET asset_id = filtered_mapping.new_asset_id
FROM (
SELECT MIN(old_asset_id) AS old_asset_id, new_asset_id
Contributor


What is MIN() doing here? Was a bit confused by the reuse of the names old_asset_id and new_asset_id between the tmp_asset_id_mapping table and this filtered_mapping table.

IIUC this subquery is selecting the older asset ID for asset proof entries that may have the same new_asset_id in the tmp_asset_id_mapping table?

So the end result would be that, for duplicate assets, if the proof was not already using the new_asset_id (which is the smaller asset ID), asset_proofs.asset_id would be updated to use that new_asset_id.

And any proof already referencing that new_asset_id should be unchanged.

Contributor


asset_id in table asset_proofs has a UNIQUE constraint. So if we map old_asset_id a and b both to new_asset_id x, we violate that constraint. By grouping on new_asset_id and taking MIN(old_asset_id), we only map a single old_asset_id to each new_asset_id (a to x in this case). The remaining asset_proof with foreign key b will be deleted in step 8.

The filtered_mapping table does the exact same thing as tmp_asset_id_mapping but taking into account the UNIQUE constraint on asset_id in table asset_proofs. Hence I reused the column names.

> So the end result would be that, for duplicate assets, if the proof was not already using the new_asset_id (which is the smaller asset ID), asset_proofs.asset_id would be updated to use that new_asset_id.

Correct.

> And any proof already referencing that new_asset_id should be unchanged.

It will get updated, but with the same values, so the result is unchanged.
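The filtered_mapping idea discussed here can be reproduced on a toy schema (names are simplified, and a correlated subquery stands in for the migration's `UPDATE ... FROM` form): both duplicates map to the same surviving asset, but only the MIN(old_asset_id) row is re-pointed, so the UNIQUE constraint on asset_proofs.asset_id is never violated.

```python
import sqlite3

# Toy sketch of filtered_mapping: old_asset_id 2 and 3 both map to the
# surviving new_asset_id 1, but asset_proofs.asset_id is UNIQUE, so only
# one proof may be re-pointed; the leftover is deleted in a later step.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp_asset_id_mapping (old_asset_id INT, new_asset_id INT);
CREATE TABLE asset_proofs (proof_id INTEGER PRIMARY KEY, asset_id INT UNIQUE);
INSERT INTO tmp_asset_id_mapping VALUES (2, 1), (3, 1);
INSERT INTO asset_proofs VALUES (100, 2), (101, 3);
""")

# Group by new_asset_id and keep only MIN(old_asset_id): exactly one
# proof ends up referencing each surviving asset.
conn.execute("""
UPDATE asset_proofs
SET asset_id = (
    SELECT fm.new_asset_id FROM (
        SELECT MIN(old_asset_id) AS old_asset_id, new_asset_id
        FROM tmp_asset_id_mapping
        GROUP BY new_asset_id
    ) fm WHERE fm.old_asset_id = asset_proofs.asset_id
)
WHERE asset_id IN (
    SELECT MIN(old_asset_id) FROM tmp_asset_id_mapping
    GROUP BY new_asset_id
)""")

rows = list(conn.execute(
    "SELECT proof_id, asset_id FROM asset_proofs ORDER BY proof_id"))
print(rows)  # proof 100 now points at asset 1; proof 101 awaits deletion
```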

@jharveyb jharveyb self-requested a review July 2, 2024 15:45
Contributor

@jharveyb jharveyb left a comment


LGTM, great fix and should be safe 👍🏽

@Roasbeef Roasbeef enabled auto-merge July 2, 2024 18:43
@Roasbeef Roasbeef added this pull request to the merge queue Jul 2, 2024
Merged via the queue into main with commit 2d8d484 Jul 2, 2024
16 checks passed
@guggero guggero deleted the unique-key-fix branch July 8, 2024 07:48