Skip to content

CA-424844: Fix incorrect vdi update in SXM#6949

Draft
changlei-li wants to merge 1 commit intoxapi-project:masterfrom
changlei-li:private/changleli/ca-424844
Draft

CA-424844: Fix incorrect vdi update in SXM#6949
changlei-li wants to merge 1 commit intoxapi-project:masterfrom
changlei-li:private/changleli/ca-424844

Conversation

@changlei-li
Copy link
Contributor

In the case of migration, update_snapshot_info_dest has handled the snapshot_info correctly. Then in vm import, if the snapshot_of is set to ref null when not found in table, the info is missing.
Original PR: #6913

In the case of migration, `update_snapshot_info_dest` has
handled the snapshot_info correctly. Then in vm import, if
the snapshot_of is set to ref null when not found in table,
the info is missing.

Signed-off-by: Changlei Li <[email protected]>
@changlei-li changlei-li requested a review from contificate March 16, 2026 08:52
@changlei-li changlei-li marked this pull request as draft March 16, 2026 08:53
@changlei-li
Copy link
Contributor Author

Hi @contificate Another fail is raised in the 26.6.0 test. This is a case to check the snapshot_of after migration. The update_vdi_links is new added. Let's discuss if we should fix the change or just revert it. Thank you.
This PR is a draft that can pass the case.
I want to know why we need #6913 as I didn't see complains before it.

This is the test log:

[VERBOSE] [2026-03-11 23:49:01 UTC] Local command: /local/scratch/tmp/4585297-xenrt6rfdv990/xe vbd-list -s 10.62.97.10 -u root -pw '************' params="vdi-uuid" type=Disk vm-uuid=9c927d88-942a-3bdd-f27e-148227aebe51  --minimal (10.62.97.10 = cl15-10)
5dc67e6a-2319-4117-bfa7-7c9318328d05,0b6b0681-fa35-45fa-8cbf-115421df8b3e,c30e05d2-5689-4966-9ff2-8ef6c45a1801,216d46c8-06c9-472e-b87d-8fafa3eee960
[VERBOSE] [2026-03-11 23:49:01 UTC] Local command: /local/scratch/tmp/4585297-xenrt6rfdv990/xe vdi-param-get uuid=5dc67e6a-2319-4117-bfa7-7c9318328d05 param-name=name-label -s 10.62.97.10 -u root -pw '************' --minimal (10.62.97.10 = cl15-10)
0
[VERBOSE] [2026-03-11 23:49:01 UTC] After SXM new snapshot VDI we got is 5dc67e6a-2319-4117-bfa7-7c9318328d05
[VERBOSE] [2026-03-11 23:49:01 UTC] Local command: /local/scratch/tmp/4585297-xenrt6rfdv990/xe vdi-param-get uuid=0b6b0681-fa35-45fa-8cbf-115421df8b3e param-name=name-label -s 10.62.97.10 -u root -pw '************' --minimal (10.62.97.10 = cl15-10)
3
[VERBOSE] [2026-03-11 23:49:01 UTC] After SXM new snapshot VDI we got is 0b6b0681-fa35-45fa-8cbf-115421df8b3e
[VERBOSE] [2026-03-11 23:49:01 UTC] Local command: /local/scratch/tmp/4585297-xenrt6rfdv990/xe vdi-param-get uuid=c30e05d2-5689-4966-9ff2-8ef6c45a1801 param-name=name-label -s 10.62.97.10 -u root -pw '************' --minimal (10.62.97.10 = cl15-10)
2
[VERBOSE] [2026-03-11 23:49:01 UTC] After SXM new snapshot VDI we got is c30e05d2-5689-4966-9ff2-8ef6c45a1801
[VERBOSE] [2026-03-11 23:49:01 UTC] Local command: /local/scratch/tmp/4585297-xenrt6rfdv990/xe vdi-param-get uuid=216d46c8-06c9-472e-b87d-8fafa3eee960 param-name=name-label -s 10.62.97.10 -u root -pw '************' --minimal (10.62.97.10 = cl15-10)
1
[VERBOSE] [2026-03-11 23:49:01 UTC] After SXM new snapshot VDI we got is 216d46c8-06c9-472e-b87d-8fafa3eee960
[VERBOSE] [2026-03-11 23:49:01 UTC] Local command: /local/scratch/tmp/4585297-xenrt6rfdv990/xe vdi-param-get uuid=5dc67e6a-2319-4117-bfa7-7c9318328d05 param-name=snapshot-of -s 10.62.97.10 -u root -pw '************' --minimal (10.62.97.10 = cl15-10)
<not in database>
[VERBOSE] [2026-03-11 23:49:01 UTC] After SXM: snapshot-of link we got for snapshot VDI 5dc67e6a-2319-4117-bfa7-7c9318328d05 link is <not in database>

@changlei-li changlei-li requested a review from psafont March 16, 2026 09:02
@contificate
Copy link
Contributor

I want to know why we need #6913 as I didn't see complains before it.

We keep seeing inconsistent snapshot_of entries in customer setups.

  1. A VDI may have snapshot_of as non-NULL, but its value is like Ref:123 (a pseudo-reference notation only used during export, i.e. Ref:123 will refer to the object with uuid=123 in ova.xml). The import/export code is all very manual and doesn't have any real correctness baked in, so there's lots of ad-hoc fixup logic.
  2. A VDI may have is_a_snapshot=true but have a NULL snapshot_of reference.

@psafont
Copy link
Member

psafont commented Mar 16, 2026

As context to explain why #6913 was done, I've analised customers' databases with VMs that invalid snapshot_of fields. These were in the form of Ref:13, which are generated from imports. The PR was an attempt to resolve these.

@contificate
Copy link
Contributor

I'm not familiar with SXM, so I don't immediately see how these features are related. What is the related test doing that created this CA ticket? SXM followed by an import, or is non-VM metadata exported as part of the migration?

The lack of guarantees and ad-hoc logic around non-VM metadata exports is quite worrying if it's used for import, pool join, etc. with different expectations before and after.

Overall, I'd like to rule out <not in database> by construction. I'd need to do a lot more investigation of everywhere this is used and in what contexts to see what cases we really need to deal with.

@psafont
Copy link
Member

psafont commented Mar 16, 2026

It would help to understand the case and expectations that failed:

Importing a VDI with is_snapshot = true, and snapshot_of references an invalid "import ID":

  • Before: Thesnapshot_of is converted to Ref.null. The VDI's is_snapshot field is changed to false.

  • After: The snapshot_of is left as-is. The is_snapshot field is left as is.

The After allows the invalid state that Colin's PR was meant to prevent.

The other cases:

  • is_snapshot = true, snapshot_of is null: is_snapshot changes to false in before and after
  • is_snapshot = true, snapshot_of is a vaaid import ID: is_snapshot remains true, snapshot_of gets assigned, in before and after
  • is_snapshot = false, snapshot_of is null: unchanged
  • is_snapshot = false, snapshot_of is invalid ID: unchanged
  • is_snapshot = false, snapshot_of is a valid ID: is_snapshot gets changed to true and snapshot_of is assigned, in before and after.

One thing I'm not clear about is how is the null value serialized in the export code to make it different from a referential, numerical id that's invalid

@changlei-li
Copy link
Contributor Author

OK. Thank you. Let's aimed to keep the change. I am working to recognize the prerequisite of the regression case. The regression test failed 100%. But I also find the snapshot_of is OK in my manual SXM test.

I think the update_snapshot_info_dest step in SXM may affect the update_vdi_links in import_metadata. Maybe it changes the snapshot info then it can't be found in state.table.

The update_snapshot_info_dest log:

2026-03-11T23:48:00.143241+00:00 cl15-10 xapi: [ info||2145 :::80|VM.migrate_send R:fdacc4838547|mux] SR.update_snapshot_info_dest dbg:VM.migrate_send R:fdacc4838547 sr:627dfe8a-8ae0-e360-f6b7-e75628eea1bc vdi:ca5431bb-689d-436c-be6d-bffb14901c6d ~src_vdi:78378d7d-7fbf-43ed-a8e0-9cfa716fc8db snapshot_pairs:[local:5dc67e6a-2319-4117-bfa7-7c9318328d05, src:97a22854-087b-4643-829c-2027d7a75842]

@changlei-li
Copy link
Contributor Author

This fail can be reproduced in cross-pool SXM for VM with snapshot. After the SXM, the snapshot_of data in vdi is missing.

I add some debug log:

2026-03-17T07:09:45.754926+00:00 genus-35-104d xapi: [ info||2105 :::80|SR.update_snapshot_info_dest D:88fedd3bc42d|mux] update_snapshot_info_dest: Setting snapshot_of for VDI d3538bf7-a11f-48e7-85b5-463baa2b53f5 from OpaqueRef:NULL to 035d2eda-649a-42f7-96b3-4a4a774700ce
...
2026-03-17T07:09:54.103120+00:00 genus-35-104d xapi: [debug||2150 HTTPS 10.63.97.58->:::80|VM metadata import R:a339401949e2|import] update_vdi_links: Processing VDI OpaqueRef:a43cdfef-4eef-a9e7-b8a9-54a5ffa8587a (uuid=d3538bf7-a11f-48e7-85b5-463baa2b53f5)
2026-03-17T07:09:54.103146+00:00 genus-35-104d xapi: [debug||2150 HTTPS 10.63.97.58->:::80|VM metadata import R:a339401949e2|import]   Current parent=OpaqueRef:NULL, snapshot_of=OpaqueRef:c0541bcf-2a0b-3079-f3e3-871511d594bc
2026-03-17T07:09:54.103177+00:00 genus-35-104d xapi: [debug||2150 HTTPS 10.63.97.58->:::80|VM metadata import R:a339401949e2|import]   Parent OpaqueRef:NULL not in state.table
2026-03-17T07:09:54.103204+00:00 genus-35-104d xapi: [debug||2150 HTTPS 10.63.97.58->:::80|VM metadata import R:a339401949e2|import]   snapshot_of OpaqueRef:c0541bcf-2a0b-3079-f3e3-871511d594bc not in state.table

@contificate
Copy link
Contributor

Can you describe a bit more about how these two features relate? Does SXM use exportation of non-VM metadata to migrate VDI objects? If so, it's worth noting that the configuration for sending VDIs can specify "include_snapshots" (or something similarly named) but it only traverses immediate snapshots, so if there's a chain of them, they're not expected to be imported elsewhere. I'm trying to work out what's semantically different from before.

@changlei-li
Copy link
Contributor Author

Hi @contificate I don't fully understand this now. But I think you can easily reproduce it by cross-pool SXM for a VM with snapshot.

@contificate
Copy link
Contributor

I'd support reverting it if it's too problematic right now. I must confess I'm not sure I have the required test suites at my fingertips to dig into this. I appreciate your work on this.

@changlei-li
Copy link
Contributor Author

OK. Let me revert them first to resolve the regression first.

@changlei-li
Copy link
Contributor Author

Another two points:

  1. Is there other case like cross-pool SXM will touch this?
  2. update_vm_links has a similar implementation. How is it handled in different cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants