
rekey all cosmos data from .id=UUID to .id=resourceID #3651

Merged
openshift-merge-bot[bot] merged 8 commits into main from cs-76-standardize-metadata
Jan 15, 2026

Conversation

@deads2k
Collaborator

@deads2k deads2k commented Dec 18, 2025

This change performs a data migration on read, switching from reads based on a cosmos search for fields matching a pattern to direct item reads by key. When a lookup by cosmosID fails but the existing search-based read succeeds, we create a new item and delete the original.
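
A minimal sketch of the migrate-on-read flow described above, using a toy in-memory store rather than the real Cosmos client. The names `store`, `item`, `newIDFor`, and `get` are hypothetical, and the ID encoding in `newIDFor` is an assumption for illustration only; the PR's actual key derivation is not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errNotFound = errors.New("not found")

// item is a simplified stand-in for a stored document.
type item struct {
	ID         string // document .id: a UUID before migration, derived from the resource ID after
	ResourceID string
}

// store is a toy in-memory container keyed by document ID.
type store struct {
	items map[string]item
}

// newIDFor derives the new document ID from a resource ID.
// Assumption: the encoding here is illustrative, not the PR's real scheme.
func newIDFor(resourceID string) string {
	return strings.ReplaceAll(strings.ToLower(resourceID), "/", "|")
}

// readByID models a direct point read by key.
func (s *store) readByID(id string) (item, error) {
	it, ok := s.items[id]
	if !ok {
		return item{}, errNotFound
	}
	return it, nil
}

// search models the legacy lookup that scans for a matching field.
func (s *store) search(resourceID string) (item, error) {
	for _, it := range s.items {
		if strings.EqualFold(it.ResourceID, resourceID) {
			return it, nil
		}
	}
	return item{}, errNotFound
}

// get tries the point read first; on a miss it falls back to the legacy
// search, writes the item under its new ID, and deletes the original.
func (s *store) get(resourceID string) (item, error) {
	newID := newIDFor(resourceID)
	it, err := s.readByID(newID)
	if err == nil {
		return it, nil
	}
	if !errors.Is(err, errNotFound) {
		return item{}, err
	}
	old, err := s.search(resourceID)
	if err != nil {
		return item{}, err
	}
	migrated := old
	migrated.ID = newID
	s.items[newID] = migrated // create the rekeyed item
	delete(s.items, old.ID)   // then remove the original
	return migrated, nil
}

func main() {
	s := &store{items: map[string]item{
		"uuid-1": {ID: "uuid-1", ResourceID: "/subscriptions/sub1"},
	}}
	got, err := s.get("/subscriptions/sub1")
	fmt.Println(got.ID, err, len(s.items))
}
```

In this sketch the create and delete are two separate steps, which is where the partial-failure case discussed later in the description comes from.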

Since old frontends use the cosmosUID as fully opaque, the new ID doesn't cause problems for old frontends even on rollback.

When the frontend starts, we find all subscriptions, then search for all resources in each subscription using the untyped client listRecursive, then read each item to trigger the migration.
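
The startup sweep could be sketched roughly as below. The `backend` interface and `fakeDB` type are hypothetical stand-ins for the real clients; the only assumption carried over from the description is that a read is what triggers the migration. This version returns an error instead of panicking, which is a choice made for the sketch.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for the clients used at startup.
type backend interface {
	ListSubscriptions() []string
	ListRecursive(subscriptionID string) []string
	Get(resourceID string) error // a read is what triggers the rekey
}

// migrateAll reads every subscription and every resource beneath it once.
// It returns the number of reads performed.
func migrateAll(b backend) (int, error) {
	reads := 0
	for _, sub := range b.ListSubscriptions() {
		if err := b.Get("/subscriptions/" + sub); err != nil {
			return reads, fmt.Errorf("reading subscription %q: %w", sub, err)
		}
		reads++
		for _, resourceID := range b.ListRecursive(sub) {
			if err := b.Get(resourceID); err != nil {
				return reads, fmt.Errorf("reading %q: %w", resourceID, err)
			}
			reads++
		}
	}
	return reads, nil
}

// fakeDB records which resource IDs were read.
type fakeDB struct {
	subscriptions []string
	resources     map[string][]string
	seen          []string
}

func (db *fakeDB) ListSubscriptions() []string       { return db.subscriptions }
func (db *fakeDB) ListRecursive(sub string) []string { return db.resources[sub] }
func (db *fakeDB) Get(resourceID string) error {
	db.seen = append(db.seen, resourceID)
	return nil
}

func main() {
	db := &fakeDB{
		subscriptions: []string{"sub1"},
		resources: map[string][]string{
			"sub1": {"/subscriptions/sub1/resourceGroups/rg/providers/example/clusters/c1"},
		},
	}
	n, err := migrateAll(db)
	fmt.Println(n, err)
}
```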

If the create succeeds and the delete fails, subsequent GETs will produce an ambiguous result error (a pre-existing, handled error, so old frontends are ok) and we'll have to resolve the conflict manually.
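
To illustrate the partial-failure case, here is a sketch of how a search-based read might surface the ambiguity when both the original and the rekeyed item exist. `findOne`, `doc`, and `errAmbiguous` are hypothetical names; the real error type and query are not shown in this PR conversation.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNotFound  = errors.New("not found")
	errAmbiguous = errors.New("ambiguous result")
)

// doc is a simplified stored document.
type doc struct {
	ID         string
	ResourceID string
}

// findOne models a search-based read that expects exactly one match.
func findOne(docs []doc, resourceID string) (doc, error) {
	var matches []doc
	for _, d := range docs {
		if d.ResourceID == resourceID {
			matches = append(matches, d)
		}
	}
	switch len(matches) {
	case 0:
		return doc{}, errNotFound
	case 1:
		return matches[0], nil
	default:
		return doc{}, fmt.Errorf("%w: %d items match %q", errAmbiguous, len(matches), resourceID)
	}
}

// isAmbiguous reports whether err came from a multi-match search.
func isAmbiguous(err error) bool {
	return errors.Is(err, errAmbiguous)
}

func main() {
	// State after "create succeeded, delete failed": both items exist.
	docs := []doc{
		{ID: "uuid-1", ResourceID: "/subscriptions/sub1"},
		{ID: "|subscriptions|sub1", ResourceID: "/subscriptions/sub1"},
	}
	_, err := findOne(docs, "/subscriptions/sub1")
	fmt.Println(err, isAmbiguous(err))
}
```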

Still needs more test coverage for types, but since we merged #3662 we can see this works properly for clusters and operations. Need to add coverage (in a separate PR first to prove it works) for nodepools and externalauths.

@deads2k
Collaborator Author

deads2k commented Jan 7, 2026

/retest

@deads2k deads2k force-pushed the cs-76-standardize-metadata branch 4 times, most recently from c04ac35 to 2ac4245 Compare January 9, 2026 17:03
@deads2k
Collaborator Author

deads2k commented Jan 9, 2026

/retest

@deads2k deads2k force-pushed the cs-76-standardize-metadata branch 4 times, most recently from 7b3a225 to 9e0d385 Compare January 13, 2026 00:33
@deads2k
Collaborator Author

deads2k commented Jan 13, 2026

ha! it worked!

@deads2k deads2k force-pushed the cs-76-standardize-metadata branch from 9e0d385 to 906d49f Compare January 13, 2026 18:37
@deads2k deads2k changed the title [wip] rekey all cosmos data from .id=UUID to .id=resourceID rekey all cosmos data from .id=UUID to .id=resourceID Jan 13, 2026
@deads2k deads2k force-pushed the cs-76-standardize-metadata branch from 906d49f to 7173fff Compare January 13, 2026 21:21
@deads2k
Collaborator Author

deads2k commented Jan 13, 2026

/retest

@deads2k
Collaborator Author

deads2k commented Jan 13, 2026

setup

/retest

}
_, err = transaction.Execute(ctx, nil)
if err != nil {
logger.Error("failed executing transaction", "transaction", transaction)
Member

is it useful to emit the error here in the log as well, or we'll have it tracked as part of utils.TrackError()

Collaborator Author

> is it useful to emit the error here in the log as well, or we'll have it tracked as part of utils.TrackError()

Doing it here allows us to include the entire transaction in the structured output.
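
For illustration, a sketch of what logging both the transaction and the error as structured fields could look like. It assumes a `log/slog`-style logger (requires Go 1.21+); the PR's actual logger, the `transaction` type's fields, and the helper names here are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log/slog"
)

// transaction is a hypothetical stand-in for the real transaction type.
type transaction struct {
	PartitionKey string   `json:"partitionKey"`
	Steps        []string `json:"steps"`
}

var errExample = errors.New("404 Not Found")

// logFailure emits one structured record carrying both the transaction
// and the error as separate attributes.
func logFailure(w io.Writer, tx transaction, err error) {
	logger := slog.New(slog.NewJSONHandler(w, nil))
	logger.Error("failed executing transaction", "transaction", tx, "error", err)
}

// renderFailure logs into a buffer and decodes the JSON record so the
// individual fields can be inspected.
func renderFailure(tx transaction, err error) map[string]any {
	var buf bytes.Buffer
	logFailure(&buf, tx, err)
	fields := map[string]any{}
	if jsonErr := json.Unmarshal(buf.Bytes(), &fields); jsonErr != nil {
		panic(jsonErr)
	}
	return fields
}

func main() {
	tx := transaction{PartitionKey: "sub1", Steps: []string{"create", "delete"}}
	fields := renderFailure(tx, errExample)
	fmt.Println(fields["msg"], fields["error"], fields["transaction"])
}
```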

Member

@bennerv bennerv left a comment

/hold
/lgtm

Took a few passes and it makes sense to me. Had a couple open questions, but feel free to remove the hold so it merges.

Comment on lines +175 to +179
for _, subscription := range subscriptionIterator.Items(ctx) {
	if _, err := cosmosClient.Subscriptions().Get(ctx, subscription.ResourceID.Name); err != nil {
		panic(err)
	}
}
Member

Why do we use subscription.ResourceID.Name over the subscription.ResourceID.SubscriptionID for keying into cosmosdb? Fundamentally there's no difference, but wondering if there's a specific reason.

Collaborator Author

inconsistency when I wrote it.
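
A small sketch of why the two fields agree for a subscription-level resource ID. This uses a hand-rolled, simplified parser; `parsedID` and `parseResourceID` are hypothetical and are not the resource ID type used in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parsedID holds the two fields discussed in this thread.
type parsedID struct {
	SubscriptionID string
	Name           string
}

// parseResourceID is a simplified parser: the subscription ID is the
// segment after "subscriptions" and the name is the final segment.
func parseResourceID(id string) (parsedID, error) {
	segments := strings.Split(strings.Trim(id, "/"), "/")
	if len(segments) < 2 || !strings.EqualFold(segments[0], "subscriptions") {
		return parsedID{}, fmt.Errorf("not a subscription-scoped resource ID: %q", id)
	}
	return parsedID{
		SubscriptionID: segments[1],
		Name:           segments[len(segments)-1],
	}, nil
}

func main() {
	sub, _ := parseResourceID("/subscriptions/sub1")
	fmt.Println(sub.Name == sub.SubscriptionID)

	nested, _ := parseResourceID("/subscriptions/sub1/resourceGroups/rg1")
	fmt.Println(nested.Name == nested.SubscriptionID)
}
```

For a bare subscription ID the last segment is the subscription ID itself, so both fields hold the same value; they only diverge for nested resources.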

LastUpdated int `json:"-"`

// CosmosUID is used to keep track of whether we have transitioned to a new cosmosUID scheme for this item
CosmosUID string `json:"-"`
Member

why do we not marshal this, but do so for operation types and other types?

Collaborator Author

> why do we not marshal this, but do so for operation types and other types?

I think you're right. Sounds like, per our Slack conversation, I can take this as an IOU to clear the stack and deliver it afterwards.

}

// some pieces of data conflict with standard fields. We may evolve over time, but for now avoid persisting those.
cosmosObj.InternalState.CosmosUID = ""
Member

is this unnecessary, since the subscription CosmosUID field already uses the json tag json:"-"?
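
For reference, a sketch of the standard library behavior behind this question: `encoding/json` omits any field tagged `json:"-"`, whether or not it has been cleared. The `internalState` type here is a hypothetical stand-in, and this assumes the persisted document is produced by `encoding/json` from a type carrying that tag, which may not match the real persistence path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// internalState is a hypothetical stand-in for the type under review.
type internalState struct {
	Name      string `json:"name"`
	CosmosUID string `json:"-"` // never marshaled by encoding/json
}

// marshal returns the JSON encoding of s as a string.
func marshal(s internalState) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	set := internalState{Name: "sub1", CosmosUID: "some-uid"}
	cleared := internalState{Name: "sub1"}
	fmt.Println(marshal(set) == marshal(cleared))
}
```

Under that assumption, clearing the field before marshaling changes nothing in the serialized output; it would only matter if the value were copied or serialized through a type without the tag.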

@openshift-ci

openshift-ci bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bennerv, deads2k


@deads2k
Collaborator Author

deads2k commented Jan 15, 2026

I agree to the IOU

/hold cancel

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD dc010fa and 2 for PR HEAD 4e54fcd in total

@deads2k
Collaborator Author

deads2k commented Jan 15, 2026

/retest


The iterator should remain cosmosID (unless we decide to redefine it as
resourceID), but for now we are keeping that meaning the same.

This will allow errors that are easier to debug, like "(wrapped at
cluster.go:674) transaction step 1 of 3 failed with 404 Not Found".
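
A sketch of how an error of that shape could be produced. `stepError`, `wrapHere`, and `statusOf` are hypothetical helpers written for illustration; the PR's actual wrapping utility is not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"path/filepath"
	"runtime"
)

// stepError records which step of a multi-step transaction failed.
type stepError struct {
	Step       int
	Total      int
	StatusCode int
}

func (e *stepError) Error() string {
	return fmt.Sprintf("transaction step %d of %d failed with %d %s",
		e.Step, e.Total, e.StatusCode, http.StatusText(e.StatusCode))
}

// wrapHere prefixes err with the file and line of its caller while
// keeping the original error reachable through errors.As.
func wrapHere(err error) error {
	_, file, line, ok := runtime.Caller(1)
	if !ok {
		return err
	}
	return fmt.Errorf("(wrapped at %s:%d) %w", filepath.Base(file), line, err)
}

// statusOf extracts the status code from a wrapped stepError, or 0.
func statusOf(err error) int {
	var se *stepError
	if errors.As(err, &se) {
		return se.StatusCode
	}
	return 0
}

func main() {
	err := wrapHere(&stepError{Step: 1, Total: 3, StatusCode: http.StatusNotFound})
	fmt.Println(err)
	fmt.Println(statusOf(err))
}
```

Because the wrap uses `%w`, callers can still branch on the underlying status code even though the message carries the extra location prefix.
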
@deads2k deads2k force-pushed the cs-76-standardize-metadata branch from 4e54fcd to 9234f40 Compare January 15, 2026 16:26
@openshift-ci openshift-ci bot removed the lgtm label Jan 15, 2026
@openshift-ci

openshift-ci bot commented Jan 15, 2026

New changes are detected. LGTM label has been removed.

@deads2k deads2k added the lgtm label Jan 15, 2026
@deads2k
Collaborator Author

deads2k commented Jan 15, 2026

simple rebase/unit test tweak to #3218

@deads2k
Collaborator Author

deads2k commented Jan 15, 2026

2025 seed

 --- FAIL: TestRoundTripInternalExternalInternal (0.05s)
    conversion_fuzz_test.go:36: seed: 5851082132638448021
    conversion_fuzz_test.go:113: Original: { 

/retest

@openshift-merge-bot openshift-merge-bot bot merged commit 5bd1c5b into main Jan 15, 2026
24 checks passed
@openshift-merge-bot openshift-merge-bot bot deleted the cs-76-standardize-metadata branch January 15, 2026 21:57

4 participants