Skip to content

storage: fix AES-256-CTR-V2 JWK algorithm string typo#164311

Open
xxmplus wants to merge 1 commit intocockroachdb:masterfrom
xxmplus:fix-enc-key-160004
Open

storage: fix AES-256-CTR-V2 JWK algorithm string typo#164311
xxmplus wants to merge 1 commit intocockroachdb:masterfrom
xxmplus:fix-enc-key-160004

Conversation

@xxmplus
Copy link
Contributor

@xxmplus xxmplus commented Feb 24, 2026

A typo error in EncryptionType_AES_256_CTR_V2.JWKAlgorithm() caused it to return "cockroach-aes-192-ctr-v2" instead of "cockroach-aes-256-ctr-v2". This meant that V2 256-bit store keys were written to JWK files with the wrong algorithm string, and when loaded back via EncryptionTypeFromJWKAlgorithm(), they were interpreted as AES-192. As a result, data keys generated under these store keys were 24 bytes (192-bit) instead of the expected 32 bytes (256-bit).

The fix is inspired by BastienClement's report in #160004 and has three parts:

  1. Fix the typo in JWKAlgorithm() so newly written keys get the correct algorithm string.

  2. Add a backward-compatibility cross-check in LoadKeyFromFile: after parsing a JWK file, if the algorithm string maps to AES_192_CTR_V2 but the actual key material is 32 bytes, correct the EncryptionType to AES_256_CTR_V2. This ensures existing JWK files written with the buggy algorithm string continue to work after upgrade without manual intervention.

  3. Fix SetActiveStoreKeyInfo to detect EncryptionType changes for the same key ID. Previously the early-return check only compared KeyId, so when the load-time correction changed the type from 192 to 256, the function would short-circuit and never rotate the data key to the correct size. The fix adds EncryptionType to the comparison and relaxes the inactive-key reuse guard to allow the current active key to be re-registered with a corrected type.

Tests added:

  • TestJWKAlgorithmRoundtrip: verifies JWKAlgorithm() and EncryptionTypeFromJWKAlgorithm() are inverses for all encryption types, directly catching the original typo.

  • TestGenEncryptionKey: extended to verify that the EncryptionType loaded back from a generated key file matches the requested size, exercising the full write-then-read path.

  • TestDataKeyManager: new data-driven test section simulating the upgrade path where a store key is re-registered with a corrected EncryptionType. Confirms that the fix is backward compatible and seamless: the type correction triggers a data key rotation with the correct type, old data keys remain in the registry and accessible via GetKey (so existing encrypted data can still be decrypted), and no errors occur during the transition.

Co-authored-by: @BastienClement BastienClement@users.noreply.github.com
Co-authored-by: Claude Code noreply@anthropic.com

Fixes #160004

Release note: None
Epic: None

@xxmplus xxmplus requested review from a team as code owners February 24, 2026 23:05
@xxmplus xxmplus requested a review from sumeerbhola February 24, 2026 23:05
@trunk-io
Copy link
Contributor

trunk-io bot commented Feb 24, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xxmplus xxmplus force-pushed the fix-enc-key-160004 branch 2 times, most recently from d6d6e85 to c123ade Compare February 24, 2026 23:25
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumeerbhola reviewed 3 files and made 4 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on xxmplus).


pkg/storage/fs/key_manager.go line 394 at r1 (raw file):

	prevActiveStoreKey, found :=
		m.writeMu.mu.keyRegistry.StoreKeys[m.writeMu.mu.keyRegistry.ActiveStoreKeyId]
	if found && prevActiveStoreKey.KeyId == storeKeyInfo.KeyId &&

I like that we are fixing this immediately, instead of waiting for the active store key to be changed. It does mean some more testing (see other comments).


pkg/storage/fs/key_manager.go line 416 at r1 (raw file):

	keyRegistry := makeRegistryProto()
	proto.Merge(keyRegistry, m.writeMu.mu.keyRegistry)
	keyRegistry.StoreKeys[storeKeyInfo.KeyId] = storeKeyInfo

we already have an entry for this KeyId, and we are replacing it with a new value. Just observing that this is a new behavior.
Also, the data keys have a ParentKeyID, so what we are doing is changing some information in the parent, for old data keys. I don't think this is a risk, since the ParentKeyID is not being used for observability/debugging. But worth double checking.


pkg/storage/fs/testdata/data_key_manager line 267 at r1 (raw file):

dir_migration
5
active-store-key migkey

This is a good start, but what makes me nervous is that it is not an end-to-end test. We also need a test that does this upgrade with a full encryptedFS which starts with an existing data key that is incorrectly 192 bit while having a parent key that was 256 bit but with the wrong algorithm string. Then write a few files with that. Then close and reopen the encryptedFS with this new code and ensure that despite the corrected value (we are replacing the value in the map in DataKeysRegistry.StoreKeys under the same keyid), the old files can be read fine and new ones can be created. The pre-new code state could be produced on master and dumped to testdata. This end-to-end test doesn't need to be a datadriven test.


pkg/storage/fs/key_manager_test.go line 316 at r1 (raw file):

				d.ScanArgs(t, "id", &id)
				// Allow specifying the exact EncryptionType by name (e.g.
				// type=AES_256_CTR_V2), or fall back to the version shorthand.

we should change the existing test files and eliminate the version shorthand.

Copy link
Contributor Author

@xxmplus xxmplus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xxmplus made 2 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola).


pkg/storage/fs/key_manager.go line 416 at r1 (raw file):

Previously, sumeerbhola wrote…

we already have an entry for this KeyId, and we are replacing it with a new value. Just observing that this is a new behavior.
Also, the data keys have a ParentKeyID, so what we are doing is changing some information in the parent, for old data keys. I don't think this is a risk, since the ParentKeyID is not being used for observability/debugging. But worth double checking.

Ack. Data keys store the parent store key ID as a string, and it is only used in debug_ear.go for grouping. Replacing the store key entry in DataKeysRegistry.StoreKeys under the same ID does not affect any data key's ParentKeyId field. The corrected EncryptionType is the right state for observability.


pkg/storage/fs/testdata/data_key_manager line 267 at r1 (raw file):

Previously, sumeerbhola wrote…

This is a good start, but what makes me nervous is that it is not an end-to-end test. We also need a test that does this upgrade with a full encryptedFS which starts with an existing data key that is incorrectly 192 bit while having a parent key that was 256 bit but with the wrong algorithm string. Then write a few files with that. Then close and reopen the encryptedFS with this new code and ensure that despite the corrected value (we are replacing the value in the map in DataKeysRegistry.StoreKeys under the same keyid), the old files can be read fine and new ones can be created. The pre-new code state could be produced on master and dumped to testdata. This end-to-end test doesn't need to be a datadriven test.

Ack. will add a test.

Copy link
Contributor Author

@xxmplus xxmplus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xxmplus made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola).


pkg/storage/fs/key_manager_test.go line 316 at r1 (raw file):

Previously, sumeerbhola wrote…

we should change the existing test files and eliminate the version shorthand.

Ack. will do.

@xxmplus xxmplus force-pushed the fix-enc-key-160004 branch from c123ade to bb4e907 Compare March 4, 2026 01:45
A typo error in EncryptionType_AES_256_CTR_V2.JWKAlgorithm() caused
it to return "cockroach-aes-192-ctr-v2" instead of
"cockroach-aes-256-ctr-v2". This meant that V2 256-bit store keys
were written to JWK files with the wrong algorithm string, and when
loaded back via EncryptionTypeFromJWKAlgorithm(), they were
interpreted as AES-192. As a result, data keys generated under these
store keys were 24 bytes (192-bit) instead of the expected 32 bytes
(256-bit).

The fix is inspired by BastienClement's report in cockroachdb#160004 and has
three parts:

1. Fix the typo in JWKAlgorithm() so newly written keys get the
   correct algorithm string.

2. Add a backward-compatibility cross-check in LoadKeyFromFile:
   after parsing a JWK file, if the algorithm string maps to
   AES_192_CTR_V2 but the actual key material is 32 bytes, correct
   the EncryptionType to AES_256_CTR_V2. This ensures existing JWK
   files written with the buggy algorithm string continue to work
   after upgrade without manual intervention.

3. Fix SetActiveStoreKeyInfo to detect EncryptionType changes for
   the same key ID. Previously the early-return check only compared
   KeyId, so when the load-time correction changed the type from
   192 to 256, the function would short-circuit and never rotate
   the data key to the correct size. The fix adds EncryptionType to
   the comparison and relaxes the inactive-key reuse guard to allow
   the current active key to be re-registered with a corrected type.

Co-authored-by: BastienClement <BastienClement@users.noreply.github.com>
Co-authored-by: Claude Code <noreply@anthropic.com>

Fixes cockroachdb#160004

Release note (bug fix): Fixed a bug where store encryption keys
generated with `cockroach gen encryption-key --version=2 --size=256`
were written with an incorrect JWK algorithm string, causing them to
be misclassified as AES-192-CTR-V2 on load. This resulted in data
keys being generated at 24 bytes (AES-192) instead of the intended
32 bytes (AES-256). Existing key files with the wrong algorithm
string are now automatically corrected on load, and data keys are
rotated to the correct size on the next node startup.

Epic: None
@xxmplus xxmplus force-pushed the fix-enc-key-160004 branch from bb4e907 to 0f90696 Compare March 4, 2026 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cli: gen encryption-key --size 256 --version 2 produces incorrect "alg" string

3 participants