storage: fix AES-256-CTR-V2 JWK algorithm string typo#164311
storage: fix AES-256-CTR-V2 JWK algorithm string typo#164311xxmplus wants to merge 1 commit intocockroachdb:masterfrom
Conversation
|
Merging to
|
d6d6e85 to
c123ade
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
@sumeerbhola reviewed 3 files and made 4 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on xxmplus).
pkg/storage/fs/key_manager.go line 394 at r1 (raw file):
prevActiveStoreKey, found := m.writeMu.mu.keyRegistry.StoreKeys[m.writeMu.mu.keyRegistry.ActiveStoreKeyId] if found && prevActiveStoreKey.KeyId == storeKeyInfo.KeyId &&
I like that we are fixing this immediately, instead of waiting for the active store key to be changed. It does mean some more testing (see other comments).
pkg/storage/fs/key_manager.go line 416 at r1 (raw file):
keyRegistry := makeRegistryProto() proto.Merge(keyRegistry, m.writeMu.mu.keyRegistry) keyRegistry.StoreKeys[storeKeyInfo.KeyId] = storeKeyInfo
we already have an entry for this KeyId, and we are replacing it with a new value. Just observing that this is a new behavior.
Also, the data keys have a ParentKeyID, so what we are doing is changing some information in the parent, for old data keys. I don't think this is a risk, since the ParentKeyID is not being used for observability/debugging. But worth double checking.
pkg/storage/fs/testdata/data_key_manager line 267 at r1 (raw file):
dir_migration 5 active-store-key migkey
This is a good start, but what makes me nervous is that it is not an end-to-end test. We also need a test that does this upgrade with a full encryptedFS which starts with an existing data key that is incorrectly 192 bit while having a parent key that was 256 bit but with the wrong algorithm string. Then write a few files with that. Then close and reopen the encryptedFS with this new code and ensure that despite the corrected value (we are replacing the value in the map in DataKeysRegistry.StoreKeys under the same keyid), the old files can be read fine and new ones can be created. The pre-new code state could be produced on master and dumped to testdata. This end-to-end test doesn't need to be a datadriven test.
pkg/storage/fs/key_manager_test.go line 316 at r1 (raw file):
d.ScanArgs(t, "id", &id) // Allow specifying the exact EncryptionType by name (e.g. // type=AES_256_CTR_V2), or fall back to the version shorthand.
we should change the existing test files and eliminate the version shorthand.
xxmplus
left a comment
There was a problem hiding this comment.
@xxmplus made 2 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola).
pkg/storage/fs/key_manager.go line 416 at r1 (raw file):
Previously, sumeerbhola wrote…
we already have an entry for this KeyId, and we are replacing it with a new value. Just observing that this is a new behavior.
Also, the data keys have a ParentKeyID, so what we are doing is changing some information in the parent, for old data keys. I don't think this is a risk, since the ParentKeyID is not being used for observability/debugging. But worth double checking.
Ack. Data keys store the parent store key ID as a string, and it is only used in debug_ear.go for grouping. Replacing the store key entry in DataKeysRegistry.StoreKeys under the same ID does not affect any data key's ParentKeyId field. The corrected EncryptionType is the right state for observability.
pkg/storage/fs/testdata/data_key_manager line 267 at r1 (raw file):
Previously, sumeerbhola wrote…
This is a good start, but what makes me nervous is that it is not an end-to-end test. We also need a test that does this upgrade with a full encryptedFS which starts with an existing data key that is incorrectly 192 bit while having a parent key that was 256 bit but with the wrong algorithm string. Then write a few files with that. Then close and reopen the encryptedFS with this new code and ensure that despite the corrected value (we are replacing the value in the map in DataKeysRegistry.StoreKeys under the same keyid), the old files can be read fine and new ones can be created. The pre-new code state could be produced on master and dumped to testdata. This end-to-end test doesn't need to be a datadriven test.
Ack. will add a test.
xxmplus
left a comment
There was a problem hiding this comment.
@xxmplus made 1 comment.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on sumeerbhola).
pkg/storage/fs/key_manager_test.go line 316 at r1 (raw file):
Previously, sumeerbhola wrote…
we should change the existing test files and eliminate the version shorthand.
Ack. will do.
c123ade to
bb4e907
Compare
A typo error in EncryptionType_AES_256_CTR_V2.JWKAlgorithm() caused it to return "cockroach-aes-192-ctr-v2" instead of "cockroach-aes-256-ctr-v2". This meant that V2 256-bit store keys were written to JWK files with the wrong algorithm string, and when loaded back via EncryptionTypeFromJWKAlgorithm(), they were interpreted as AES-192. As a result, data keys generated under these store keys were 24 bytes (192-bit) instead of the expected 32 bytes (256-bit). The fix is inspired by BastienClement's report in cockroachdb#160004 and has three parts: 1. Fix the typo in JWKAlgorithm() so newly written keys get the correct algorithm string. 2. Add a backward-compatibility cross-check in LoadKeyFromFile: after parsing a JWK file, if the algorithm string maps to AES_192_CTR_V2 but the actual key material is 32 bytes, correct the EncryptionType to AES_256_CTR_V2. This ensures existing JWK files written with the buggy algorithm string continue to work after upgrade without manual intervention. 3. Fix SetActiveStoreKeyInfo to detect EncryptionType changes for the same key ID. Previously the early-return check only compared KeyId, so when the load-time correction changed the type from 192 to 256, the function would short-circuit and never rotate the data key to the correct size. The fix adds EncryptionType to the comparison and relaxes the inactive-key reuse guard to allow the current active key to be re-registered with a corrected type. Co-authored-by: BastienClement <BastienClement@users.noreply.github.com> Co-authored-by: Claude Code <noreply@anthropic.com> Fixes cockroachdb#160004 Release note (bug fix): Fixed a bug where store encryption keys generated with `cockroach gen encryption-key --version=2 --size=256` were written with an incorrect JWK algorithm string, causing them to be misclassified as AES-192-CTR-V2 on load. This resulted in data keys being generated at 24 bytes (AES-192) instead of the intended 32 bytes (AES-256). Existing key files with the wrong algorithm string are now automatically corrected on load, and data keys are rotated to the correct size on the next node startup. Epic: None
bb4e907 to
0f90696
Compare
A typo error in EncryptionType_AES_256_CTR_V2.JWKAlgorithm() caused it to return "cockroach-aes-192-ctr-v2" instead of "cockroach-aes-256-ctr-v2". This meant that V2 256-bit store keys were written to JWK files with the wrong algorithm string, and when loaded back via EncryptionTypeFromJWKAlgorithm(), they were interpreted as AES-192. As a result, data keys generated under these store keys were 24 bytes (192-bit) instead of the expected 32 bytes (256-bit).
The fix is inspired by BastienClement's report in #160004 and has three parts:
Fix the typo in JWKAlgorithm() so newly written keys get the correct algorithm string.
Add a backward-compatibility cross-check in LoadKeyFromFile: after parsing a JWK file, if the algorithm string maps to AES_192_CTR_V2 but the actual key material is 32 bytes, correct the EncryptionType to AES_256_CTR_V2. This ensures existing JWK files written with the buggy algorithm string continue to work after upgrade without manual intervention.
Fix SetActiveStoreKeyInfo to detect EncryptionType changes for the same key ID. Previously the early-return check only compared KeyId, so when the load-time correction changed the type from 192 to 256, the function would short-circuit and never rotate the data key to the correct size. The fix adds EncryptionType to the comparison and relaxes the inactive-key reuse guard to allow the current active key to be re-registered with a corrected type.
Tests added:
TestJWKAlgorithmRoundtrip: verifies JWKAlgorithm() and EncryptionTypeFromJWKAlgorithm() are inverses for all encryption types, directly catching the original typo.
TestGenEncryptionKey: extended to verify that the EncryptionType loaded back from a generated key file matches the requested size, exercising the full write-then-read path.
TestDataKeyManager: new data-driven test section simulating the upgrade path where a store key is re-registered with a corrected EncryptionType. Confirms that the fix is backward compatible and seamless: the type correction triggers a data key rotation with the correct type, old data keys remain in the registry and accessible via GetKey (so existing encrypted data can still be decrypted), and no errors occur during the transition.
Co-authored-by: @BastienClement BastienClement@users.noreply.github.com
Co-authored-by: Claude Code noreply@anthropic.com
Fixes #160004
Release note: None
Epic: None