Fix 2 bugs in non-raw send with encryption #17340
How can I acknowledge @HankB and @pcd1193182 for the significant effort they put into this?
Makes sense to me. Though after searching around this area for a couple of hours, I still have no idea why it helps with the corruptions. I think it is only a trigger, but for what?
The fixes here make sense. But I agree, it's not at all clear to me how this would lead to a corruption. Was the final reproducer for this something which could be reasonably adapted and included in the test suite?
Bisecting identified the redacted send/receive as the source of the bug for issue openzfs#12014. Specifically, the call to dsl_dataset_hold_obj(&fromds) had been replaced by dsl_dataset_hold_obj_flags(), which would pass a DECRYPT flag and create a key mapping. The problem arises when trying to access the MASTER_NODE_OBJ: that key mapping is used and the code panics. Fix this by restoring the call to dsl_dataset_hold_obj().

Later on in dmu_send_obj(), replace dsl_dataset_rele(&to_ds) with dsl_dataset_rele_flags(); otherwise the key mapping created earlier (on to_ds) is leaked, which results in a panic when exporting the sending pool or unloading the zfs module after a non-raw send from an encrypted filesystem.

Contributions-by: Hank Barta <[email protected]>
Contributions-by: Paul Dagnelie <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
I have a WIP CodeQL check for instances of this problem in my git repository (https://github.com/ryao/zfs/tree/issue-12014). I am waiting to see if it works. If it does, we should think about adding checks for other functions that could be accidentally mismatched.
ryao/zfs@7c921e3 works and detected the two issues fixed in this patch. The current iteration is limited to cases where the mismatched hold and release functions are called from the same function. I might try to handle more cases before I open a PR with the check.
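As a rough illustration of what a same-function mismatch check looks for, here is a minimal C sketch. It is not the CodeQL query from the commit above: it only scans the text of one function body for the call names mentioned in this PR and reports when flagged holds and flagged releases are not used together. The helper names `count_calls` and `flag_usage_mismatched` are hypothetical, and the heuristic is far cruder than a real data-flow check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of `needle` in `body`.
 * Including the opening parenthesis in `needle` keeps
 * "dsl_dataset_rele(" from matching "dsl_dataset_rele_flags(".
 */
size_t
count_calls(const char *body, const char *needle)
{
	size_t n = 0;
	size_t len = strlen(needle);

	if (len == 0)
		return (0);
	for (const char *p = strstr(body, needle); p != NULL;
	    p = strstr(p + len, needle))
		n++;
	return (n);
}

/*
 * Toy heuristic: within a single function body, a flagged hold
 * should be accompanied by a flagged release, and vice versa.
 * Returns true when only one of the two appears.
 */
bool
flag_usage_mismatched(const char *body)
{
	bool flagged_hold =
	    count_calls(body, "dsl_dataset_hold_obj_flags(") > 0;
	bool flagged_rele =
	    count_calls(body, "dsl_dataset_rele_flags(") > 0;

	return (flagged_hold != flagged_rele);
}
```

Passing the text of a function that calls `dsl_dataset_hold_obj_flags(` but releases only with `dsl_dataset_rele(` makes `flag_usage_mismatched` return true; a body that pairs the flagged hold with `dsl_dataset_rele_flags(` returns false. A function that mixes flagged and unflagged holds on different datasets would need per-variable tracking, which this sketch does not attempt.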
I just did a manual search to see if this is even necessary. It turns out that it is.
@ryao thank you for taking this to the next level.
Motivation and Context
Closes #12014
Description
Bisecting identified the redacted send/receive as the source of the bug
for issue #12014. Specifically, the call to
dsl_dataset_hold_obj(&fromds) had been replaced by
dsl_dataset_hold_obj_flags(), which would pass a DECRYPT flag and create
a key mapping. The problem arises when trying to access the
MASTER_NODE_OBJ: that key mapping is used and the code panics. Fix this
by restoring the call to dsl_dataset_hold_obj().
Later on in dmu_send_obj(), replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags(); otherwise the key mapping created earlier
(on to_ds) is leaked, which results in a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.
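The leak described above can be sketched with a small toy model. This is not OpenZFS code: the `toy_*` types and functions are hypothetical stand-ins that only mimic the idea that a hold taken with a decrypt flag also creates a key mapping, and that only a release given the same flag removes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the OpenZFS definitions. */
typedef enum {
	TOY_HOLD_NONE = 0,
	TOY_HOLD_DECRYPT = 1 << 0
} toy_hold_flags_t;

typedef struct toy_dataset {
	int holds;		/* outstanding holds */
	int key_mappings;	/* outstanding key mappings */
} toy_dataset_t;

/* A hold with the decrypt flag also creates a key mapping. */
void
toy_hold_flags(toy_dataset_t *ds, toy_hold_flags_t flags)
{
	ds->holds++;
	if (flags & TOY_HOLD_DECRYPT)
		ds->key_mappings++;
}

/* Only a release that is told about the flag removes the mapping. */
void
toy_rele_flags(toy_dataset_t *ds, toy_hold_flags_t flags)
{
	assert(ds->holds > 0);
	if (flags & TOY_HOLD_DECRYPT) {
		assert(ds->key_mappings > 0);
		ds->key_mappings--;
	}
	ds->holds--;
}

/* The plain release knows nothing about key mappings. */
void
toy_rele(toy_dataset_t *ds)
{
	toy_rele_flags(ds, TOY_HOLD_NONE);
}

/* Export (or module unload) is only clean when nothing is left over. */
bool
toy_can_export(const toy_dataset_t *ds)
{
	return (ds->holds == 0 && ds->key_mappings == 0);
}

/* Mismatched pair: flagged hold, plain release. The mapping leaks. */
bool
toy_send_mismatched(void)
{
	toy_dataset_t ds = { 0, 0 };

	toy_hold_flags(&ds, TOY_HOLD_DECRYPT);
	toy_rele(&ds);
	return (toy_can_export(&ds));
}

/* Matched pair: flagged hold, flagged release. Nothing is left over. */
bool
toy_send_matched(void)
{
	toy_dataset_t ds = { 0, 0 };

	toy_hold_flags(&ds, TOY_HOLD_DECRYPT);
	toy_rele_flags(&ds, TOY_HOLD_DECRYPT);
	return (toy_can_export(&ds));
}
```

In this model `toy_send_mismatched()` returns false because one key mapping is still outstanding, while `toy_send_matched()` returns true. The real code panics on export or module unload rather than returning a boolean; the sketch only shows why the release must mirror the hold.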
How Has This Been Tested?
Manually, by running the scripts at https://github.com/HankB/provoke_ZFS_corruption