zdb: better handling for corrupt block pointers#17166
zdb: better handling for corrupt block pointers#17166behlendorf merged 5 commits intoopenzfs:masterfrom
Conversation
amotin
left a comment
There was a problem hiding this comment.
I agree that assertions are not good for the data we read from disks and that might get corrupted, but I am not sure just complaining to stderr is a good solution either. Error messages printed to stderr might be difficult to correlate to the output printed to stdout. Also since we use zdb as a pool verification tool in some CI tests, I am not sure spamming stderr without returning actual error status would be noticed.
So what would you prefer? I could print the messages to stdout instead, and eventually exit with status 1. |
|
Exit code would definitely be good. How to better print it I am not sure though. I haven't looked close on that part of the code. But I already saw |
|
The latest commit prints all of the errors to stdout instead of stderr. It wasn't possible to print the L1 block's fill error on the same line as the block pointer itself, at least not without buffering potentially massive amounts of text in RAM, so it's printed afterwards. The output looks like this: |
amotin
left a comment
There was a problem hiding this comment.
It is quite verbose. I wonder what happen with some code that may try to parse the output (I think we have some in ZTS), and without error status returned still it might be difficult to handle properly.
I could change the exit status to 1 if any corruption is found. However, there is precedent for not doing that. zdb will already report corruption and yet exit 0 in the following conditions:
So for consistency's sake, I think I should leave the exit status unchanged. |
|
As discussed in the OpenZFS leadership meeting today, we agreed to change zdb to always exit non-zero if it detects any corruption. I'll create a distinct error code for that. |
|
@amotin @allanjude with the latest commits, zdb will exit 3 if on-disk corruption was detected. It will also follow the usual convention of exiting 2 due to invalid CLI usage. |
amotin
left a comment
There was a problem hiding this comment.
Just randomly looking around the handled cases I found couple more.
|
ping @amotin could you please review again? |
amotin
left a comment
There was a problem hiding this comment.
OK. So be it. But please rebase it to the latest master so we have clean(er) CI tests.
When dumping indirect blocks, attempt to print corrupt block pointers rather than abort the program. Sponsored by: ConnectWise Signed-off-by: Alan Somers <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
0d2eb64 to
11ab2cd
Compare
|
@asomers I manually rebased this PR to get an updated CI run. |
When dumping indirect blocks, attempt to print corrupt block pointers rather than abort the program. When corruption is detected zdb will exit with an error code of 3. Sponsored by: ConnectWise Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#17166
When dumping indirect blocks, attempt to print corrupt block pointers rather than abort the program. When corruption is detected zdb will exit with an error code of 3. Sponsored by: ConnectWise Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#17166
When dumping indirect blocks, attempt to print corrupt block pointers rather than abort the program.
Sponsored by: ConnectWise
Signed-off-by: Alan Somers [email protected]
Motivation and Context
When trying to print a file's indirect blocks with "zfs -vvbbbb", if that file contains corrupt block pointers zdb will fail assertions and crash.
Description
Remove three assertions, replacing them with warnings printed to stderr. The rest of zdb does not require those block pointers to be intact.
How Has This Been Tested?
Tested with a dataset that had become corrupted, similarly to #17077 .
Types of changes
Checklist:
Signed-off-by.