Downgrade metadata size-related errors into warnings #282
I'm not entirely sure yet if I like the approach that this PR uses, which is why I've marked this as a draft for the time being. The approach I took here was to collect parse-related warnings during parsing and then print them all at the end, which lets us avoid needing to change any parts of the public-facing API. Some alternative designs:
Thoughts?
Since this is a library, I think the application should be able to decide if they want to handle the warnings in a different manner (so definitely not option 1: print as encountered). What if we had a new API function that returned Also, these are warnings, but they should probably go to stderr and not stdout just in case they occur in an application piping output to something expecting to process that output. Other than that, I like this, and thank you for grinding this axe.
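To illustrate the stderr point, here is a minimal sketch in which the library only returns warnings and the application chooses where to print them. The `Warning` type and both function names are hypothetical stand-ins, not part of `llvm-pretty-bc-parser`.

```haskell
import System.IO (hPutStrLn, stderr)

-- Hypothetical warning type; the real library's type may differ.
newtype Warning = Warning String

-- Render a warning as a single line of text.
renderWarning :: Warning -> String
renderWarning (Warning msg) = "warning: " ++ msg

-- The application, not the library, decides where warnings go.
-- Writing to stderr keeps stdout clean for anything consuming piped output.
reportWarnings :: [Warning] -> IO ()
reportWarnings = mapM_ (hPutStrLn stderr . renderWarning)
```

An application that prefers to log warnings elsewhere, or to ignore them, can simply swap out `reportWarnings`.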
I like that idea, although it wouldn't just be a single function. There are currently four functions in
I think I would be surprised if I bumped the submodule, had everything still compile, and then got new content on
I think this is the central question here. Who are the downstream applications, and what are their concerns? The main ones I can think of are Crux and SAW. These are tools used for verification, and their soundness is important to their users. How confident are we that the metadata doesn't change the semantics? I'm personally not confident, not having reviewed the documentation for various metadata in a long time. If any of us is confident (or is willing to do the research to become confident), then I'm aligned with this change.
I don't think the question is "can the metadata change the semantics" but rather "can adding new metadata fields change the semantics". The former question can usually be answered with a "no" (LLVM even makes it possible to strip unused LLVM metadata as an optimization pass), but there are a few places where tools like SAW make use of metadata information (e.g., for determining the field names of C structs). That being said, LLVM preserves backwards compatibility whenever it adds new fields to metadata records, so any changes to LLVM metadata are usually additive. As such, I'd deem it quite safe to ignore newly added metadata fields, as they wouldn't impact other parts of the metadata that already exist.
Great! I'm aligned, then. That being said, I hope that downgrading this into a warning doesn't discourage projects that make use of
We include the repo's URL in the warning message, so my hope is that the warnings prove annoying enough that our users will be motivated to file issues anyway :)
Previously, `llvm-pretty-bc-parser` would produce a fatal error if it encountered a metadata record with an unexpected size. This proves to be extremely cumbersome in practice, however, as LLVM frequently adds new fields to metadata records, and this causes a fair bit of headaches when attempting to support newer LLVM versions.

This patch downgrades this class of errors into warnings. It does so by:

* Introducing a new `ParseWarning` data type that captures all types of parser-related warnings. (For now, the only type of `ParseWarning` is `InvalidMetadataRecordSize`, but we may add more in the future.)
* Adding new functions to `Data.LLVM.BitCode` (all of which end with `*WithWarnings`) that return `ParseWarning`s alongside the parsed `Module`.

As such, this patch does not change any of the existing `Data.LLVM.BitCode` API. It is up to users to decide if they want to opt into the new API that offers `ParseWarning`s.

See the changes in the `disasm-test` and `llvm-disasm` test suite for examples of how to use the new API.

Fixes #248.
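The shape of this design can be sketched with a toy parser. Only the names `ParseWarning` and `InvalidMetadataRecordSize` come from the commit message; the constructor's fields, the `Error` type, the record representation, and every function below are assumptions made for illustration and are not the actual `Data.LLVM.BitCode` API.

```haskell
-- Toy stand-in; the field layout is an assumption, not the library's definition.
data ParseWarning
  = InvalidMetadataRecordSize Int Int  -- actual size, expected size
  deriving (Eq, Show)

-- Hypothetical fatal-error type.
newtype Error = Error String
  deriving (Eq, Show)

-- A toy "record" is a list of fields; we expect exactly this many.
expectedFields :: Int
expectedFields = 3

-- Parse one record. In this toy, an empty record is the only fatal error;
-- any other size mismatch yields a warning and parsing continues.
parseRecord :: [Int] -> Either Error ([Int], [ParseWarning])
parseRecord fields
  | null fields         = Left (Error "empty record")
  | n /= expectedFields = Right ( take expectedFields fields
                                , [InvalidMetadataRecordSize n expectedFields] )
  | otherwise           = Right (fields, [])
  where
    n = length fields

-- New-style entry point: returns the warnings alongside the result.
parseRecordsWithWarnings :: [[Int]] -> Either Error ([[Int]], [ParseWarning])
parseRecordsWithWarnings records = do
  parsed <- mapM parseRecord records
  pure (map fst parsed, concatMap snd parsed)

-- Old-style entry point: same signature as before, warnings dropped.
parseRecords :: [[Int]] -> Either Error [[Int]]
parseRecords = fmap fst . parseRecordsWithWarnings
```

Because `parseRecords` keeps its original type, existing callers compile unchanged, while callers that care about warnings opt into the `*WithWarnings` variant.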
Based on feedback, I've kept the implementation of the existing
I would be in favor of going ahead and deprecating the APIs that don't return warnings. IMO, callers should be explicit about not handling them. Certainly, we can give everyone a long lead time before removing them.
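As a rough sketch of what such a deprecation could look like, GHC's `DEPRECATED` pragma attaches a compile-time message to a function so that callers in other modules are nudged toward the replacement. The function names and the toy parsing logic here are hypothetical and do not reflect the library's real API.

```haskell
import Data.Char (isDigit)

-- Hypothetical new-style function: returns the result plus any warnings.
parseWithWarnings :: String -> Either String (Int, [String])
parseWithWarnings s = case span isDigit s of
  ("", _)        -> Left ("no digits in: " ++ show s)
  (digits, rest) -> Right (read digits, ["ignored trailing input" | not (null rest)])

-- Hypothetical old-style function, kept for compatibility but flagged so that
-- importing modules see a deprecation message at compile time.
{-# DEPRECATED parse "Use parseWithWarnings and handle the warnings explicitly" #-}
parse :: String -> Either String Int
parse = fmap fst . parseWithWarnings
```

The old function keeps working throughout the deprecation period, which gives downstream projects the long lead time mentioned above.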
Actually in this case, I think you would get different content on stderr... and then your program would continue running, whereas in the previous version it would halt. That said, I agree in general that a library should not be performing unexpected IO activities, but my proposal was a compromise to avoid blissful ignorance. Since we've deprecated the old API that doesn't alert the user, I think the current approach is fine.
Yes, good point 😁 In any case, I think we've settled on a good approach now that will minimize surprise for downstream packages on upgrades (without maximizing breakage) and help us reduce the library I/O!
Cleaning up SAW's printing is going to be enough fun without having its components generating more behind its back, so returning a list of warnings is fine by me. (Having machinery for the callers to install print handlers might be better in general, but that's extremely painful in Haskell.) The one thing I don't entirely like is that on failure it should ideally return both the errors and any warnings. In general that tends to help avoid unpleasant surprises. However, for this particular warning and this particular situation I don't think it matters much.
Yes, this is a reasonable suggestion. Initially, I was going to say that I don't see why we couldn't refactor things so that the

diff --git a/src/Data/LLVM/BitCode/Parse.hs b/src/Data/LLVM/BitCode/Parse.hs
index 232b0fa..bbd6689 100644
--- a/src/Data/LLVM/BitCode/Parse.hs
+++ b/src/Data/LLVM/BitCode/Parse.hs
@@ -61,7 +62,7 @@ formatContext :: [String] -> [String]
 formatContext cxt = "from:" : map ('\t' :) cxt
 
 newtype Parse a = Parse
-  { unParse :: ReaderT Env (StateT ParseState (Except Error)) a
+  { unParse :: ReaderT Env (ExceptT Error (State ParseState)) a
   } deriving ( Functor, Applicative, MonadFix
              , MonadReader Env
              , MonadState ParseState
@@ -100,11 +101,9 @@ instance MonadPlus Parse where
   {-# INLINE mplus #-}
   mplus = (<|>)
 
-runParse :: Parse a -> Either Error (a, ParseState)
+runParse :: Parse a -> (Either Error a, ParseState)
 runParse (Parse m) =
-  case runExcept (runStateT (runReaderT m emptyEnv) emptyParseState) of
-    Left err -> Left err
-    Right res -> Right res
+  runState (runExceptT (runReaderT m emptyEnv)) emptyParseState
 
 notImplemented :: Parse a
 notImplemented = fail "not implemented"

That way, we can access the Some further investigation reveals that this change wouldn't be enough on its own, however. Unfortunately, there are other exception-related parts of the code that this would impact. For example, consider this part of the llvm-pretty-bc-parser/src/Data/LLVM/BitCode.hs Lines 36 to 37 in 965d6c9
This calls Returning an empty sequence feels off, however, as it defeats the purpose of returning warnings in the event of the failure in the first place. Really, this suggests that this use of All of this is to say: I wouldn't be opposed to performing these changes, especially if you think we'd end up with a more sensible API in the long run. I'd need some extra time to get there, on the other hand.
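The effect of swapping the transformer order can be seen in a small standalone sketch. It uses the `transformers` package directly with a plain list of strings as the state; the `Warnings` type and all function names are hypothetical and much simpler than the parser's real `ParseState`.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (Except, ExceptT, runExcept, runExceptT, throwE)
import Control.Monad.Trans.State.Strict (State, StateT, modify, runState, runStateT)

-- Hypothetical stand-in for the warnings kept in the parser state.
type Warnings = [String]

-- State layered over Except: a failure discards the accumulated state.
stateOverExcept :: StateT Warnings (Except String) ()
stateOverExcept = do
  modify (++ ["odd record size"])
  lift (throwE "fatal parse error")

-- Except layered over State: the state survives a failure.
exceptOverState :: ExceptT String (State Warnings) ()
exceptOverState = do
  lift (modify (++ ["odd record size"]))
  throwE "fatal parse error"

-- Old ordering: evaluates to Left "fatal parse error"; the warning is lost.
runOld :: Either String ((), Warnings)
runOld = runExcept (runStateT stateOverExcept [])

-- New ordering: evaluates to (Left "fatal parse error", ["odd record size"]).
runNew :: (Either String (), Warnings)
runNew = runState (runExceptT exceptOverState) []
```

With the second ordering the caller receives the final state whether or not parsing failed, which is what makes reporting warnings alongside an error possible.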
I would say don't chase after it for now. This is the sort of thing the general printing cleanup in saw-script is supposed to cope with, and after that sweep's done I can easily carry it into here... and at that point I'll have dealt with all these sorts of problems recently and have recommended solutions to hand.
I'm happy to merge this as-is (less work for me), but this does mean that if we want to print warnings in the event of failure, then we will have to change the API once again at a later date. In any case, I'll go ahead and defer this to an issue (#286), as implementing this idea proves non-trivial.
This bumps the `llvm-pretty-bc-parser` submodule to bring in the changes from GaloisInc/llvm-pretty-bc-parser#282. It also updates the code to use the `*WithWarnings` variants of bitcode parsing functions.
This bumps the `llvm-pretty-bc-parser` submodule to bring in the changes from GaloisInc/llvm-pretty-bc-parser#282 (as well as the knock-on changes on the `crucible` side in GaloisInc/crucible#1313). It also updates the code to use the `*WithWarnings` variants of bitcode parsing functions.
This bumps the `llvm-pretty-bc-parser` submodule to bring in the changes from GaloisInc/llvm-pretty-bc-parser#282 (as well as the knock-on changes on the `crucible` side in GaloisInc/crucible#1313). It also updates the code to use the `*WithWarnings` variants of bitcode parsing functions. For now, we simply print parse warnings using the established mechanisms in `saw-script` and `saw-remote-api`, although there is a broader question of whether we should be using something more systematized to print these warnings (#2129).