Begin testing models from the ONNX Model Zoo. #23
Conversation
Some self-review comments from weekend thinking. Busy for a bit, then I will make those changes.
onnx_models/vision_models_test.py
Outdated
# )
# https://github.com/onnx/models/tree/main/validated/vision/classification/mobilenet
We could write a script to get these directories/files from the onnx/models repo and translate them into test cases.
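For illustration, a rough sketch of what such a script could look like, using the GitHub git/trees API (the endpoint is real, but the filtering and test-name generation here are assumptions, not part of this PR):

import requests

# Hypothetical helper, not part of this PR: list .onnx files in onnx/models and
# print pytest-style test case stubs. Note the git/trees response may be
# truncated for very large repos.
TREE_URL = "https://api.github.com/repos/onnx/models/git/trees/main?recursive=1"

def list_validated_models(prefix="validated/vision/classification"):
    tree = requests.get(TREE_URL, timeout=60).json()["tree"]
    return [
        entry["path"]
        for entry in tree
        if entry["path"].startswith(prefix) and entry["path"].endswith(".onnx")
    ]

if __name__ == "__main__":
    for path in list_validated_models():
        # e.g. ".../mobilenetv2-12.onnx" -> "test_mobilenetv2_12"
        name = path.rsplit("/", 1)[-1].removesuffix(".onnx").replace("-", "_")
        print(f"def test_{name}(): ...  # model: {path}")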
Ping @zjgarvey? Any high-level feedback? I should have some time this week to address my own comments, but I'd like to prefetch any other feedback/discussion.
I'll take a look today!
My biggest concern right now is the handling of dynamic dims and getting the input/output signatures to work in those cases.
Some suggestions pertaining to this:
- Have a test-specific dictionary of dim_param assignments to handle models with dynamic dims.
- Since you are already generating an onnxruntime session, it might be easier to pass the session itself, together with a dim_param dictionary, to generate the input signature.
- Again, since you are using the onnxruntime session, the most reliable method of getting a correct output signature is definitely from the reference outputs of that session run. It might be prohibitively complicated to try to infer the output shapes from a dynamic onnx model. Even with shape inference and dim_params being included, I'm not sure it would be able to determine correct output shapes without actually fixing the dim_params in the onnx model itself (however, there is tooling for that: https://github.com/microsoft/onnxruntime/blob/291a5352b27ded5714e5748b381f2efb88f28fb9/tools/python/util/onnx_model_utils.py#L177).
If I don't revisit this soon, please feel free to ping again.
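As a rough illustration of those suggestions (the function and variable names below are made up for the sketch, and it assumes float32 inputs for the reference run):

import numpy as np
import onnxruntime as ort

def get_signatures(model_path, dim_param_dict=None):
    """Resolve input shapes via a per-test dim_param dictionary and take
    output shapes from the reference onnxruntime outputs."""
    dim_param_dict = dim_param_dict or {}
    session = ort.InferenceSession(model_path)

    def resolve(shape):
        # onnxruntime reports dynamic dims as strings (the dim_param name) or None.
        return [
            dim_param_dict.get(d, 1) if isinstance(d, str) else (d or 1)
            for d in shape
        ]

    input_shapes = {arg.name: resolve(arg.shape) for arg in session.get_inputs()}
    # Reference run with random data (assumes float32 inputs for brevity).
    feeds = {
        name: np.random.rand(*shape).astype(np.float32)
        for name, shape in input_shapes.items()
    }
    reference_outputs = session.run(None, feeds)
    # Output shapes come from the actual reference outputs, not from shape inference.
    output_shapes = [list(output.shape) for output in reference_outputs]
    return input_shapes, output_shapes, reference_outputs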
onnx_models/conftest.py
Outdated
# A) List all metadata explicitly
# B) Get metadata on demand from the .onnx protobuf using 'onnx'
# C) Get metadata on demand from the InferenceSession using 'onnxruntime'
# This is option B.
One of the issues with option B is that, if you try to get the output signature for a graph that hasn't had shape inference run on it yet, you will likely get dim_params rather than actual dims for the output shape.
I'm still trying to figure out a good way to do this, since currently I've been getting the input/output signature through option C (which comes with its own baggage).
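To make the option B issue concrete, a small standalone illustration (the model path is a placeholder): even after onnx shape inference, dims that are only known symbolically come back as dim_param strings rather than numbers.

import onnx
from onnx import shape_inference

model = shape_inference.infer_shapes(onnx.load("model.onnx"))  # placeholder path
for output in model.graph.output:
    dims = []
    for dim in output.type.tensor_type.shape.dim:
        if dim.HasField("dim_value"):
            dims.append(dim.dim_value)         # concrete dim
        else:
            dims.append(dim.dim_param or "?")  # symbolic or unknown dim
    print(output.name, dims)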
Ah, it looks like you use 1 as the shape for dims without a dim_value. I'll try to keep an eye on where that might cause problems.
Another option I'm considering is having test case import be an offline, tool-assisted step, instead of an online step that always runs as part of the tests. That way, we can have developers importing model tests decide explicitly which functions and signatures they want to test. It would be nice if there were only one obvious function and signature for each model that can be safely inferred, though :)
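A sketch of what that offline step could emit per model; the file layout, field names, and values below are purely hypothetical, just to show the idea of developers pinning functions and signatures explicitly:

import json

# Hypothetical test case spec written by an offline import tool; every value
# below is a placeholder, not data from this PR.
test_case = {
    "model_url": "https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx",
    "entry_function": "main_graph",          # chosen explicitly by the developer
    "input_signature": ["1x3x224x224xf32"],  # dynamic dims already resolved
    "output_signature": ["1x1000xf32"],
}

with open("mobilenetv2_12.json", "w") as f:
    json.dump(test_case, f, indent=2)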
Made some functional updates, but I'll continue to iterate on the code style.
This is now using option C: getting the input and output signatures from onnxruntime.InferenceSession. The output signatures are now exactly what comes from onnxruntime, while the input signatures are still turning dynamic dims into 1.
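For reference, a condensed sketch of that approach (the dtype table is partial and the helper name is illustrative, not necessarily what conftest.py calls it):

import onnxruntime as ort

ORT_TO_IREE_DTYPE = {"tensor(float)": "f32", "tensor(int64)": "i64"}  # partial map

def iree_type_from_node_arg(arg, replace_dynamic_with=1):
    # Dynamic dims appear as strings or None in arg.shape; inputs fall back to 1.
    dims = [d if isinstance(d, int) else replace_dynamic_with for d in arg.shape]
    dtype = ORT_TO_IREE_DTYPE[arg.type]
    if not dims:
        return dtype  # scalar
    return "x".join(str(d) for d in dims) + "x" + dtype

session = ort.InferenceSession("model.onnx")  # placeholder path
input_types = [iree_type_from_node_arg(arg) for arg in session.get_inputs()]
# Output shapes/types are taken from the actual session.run() results instead.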
onnx_models/conftest.py
Outdated
for input in onnx_model_metadata["inputs"]:
    input_type = input["iree_type"]
    input_data_path = input["input_data_path"]
    run_module_args.append(f"--input={input_type}=@{input_data_path}")
Ah, yeah. I'm not sure how this will work if the dynamic input dims aren't actually 1 (referring to the previous comment about setting dims without a dim_value equal to 1).
Same with the outputs below, only there is a much higher chance of output dynamic dims being something beyond our control.
We have control over the inputs. As long as there aren't restrictions (like must be a multiple of 4, or two dynamic inputs must be equal), setting to 1 seems safe enough to start. We could also override per test case or add a flag that lets you override for the entire test suite.
I at least made it so the output shape comes from the inference session, since that value is out of our control.
Yeah, I know many LLMs have dim params like past_seq_len and then elsewhere past_seq_len + 1, so there are definitely restrictions for some models.
After looking more carefully at the setup (which is really nice, btw), I think it would be pretty feasible to just add an optional dim_param_dict as an input for compare_between_iree_and_onnxruntime_fn.
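Illustrative usage only, assuming such an optional argument were added; the argument names and dim_param names below are placeholders, not the real signature from this PR:

def test_llm_with_dynamic_dims(compare_between_iree_and_onnxruntime_fn):
    compare_between_iree_and_onnxruntime_fn(
        model_url="https://example.com/some-llm.onnx",        # placeholder
        dim_param_dict={"batch_size": 1, "past_seq_len": 0},  # hypothetical names
    )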
In short, I think that could be added later as necessary.
Yep. I just started with vision classification models but as more are added this test code will grow more features. Are there particular ONNX models that you know need that treatment? I can look at importing those next.
Okay, I made a bunch of updates today and I'm happy enough with where this is now to merge it and start running it nightly. Major changes:
PTAL?
@ScottTodd, I don't have any pressing comments. The printout you shared was really easy to navigate, and the code here feels like it will be painless to iterate on with updates. Please let me know when you feel like this is in a state to merge, and I'll take a final look.
PTAL. My plan is to keep this running nightly here in this repo, then also consider running a small number of tests in the IREE repo to watch for regressions. That sort of cross-repo testing will need either a way to set XFAIL from CLI flags (like the ONNX operator tests do with config files), or we could only test already-passing models and just grow the set that is tested over time. Right now the small number of tests only takes 2-3 minutes to run, even including time spent downloading files.
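As one possible shape for the "set XFAIL from CLI flags" idea mentioned above, a hedged pytest sketch (the flag name and config format are assumptions, not something this PR implements):

import json
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--xfail-config",
        default=None,
        help="Path to a JSON file listing test names expected to fail",
    )

def pytest_collection_modifyitems(config, items):
    path = config.getoption("--xfail-config")
    if not path:
        return
    with open(path) as f:
        expected_failures = set(json.load(f))
    for item in items:
        if item.name in expected_failures:
            item.add_marker(pytest.mark.xfail(reason="listed in xfail config"))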
Progress on #6.
A sample test report HTML file is available here: https://scotttodd.github.io/iree-test-suites/onnx_models/report_2024_09_17.html
These new tests:
- import models using iree-import-onnx (until we have a better API: [onnx] Build real onnx frontend cli/API iree#18289)
- compile using iree-compile (currently just for llvm-cpu, but this could be parameterized later)
- run using iree-run-module, checking outputs using --expected_output and the reference data

Tests are written in Python using a set of pytest helper functions. As the tests run, they can log details about what commands they are running. When run locally, the artifacts/ directory will contain all the relevant files. More can be done in follow-up PRs to improve the ergonomics there (like generating flagfiles).

Each test case can use XFAIL like @pytest.mark.xfail(raises=IreeRunException). As we test across multiple backends or want to configure the test suite from another repo (e.g. iree-org/iree), we can explore more expressive marks.

Note that unlike the ONNX operator tests, these tests use onnxruntime and iree-import-onnx at test time. The operator tests handle that as an infrequently run offline step. We could do something similar here, but the test inputs and outputs can be rather large for real models, and that gets into Git LFS or cloud storage territory.

If this test authoring model works well enough, we can do something similar for other ML frameworks like TFLite (#5).