
Begin testing models from the ONNX Model Zoo. #23

Merged
ScottTodd merged 26 commits into iree-org:main from the onnx-models branch on Sep 19, 2024

Conversation

@ScottTodd (Member) commented on Sep 6, 2024:

Progress on #6.

A sample test report HTML file is available here: https://scotttodd.github.io/iree-test-suites/onnx_models/report_2024_09_17.html

These new tests

  • Download models from https://github.com/onnx/models
  • Extract metadata from the models to determine which functions to call with random data
  • Run the models through ONNX Runtime as a reference implementation
  • Import the models using iree-import-onnx (until we have a better API: [onnx] Build real onnx frontend cli/API iree#18289)
  • Compile the models using iree-compile (currently just for llvm-cpu but this could be parameterized later)
  • Run the models using iree-run-module, checking outputs using --expected_output and the reference data (a rough sketch of this pipeline follows the list)
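
For orientation, here is a minimal sketch of that per-model flow, assuming the IREE tools are on PATH. The file names and helper name are illustrative, not the actual code in conftest.py:

import subprocess

def import_compile_and_run(onnx_path, input_flags, expected_output_flags):
    # Import the ONNX model into MLIR that IREE can compile.
    subprocess.run(["iree-import-onnx", onnx_path, "-o", "model.mlir"], check=True)
    # Compile for the llvm-cpu backend (could be parameterized later).
    subprocess.run(
        [
            "iree-compile",
            "model.mlir",
            "--iree-hal-target-backends=llvm-cpu",
            "-o",
            "model.vmfb",
        ],
        check=True,
    )
    # Run the compiled module, comparing against the ONNX Runtime reference
    # outputs passed in via --expected_output flags.
    subprocess.run(
        ["iree-run-module", "--module=model.vmfb", "--device=local-task"]
        + list(input_flags)
        + list(expected_output_flags),
        check=True,
    )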

Tests are written in Python using a set of pytest helper functions. As the tests run, they can log details about what commands they are running. When run locally, the artifacts/ directory will contain all the relevant files. More can be done in follow-up PRs to improve the ergonomics there (like generating flagfiles).

Each test case can use XFAIL like @pytest.mark.xfail(raises=IreeRunException). As we test across multiple backends or want to configure the test suite from another repo (e.g. iree-org/iree), we can explore more expressive marks.
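
For illustration, a test case might look roughly like this. The fixture name matches the helper mentioned later in this thread, while the import path and model path are assumptions, not the exact code in this PR:

import pytest

from conftest import IreeRunException  # assumed location of the exception class

# Illustrative path into the onnx/models repository tree.
MODEL_PATH = "validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx"

@pytest.mark.xfail(raises=IreeRunException)
def test_mobilenet(compare_between_iree_and_onnxruntime_fn):
    compare_between_iree_and_onnxruntime_fn(MODEL_PATH)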

Note that unlike the ONNX operator tests, these tests use onnxruntime and iree-import-onnx at test time. The operator tests handle that as an infrequently run offline step. We could do something similar here, but the test inputs and outputs can be rather large for real models, and that gets into Git LFS or cloud storage territory.

If this test authoring model works well enough, we can do something similar for other ML frameworks like TFLite (#5).

ScottTodd marked this pull request as ready for review on September 6, 2024 at 23:29
@ScottTodd (Member, Author) left a comment:

Some self-review comments from weekend thinking. Busy for a bit, then I will make those changes.

onnx_models/conftest.py (resolved review thread)
onnx_models/vision_models_test.py (resolved review thread)
# https://github.com/onnx/models/tree/main/validated/vision/classification/mobilenet
ScottTodd (Member, Author):

Can write a script to get these directories / files from the onnx/models repo and translate those into test cases
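
One possible sketch of such a script (the helper name is hypothetical, and the GitHub contents API is rate-limited without a token):

import json
import urllib.request

def list_model_dirs(subtree="validated/vision/classification"):
    # List the model directories under a subtree of onnx/models so they can be
    # translated into parametrized test cases.
    url = f"https://api.github.com/repos/onnx/models/contents/{subtree}"
    with urllib.request.urlopen(url) as response:
        entries = json.load(response)
    return sorted(entry["name"] for entry in entries if entry["type"] == "dir")

if __name__ == "__main__":
    for name in list_model_dirs():
        print(name)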

onnx_models/vision_models_test.py (resolved review thread)
onnx_models/conftest.py (resolved review thread)
@ScottTodd (Member, Author) commented:

Ping @zjgarvey ? Any high level feedback? I should have some time this week to address my own comments, but I'd like to prefetch any other feedback/discussion.

@zjgarvey (Collaborator) commented:

> Ping @zjgarvey ? Any high level feedback? I should have some time this week to address my own comments, but I'd like to prefetch any other feedback/discussion.

I'll take a look today!

@zjgarvey (Collaborator) left a comment:

My biggest concern right now is the handling of dynamic dims and getting the input/output signatures to work in those cases.

Some suggestions pertaining to this:

  1. Have a test-specific dictionary of dim_param assignments to handle models with dynamic dims.
  2. Since you are already generating an onnxruntime session, it might be easier to pass the session itself, together with a dim_param dictionary, to generate the input signature.
  3. Again, since you are already using the onnxruntime session, the most reliable way to get a correct output signature is from the reference outputs of that session run. It would likely be prohibitively complicated to infer output shapes from a dynamic ONNX model: even with shape inference and dim_params included, I'm not sure it could determine correct output shapes without actually fixing the dim_params in the ONNX model itself (there is tooling for that: https://github.com/microsoft/onnxruntime/blob/291a5352b27ded5714e5748b381f2efb88f28fb9/tools/python/util/onnx_model_utils.py#L177).

If I don't revisit this soon, please feel free to ping again.
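
A minimal sketch of suggestion 2 above, assuming a per-test dim_param dictionary; the function name and the default-to-1 fallback are illustrative, not this PR's code:

import onnxruntime

def resolve_input_shapes(session: onnxruntime.InferenceSession, dim_params: dict):
    # Replace dynamic dims (named dim_params or None) in the session's input
    # signature with concrete sizes from the test-specific dictionary,
    # defaulting to 1 when no assignment is given.
    resolved = {}
    for inp in session.get_inputs():
        shape = []
        for dim in inp.shape:
            if isinstance(dim, int):
                shape.append(dim)  # static dim
            else:
                shape.append(dim_params.get(dim, 1))  # dim_param name or None
        resolved[inp.name] = shape
    return resolved

# Example: an input shaped ["batch_size", 3, 224, 224] resolves to
# [4, 3, 224, 224] when dim_params == {"batch_size": 4}.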

onnx_models/conftest.py (resolved review thread)
# A) List all metadata explicitly
# B) Get metadata on demand from the .onnx protobuf using 'onnx'
# C) Get metadata on demand from the InferenceSession using 'onnxruntime'
# This is option B.
zjgarvey (Collaborator):

One of the issues with option B is that, if you try to get the output signature for a graph that hasn't yet performed shape inference, you will likely get dim_params rather than actual dims for the output shape.

I'm still trying to figure out a good way to do this, since currently I've been getting the input/output signature through option C (which comes with its own baggage).
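
For reference, a minimal sketch of option B with shape inference run first (illustrative only; as noted above, dynamic dims can still come back as named dim_params rather than concrete values):

import onnx
from onnx import shape_inference

model = onnx.load("model.onnx")  # assumed local path
model = shape_inference.infer_shapes(model)
for output in model.graph.output:
    # Each dim is either a concrete dim_value or a symbolic dim_param name.
    dims = [
        d.dim_value if d.HasField("dim_value") else d.dim_param
        for d in output.type.tensor_type.shape.dim
    ]
    print(output.name, dims)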

zjgarvey (Collaborator):

Ah, it looks like you use 1 as the size for dims without a dim_value. I'll try to keep an eye on where that might cause problems.

ScottTodd (Member, Author):

Another option I'm considering is having test case import be an offline, tool-assisted step, instead of an online step that always runs as part of the tests. That way, we can have developers importing model tests decide explicitly what functions and signatures they want to test. It would be nice if there was only one obvious function and signature for each model that can be safely inferred though :)

ScottTodd (Member, Author):

Made some functional updates, but I'll continue to iterate on the code style.

This is now using option C - getting the input and output signatures from onnxruntime.InferenceSession. The output signatures are now exactly what comes from onnxruntime, while the input signatures still turn dynamic dims into 1.
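
A sketch of what that can look like for the inputs; the element-type table is abbreviated and the function name is illustrative. IREE type strings such as "1x3x224x224xf32" feed the --input flags in the snippet below:

import onnxruntime

ELEMENT_TYPE_MAP = {"tensor(float)": "f32", "tensor(int64)": "i64"}  # abbreviated

def iree_type_string(node_arg, dynamic_dim_default=1):
    # Build an IREE-style type string, turning any dynamic dim (a named
    # dim_param or None) into dynamic_dim_default.
    dims = [d if isinstance(d, int) else dynamic_dim_default for d in node_arg.shape]
    element_type = ELEMENT_TYPE_MAP[node_arg.type]
    return "x".join(str(d) for d in dims) + "x" + element_type

session = onnxruntime.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]  # assumed local path
)
print([iree_type_string(arg) for arg in session.get_inputs()])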

for input in onnx_model_metadata["inputs"]:
    input_type = input["iree_type"]
    input_data_path = input["input_data_path"]
    run_module_args.append(f"--input={input_type}=@{input_data_path}")
zjgarvey (Collaborator):

Ah, yeah. I'm not sure how this will work if the dynamic input dims aren't actually 1 (referring to the previous comment about setting dims without a dim_value equal to 1).

zjgarvey (Collaborator):

Same with the outputs below, except there is a much higher chance that dynamic output dims are beyond our control.

ScottTodd (Member, Author):

We have control over the inputs. As long as there aren't restrictions (like must be a multiple of 4, or two dynamic inputs must be equal), setting to 1 seems safe enough to start. We could also override per test case or add a flag that lets you override for the entire test suite.

I at least made it so the output shape comes from the inference session, since that value is out of our control.

zjgarvey (Collaborator):

Yeah, I know many LLMs have dim_params like past_seq_len and then elsewhere past_seq_len + 1, so there are definitely restrictions for some models.

After looking more carefully at the setup (which is really nice, btw), I think it would be pretty feasible to just add an optional dim_param_dict as an input for compare_between_iree_and_onnxruntime_fn.

zjgarvey (Collaborator):

In short, I think that could be added later as necessary.

ScottTodd (Member, Author):

Yep. I just started with vision classification models but as more are added this test code will grow more features. Are there particular ONNX models that you know need that treatment? I can look at importing those next.

onnx_models/utils.py (resolved review thread)
@ScottTodd (Member, Author) commented:

Okay, I made a bunch of updates today and I'm happy enough with where this is now to merge it and start running it nightly.

Major changes:

PTAL?

onnx_models/conftest.py (resolved review thread)
@zjgarvey (Collaborator) commented:

@ScottTodd , I don't have any pressing comments. The printout you shared was really easy to navigate and the code here feels like it will be painless to iterate on with updates. Please let me know when you feel like this is in a state to merge, and I'll take a final look.

@ScottTodd (Member, Author) commented:

PTAL. My plan is to keep this running nightly here in this repo, then also consider running a small number of tests in the IREE repo to watch for regressions. That sort of cross-repo testing will need either a way to set XFAIL from CLI flags (like the ONNX operator tests do with config files; a rough sketch follows), or we could only test already-passing models and grow the set that is tested over time. Right now the small number of tests only takes 2-3 minutes to run, even including time spent downloading files.
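
A rough sketch of the config-file approach, assuming a JSON file containing a list of test names; the hook names are standard pytest, everything else is illustrative:

import json
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--expected-failures",
        default=None,
        help="Path to a JSON file listing test names expected to fail",
    )

def pytest_collection_modifyitems(config, items):
    path = config.getoption("--expected-failures")
    if not path:
        return
    with open(path) as f:
        expected_failures = set(json.load(f))
    for item in items:
        # Mark listed tests as expected failures without editing the test files.
        if item.name in expected_failures:
            item.add_marker(
                pytest.mark.xfail(reason="Listed in the expected failures config file")
            )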

ScottTodd merged commit 7b8bdf7 into iree-org:main on Sep 19, 2024
2 checks passed
ScottTodd deleted the onnx-models branch on September 19, 2024 at 20:25