Add punet benchmarking to the regression suite #19088
Conversation
```
@@ -196,13 +242,19 @@ def job_summary_process(ret_value, output):
def test_sdxl_rocm_benchmark(
```
O_O is this single `def test_` function now 350 lines of code, with a few big branches for `if rocm_chip == ...`?
Can this be split into multiple test functions? This looks particularly difficult to run and edit as it gets more complex. We should also start thinking about generalizing to more than SDXL, at which point we'll definitely need the modularity.
For the `job_summary.md` step, each test case could write some output via pytest for a test report or file aggregation step to summarize after the test run. See
- https://docs.pytest.org/en/stable/how-to/output.html#record-property
- https://docs.pytest.org/en/stable/reference/reference.html#record-property
- https://docs.pytest.org/en/stable/reference/reference.html#pytest.Item.user_properties
In the IREE ONNX tests I used those properties here:

- Write properties in each test case: https://github.com/iree-org/iree-test-suites/blob/e5429e42d0b6b3835e8564751cb7d0c29bcdd9b6/onnx_ops/conftest.py#L253-L257
- Run pytest with `--report-log`: https://github.com/iree-org/iree-test-suites/blob/e5429e42d0b6b3835e8564751cb7d0c29bcdd9b6/.github/workflows/test_onnx_ops.yml#L67
- Read properties from test log file: https://github.com/iree-org/iree-test-suites/blob/e5429e42d0b6b3835e8564751cb7d0c29bcdd9b6/onnx_ops/update_config_xfails.py#L63-L104
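A minimal sketch of the test-side piece, recording benchmark stats via `record_property` (the helper function and the recorded keys below are placeholders, not code from this PR):

```python
def run_unet_benchmark():
    """Placeholder for the real iree-benchmark-module invocation."""
    return {"mean_time_ms": 42.0, "dispatch_count": 1234}


def test_sdxl_rocm_unet_benchmark(record_property):
    stats = run_unet_benchmark()
    # record_property attaches key/value pairs to this test's user_properties,
    # which end up in JUnit XML and in --report-log output.
    record_property("model", "sdxl-unet-fp16")
    record_property("mean_time_ms", stats["mean_time_ms"])
    record_property("dispatch_count", stats["dispatch_count"])
```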
Here's how I think this could be structured:
```python
def test_sdxl_rocm_unet():
    ...

def test_sdxl_rocm_clip():
    ...

def test_sdxl_rocm_vae():
    ...

def test_sdxl_rocm_e2e():
    ...

@pytest.mark.gpu_gfx942
def test_sdxl_rocm_punet_int8_fp16():
    ...

@pytest.mark.gpu_gfx942
def test_sdxl_rocm_punet_int8_fp8():
    ...
```
Where each test case runs just that benchmark and then writes some results to the log. Then, a script parses the log (or maybe in `pytest_report_to_serializable`: https://docs.pytest.org/en/stable/reference/reference.html#pytest.hookspec.pytest_report_to_serializable) and generates the markdown table.
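For the aggregation step, something along these lines could work. This is only a sketch: it assumes the JSON-lines format that pytest-reportlog writes (one JSON object per line) and reuses the placeholder `mean_time_ms` property from the sketch above.

```python
import json


def summarize(report_log_path: str) -> str:
    """Build a markdown table from user_properties in a --report-log file."""
    rows = []
    with open(report_log_path) as f:
        for line in f:
            entry = json.loads(line)
            # Only look at the "call" phase of each test's reports.
            if entry.get("$report_type") != "TestReport" or entry.get("when") != "call":
                continue
            props = dict(entry.get("user_properties", []))
            if "mean_time_ms" in props:
                rows.append((entry["nodeid"], props["mean_time_ms"]))
    lines = ["| Benchmark | Mean time (ms) |", "| --- | --- |"]
    lines += [f"| {name} | {time_ms} |" for name, time_ms in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize("benchmark_report.json"))
```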
Ah, I was thinking the same as I was coding it up, but figured this probably won't grow any further, so I didn't think too much of it. But makes sense, I can split it up into multiple tests. It will definitely be nicer for the eye and for debugging 🙃. Also, benchmarking will only be this involved for halo models with this setup. I only see us having benchmarks/sdxl, benchmarks/llama, and benchmarks/flux down the line. I don't think this is the best way to build out a general benchmarking test suite; that should live elsewhere. We should probably stick with a custom markdown for now because of the specific stats that we care about, but we can definitely leverage pytest reporting features for the general benchmark suite.
https://en.wikipedia.org/wiki/Boiling_frog 😉
If we have a significant number of developers depending on a piece of infrastructure, it is worth the time to support that infrastructure at a level we are confident in (e.g. move the code out of `experimental/`, encourage more developers to run it locally as part of their regular development, etc.).
What you have here could be merged as-is IMO, but I'd like to see a ~3 month time scale plan for "halo model" testing and benchmarking. I want us to be able to scale up from roughly 3 models on 2 devices and 2 backends to 20 models on 10 devices and 5 backends, then 100 models on 20 devices, etc. The way to add a new model to the test suite can't be "fork this 600 line `experimental/benchmark_sdxl_on_mi250_with_rocm.py` file and ask the codegen team for a 300 line `attention_and_matmul_spec_{model_name}.mlir`".
Haha, agreed. Sure, let me write something up. Things become a lot easier to scale when we stick to basic configurations and keep things as simple as possible, which should be the case for both our general model testing and benchmarking suites.
Ok, I raised this issue to lay out a plan for what's needed going forward: #19115. Feel free to edit and add to it directly. I think we decided that we will land this, but make that issue's plan a priority going forward and move the code out of `experimental/`.
Thanks! Lots of good design details to go through on that issue.
This commit adds support for benchmarking punet fp16/fp8 performance at TOM. This concludes adding all the necessary testing for the SDXL model. It also switches the compilation of punet to use the spec file, as it is necessary for tuning at the current state of the project. I've also updated the artifacts in Azure, this time using the date as part of the Azure link so everyone knows when the artifacts were generated. Nithin is working on implementing the spec file optimizations in the compiler itself, so we can remove the use of such files in the future.

All tests have timeouts now, and I updated the existing ones because the CLI flag seems to apply a timeout per test (not to the whole pytest command). Now we can avoid hangs such as https://github.com/iree-org/iree/actions/runs/11748746984/job/32734141414
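For reference, with the pytest-timeout plugin the `--timeout` flag applies per test, and an individual test can also carry its own explicit marker; the value below is illustrative, not the one used in this PR:

```python
import pytest


@pytest.mark.timeout(600)  # seconds; illustrative value, overrides the CLI default for this test
def test_sdxl_rocm_punet_int8_fp16_benchmark():
    ...
```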