
Add punet benchmarking to the regression suite #19088

Merged · 13 commits into iree-org:main from punet-benchmark · Nov 13, 2024
Conversation

@saienduri (Collaborator) commented Nov 8, 2024

This commit adds support for benchmarking punet fp16/fp8 performance TOM. This concludes adding all the necessary testing for the SDXL model. It also switches the compilation of punet to use the spec file, since that is necessary for tuning at the current state of the project. I've also updated the artifacts in Azure, this time using the date as part of the Azure link so everyone knows when the artifacts were generated. Nithin is working on implementing the spec file optimizations in the compiler itself, so we can remove the use of such files in the future. All tests now have per-test timeouts, and I updated the existing ones because the CLI flag appears to apply a timeout per test, not to the whole pytest command. This lets us avoid hangs such as https://github.com/iree-org/iree/actions/runs/11748746984/job/32734141414
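As a rough illustration of the per-test timeouts mentioned above, here is a minimal sketch assuming the pytest-timeout plugin (which is what provides the --timeout CLI flag); the test name and timeout value are placeholders, not the PR's actual code:

import pytest

# pytest-timeout applies a budget to each test individually, so a long-running
# benchmark gets its own cap instead of sharing one with the whole run.
@pytest.mark.timeout(600)  # placeholder value in seconds
def test_sdxl_rocm_punet_fp16_benchmark():  # hypothetical test name
    ...

With per-test markers like this, a hung benchmark fails after its own budget rather than stalling the entire job.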

@saienduri marked this pull request as draft November 8, 2024 22:06
@saienduri added the infrastructure label (Relating to build systems, CI, or testing) Nov 9, 2024
@saienduri changed the title from "Benchmark punet performance" to "Add punet benchmarking to the regression suite" Nov 9, 2024
@saienduri force-pushed the punet-benchmark branch 3 times, most recently from 08c6852 to ea2be7f, November 9, 2024 03:02
@saienduri marked this pull request as ready for review November 11, 2024 20:45
@@ -196,13 +242,19 @@ def job_summary_process(ret_value, output):
def test_sdxl_rocm_benchmark(
Member commented

O_O is this single def test_ function now 350 lines of code, with a few big branches for if rocm_chip ==?

Can this be split into multiple test functions? This looks particularly difficult to run and edit as it gets more complex. We should also start thinking about generalizing to more than SDXL, at which point we'll definitely need the modularity.

For the job_summary.md step, each test case could write some output via pytest (test report properties) for a test report or file aggregation step to summarize after the test run; that's the approach I used in the IREE ONNX tests.

Here's how I think this could be structured:

import pytest  # the gpu_gfx942 markers below assume a custom marker registered in conftest/pytest.ini

def test_sdxl_rocm_unet():
  ...

def test_sdxl_rocm_clip():
  ...

def test_sdxl_rocm_vae():
  ...

def test_sdxl_rocm_e2e():
  ...

@pytest.mark.gpu_gfx942
def test_sdxl_rocm_punet_int8_fp16():
  ...

@pytest.mark.gpu_gfx942
def test_sdxl_rocm_punet_int8_fp8():
  ...

Where each test case runs just that benchmark and then writes some results to the log. Then, a script parses the log (or maybe in pytest_report_to_serializable: https://docs.pytest.org/en/stable/reference/reference.html#pytest.hookspec.pytest_report_to_serializable) and generates the markdown table.
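
One hedged sketch of that aggregation step, using pytest's record_property fixture plus a conftest.py hook; the property name mean_time_ms, the test name, and the job_summary.md path are illustrative assumptions, not the suite's actual fields:

# conftest.py (sketch): collect benchmark numbers recorded by each test and
# emit a markdown table once the run finishes.
def pytest_terminal_summary(terminalreporter, exitstatus, config):
    rows = []
    for report in terminalreporter.stats.get("passed", []):
        # user_properties holds the (name, value) pairs set via record_property.
        props = dict(report.user_properties)
        if "mean_time_ms" in props:  # assumed property name
            rows.append((report.nodeid, props["mean_time_ms"]))
    if rows:
        with open("job_summary.md", "a") as f:
            f.write("| benchmark | mean time (ms) |\n")
            f.write("| --- | --- |\n")
            for nodeid, mean_ms in rows:
                f.write(f"| {nodeid} | {mean_ms} |\n")

# Inside a test, record the raw numbers instead of formatting markdown inline:
# def test_sdxl_rocm_punet_int8_fp16(record_property):
#     ...
#     record_property("mean_time_ms", mean_ms)

This keeps each test focused on running one benchmark, while the markdown table is assembled in a single place after the run.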

@saienduri (Collaborator, Author) commented Nov 11, 2024

Ah, I was thinking the same as I was coding it up, but I assumed this probably wouldn't grow any further, so I didn't think too much of it. It makes sense, though; I can split it up into multiple tests. It will definitely be nicer for the eye and for debugging 🙃. Also, benchmarking will only be this involved for halo models with this setup; down the line I only see us having benchmarks/sdxl, benchmarks/llama, and benchmarks/flux. I don't think this is the best way to build out a general benchmarking test suite; that should live elsewhere. We should probably stick with custom markdown for now, given the specific stats we care about, but we can definitely leverage pytest reporting features for the general benchmark suite.

Member commented

https://en.wikipedia.org/wiki/Boiling_frog 😉

If we have a significant number of developers depending on a piece of infrastructure, it is worth the time to support that infrastructure at a level we are confident in (e.g. move the code out of experimental/, encourage more developers to run it locally as part of their regular development, etc.).

What you have here could be merged as-is IMO, but I'd like to see a plan on a roughly 3-month time scale for "halo model" testing and benchmarking. I want us to be able to scale up from roughly 3 models on 2 devices and 2 backends, to 20 models on 10 devices and 5 backends, then 100 models on 20 devices, etc. The way to add a new model to the test suite can't be "fork this 600-line experimental/benchmark_sdxl_on_mi250_with_rocm.py file and ask the codegen team for a 300-line attention_and_matmul_spec_{model_name}.mlir".

@saienduri (Collaborator, Author) commented

Haha, agreed. Sure, let me write something up. Things become a lot easier to scale when we stick to basic configurations and keep things as simple as possible, which should be the case for both our general model testing and benchmarking suites.

@saienduri (Collaborator, Author) commented

OK, I raised an issue to lay out a plan for what's needed going forward: #19115. Feel free to edit and add to it directly. I think we decided that we will land this PR, but make that plan a priority going forward and move the code out of experimental/.

Member commented

Thanks! Lots of good design details to go through on that issue.

saienduri and others added 5 commits November 13, 2024 14:09
saienduri and others added 8 commits November 13, 2024 14:09
@saienduri merged commit 43b22de into iree-org:main Nov 13, 2024
36 checks passed