Add e2e test suite for the Attention - CPU Backend #17751
Conversation
(force-pushed from 8f7cfff to c5a78ec)
@ScottTodd, I need your advice on the pre-commit and DCO failures below.
For pre-commit, see https://iree.dev/developers/general/contributing/#coding-style-guidelines. The logs are telling you that a generated file needs to be updated by re-running the generator command shown in the log output.

For DCO, see https://iree.dev/developers/general/contributing/#developer-certificate-of-origin. The logs for the action also include steps you can take to resolve it.
(force-pushed from 3416fe8 to 1e65ec5)
Looks good from the attention side of things! Very excited to see this. I'll let Scott drive the rest of the review on the infra side of things.
Some comments. Looking at the cc implementation now.
Please add more information to the pull request description, including a link to the tracking issue (#17892). See also https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue if you want this PR to close the issue.
Signed-off-by: ERMAN GURSES <[email protected]>
(force-pushed from 9c814d9 to d2a677c)
LGTM from the attention side. Please also wait for Scott's approval.
I think @ScottTodd has already approved the PR.
This had some failures overnight:
This reverts commit 2d629c6.
Reverts #17751. A few of the new tests are failing on various platforms:

* Timeouts (after 60 seconds) in `iree/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_large_llvm-cpu_local-task` on GitHub-hosted Windows and macOS runners:
  * https://github.com/iree-org/iree/actions/runs/10468974350/job/28990992473#step:8:2477
  * https://github.com/iree-org/iree/actions/runs/10468947894/job/28990909629#step:9:3076

```
1529/1568 Test #969: iree/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_large_llvm-cpu_local-task .............................***Timeout 60.07 sec
--- TEST[attention_2_2048_256_512_128_dtype_f16_f16_f16_f16_2_2048_256_512_128_256_1.0_0] ---
Attention shape (BATCHxMxK1xK2xN): 2x2048x256x512x256x128
```

* Compilation error on arm64: https://github.com/iree-org/iree/actions/runs/10468944505/job/28990909321#step:4:9815:

```
[415/1150] Generating /work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.vmfb from e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.mlir
FAILED: tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.vmfb /work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.vmfb
cd /work/build-arm64/tests/e2e/attention && /work/build-arm64/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu /work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.mlir -o /work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.vmfb --iree-hal-executable-object-search-path=\"/work/build-arm64\" --iree-llvmcpu-embedded-linker-path=\"/work/build-arm64/llvm-project/bin/lld\" --iree-llvmcpu-wasm-linker-path=\"/work/build-arm64/llvm-project/bin/lld\"
/work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.mlir:4:14: error: Yield operand #2 is not equivalent to the corresponding iter bbArg
  %result1 = iree_linalg_ext.attention {
             ^
/work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.mlir:1:1: note: called from
func.func @attention_2_1024_128_256_64_dtype_f16_f16_f16_f16(%query: tensor<2x1024x128xf16>, %key: tensor<2x256x128xf16>, %value: tensor<2x256x64xf16>, %scale: f32) -> tensor<2x1024x64xf16> {
^
/work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.mlir:4:14: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu = "generic", cpu_features = "+reserve-x18", data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32", native_vector_size = 16 : i64, target_triple = "aarch64-unknown-unknown-eabi-elf"}>
  %result1 = iree_linalg_ext.attention {
             ^
/work/build-arm64/tests/e2e/attention/e2e_attention_cpu_f16_f16_f16_medium_llvm-cpu_local-task_attention.mlir:1:1: note: called from
func.func @attention_2_1024_128_256_64_dtype_f16_f16_f16_f16(%query: tensor<2x1024x128xf16>, %key: tensor<2x256x128xf16>, %value: tensor<2x256x64xf16>, %scale: f32) -> tensor<2x1024x64xf16> {
^
failed to translate executables
```
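For context on what these tests compute: the failing medium-size case above takes a 2x1024x128 query, a 2x256x128 key, and a 2x256x64 value and produces a 2x1024x64 result. Below is a minimal NumPy sketch of an f32 scaled-dot-product attention reference for those shapes. It assumes the standard softmax(Q·Kᵀ·scale)·V formulation and is only an illustration, not IREE's actual reference implementation or test generator.

```
# Minimal sketch (assumption: standard scaled-dot-product attention), not
# IREE's reference implementation. Shapes follow the failing test above:
#   query: 2x1024x128, key: 2x256x128, value: 2x256x64 -> result: 2x1024x64
import numpy as np

def reference_attention_f32(query, key, value, scale):
    """Compute softmax(query @ key^T * scale) @ value in float32."""
    q = query.astype(np.float32)
    k = key.astype(np.float32)
    v = value.astype(np.float32)
    scores = np.einsum("bmk,bnk->bmn", q, k) * scale    # 2x1024x256 logits
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)          # softmax over K2
    return np.einsum("bmn,bnd->bmd", probs, v)          # 2x1024x64 result

rng = np.random.default_rng(0)
query = rng.standard_normal((2, 1024, 128)).astype(np.float16)
key = rng.standard_normal((2, 256, 128)).astype(np.float16)
value = rng.standard_normal((2, 256, 64)).astype(np.float16)
print(reference_attention_f32(query, key, value, scale=1.0).shape)  # (2, 1024, 64)
```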
Add the e2e test suite for the attention op. It only checks CPU FP16 for now, and the reference implementation is FP32. (#17751, #18302)

Signed-off-by: erman-gurses <[email protected]>
Add the e2e test suite for the attention op. For now, it only checks CPU FP16, and the reference implementation is FP32.
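As a rough illustration of what "checks CPU FP16 against an FP32 reference" means numerically, here is a hedged sketch comparing an f16 attention computation to an f32 reference within a tolerance. The helper function, the small shapes, and the tolerance values are illustrative assumptions, not the actual IREE test harness or its thresholds.

```
# Illustrative sketch only: compare an f16 attention result against an f32
# reference. The tolerances below are made up for the example; they are not
# the thresholds used by the IREE e2e test suite.
import numpy as np

def attention(query, key, value, scale, dtype):
    q, k, v = (x.astype(dtype) for x in (query, key, value))
    scores = np.einsum("bmk,bnk->bmn", q, k) * dtype(scale)
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return np.einsum("bmn,bnd->bmd", probs, v)

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 128, 64)).astype(np.float16)   # small shapes for speed
k = rng.standard_normal((2, 64, 64)).astype(np.float16)
v = rng.standard_normal((2, 64, 32)).astype(np.float16)

result_f16 = attention(q, k, v, 1.0, np.float16)      # stands in for the CPU backend result
reference_f32 = attention(q, k, v, 1.0, np.float32)   # the f32 reference
# f16 arithmetic loses precision, so the comparison needs a loose tolerance.
print(np.allclose(result_f16.astype(np.float32), reference_f32, rtol=1e-2, atol=1e-2))
```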