[AIEX] Simplify AIEClusterBaseAddress pass #237

Merged 1 commit on Nov 22, 2024


andcarminati (Collaborator)

  • Includes a more generic chaining algorithm.

andcarminati (Collaborator, Author) commented Nov 15, 2024

QoR results:
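
For reading the "Total diff" column in the table that follows: the values look like the relative change in cycle count against the aie-public baseline, rounded to two decimals. The sketch below reproduces a few rows under that reading. The function name `total_diff` and the 0.10% cut-off between `REGR` and `SAME` are assumptions inferred from the rows shown here, not taken from the reporting tool itself.

```python
def total_diff(baseline: int, candidate: int, threshold_pct: float = 0.10) -> str:
    """Format a relative cycle-count change like the table's last column."""
    pct = (candidate - baseline) / baseline * 100.0
    # Assumption: increases at or above +0.10% are tagged REGR, smaller ones SAME.
    # How improvements (negative diffs) are tagged is not visible in this excerpt,
    # so this sketch does not try to model them.
    label = "REGR" if pct >= threshold_pct else "SAME"
    return f"{label}({pct:+.2f}%)"


print(total_diff(315, 371))      # Floor_aie2_0
print(total_diff(19133, 19151))  # LayerNorm_0
print(total_diff(376, 376))      # Abs_bf16_0
```

With these inputs the sketch yields `REGR(+17.78%)`, `SAME(+0.09%)` and `SAME(+0.00%)`, matching the corresponding rows.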

Core Compute Cycle Count:

|--------------------------------------------------------------|------------|---------|---------------|
| Core_Compute_Cycle_Count                                     | aie-public | This PR | Total diff    |
|--------------------------------------------------------------|------------|---------|---------------|
| Floor_aie2_0                                                 |        315 |     371 | REGR(+17.78%) |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_FC_0                                                  |       2650 |    2929 | REGR(+10.53%) |
|--------------------------------------------------------------|------------|---------|---------------|
| Pad3D_AIE2_bfloat16                                          |       9208 |    9348 | REGR(+1.52%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_bf16_0                                                |      23275 |   23472 | REGR(+0.85%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2     |        960 |     967 | REGR(+0.73%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_int8_aie2                 |        966 |     973 | REGR(+0.72%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2                 |        978 |     985 | REGR(+0.72%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_aie2_ptr_interface   |        978 |     985 | REGR(+0.72%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOpsAttributeBroadcasting_aie2_int8                    |       1185 |    1192 | REGR(+0.59%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOpsAttributeBroadcasting_aie2_bf16                    |       1499 |    1507 | REGR(+0.53%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_aie2 |       1455 |    1462 | REGR(+0.48%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_bfloat16_aie2             |       1461 |    1468 | REGR(+0.48%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_aie2             |       1474 |    1481 | REGR(+0.47%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_bf16_1                                                |      38198 |   38367 | REGR(+0.44%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Clip_aie2_int8                                               |        246 |     247 | REGR(+0.41%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| CompareOps_K_EQ_GE_GT_LE_LT_CMP_GT_int32_aie2                |       1098 |    1101 | REGR(+0.27%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_5_aie2_bf16                                   |       8707 |    8730 | REGR(+0.26%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_6_aie2_bf16                                   |       8688 |    8709 | REGR(+0.24%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_4_aie2_bf16                                   |      35414 |   35498 | REGR(+0.24%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_1_aie2_bf16                                   |      35383 |   35466 | REGR(+0.23%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_2_aie2_bf16                                   |      17783 |   17821 | REGR(+0.21%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardSigmoidTemplated_bf16_0                                  |        557 |     558 | REGR(+0.18%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_1_aie2_bf16                                    |      11868 |   11884 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_2_aie2_bf16                                    |      11884 |   11900 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AddAttributeBroadcasting_aie2_bf16                           |        762 |     763 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SubAttributeBroadcasting_aie2_bf16_0                         |        762 |     763 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_1_aie2_bf16                                   |      13024 |   13041 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_4_aie2_bf16                                   |      13030 |   13047 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_2_aie2_bf16                                   |      13060 |   13077 | REGR(+0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_5_aie2_bf16                                   |       7204 |    7213 | REGR(+0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_6_aie2_bf16                                   |       7211 |    7220 | REGR(+0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_3_aie2_bf16                                   |       7225 |    7234 | REGR(+0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_4_aie2_bf16                                    |      11906 |   11920 | REGR(+0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_DW_1                                                  |        853 |     854 | REGR(+0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_6_aie2_bf16                                    |       7030 |    7038 | REGR(+0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_3_aie2_bf16                                    |       7044 |    7052 | REGR(+0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_5_aie2_bf16                                    |       7047 |    7055 | REGR(+0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MulAttributeBroadcasting_aie2_bf16_0                         |        893 |     894 | REGR(+0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LayerNorm_1                                                  |      16195 |   16213 | REGR(+0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_3_aie2_bf16                                   |       8713 |    8722 | REGR(+0.10%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LayerNorm_0                                                  |      19133 |   19151 | SAME(+0.09%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_DW_bf16_0                                             |       1177 |    1178 | SAME(+0.08%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MaxPool2D_1                                                  |       1260 |    1261 | SAME(+0.08%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LayerNormC8Part2_aie2_bf16_0                                 |      11254 |   11262 | SAME(+0.07%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MaxPool2D_0                                                  |       1468 |    1469 | SAME(+0.07%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BitShift_AIE2_int8                                           |       2008 |    2009 | SAME(+0.05%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| InstanceNormPart2_aie2_bf16_0                                |       9528 |    9532 | SAME(+0.04%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_1                                                     |       2452 |    2453 | SAME(+0.04%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_DW_0                                                  |       2941 |    2942 | SAME(+0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_6_aie2_int8                                   |       2954 |    2955 | SAME(+0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_3_aie2_int8                                   |       2958 |    2959 | SAME(+0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_5_aie2_int8                                   |       2975 |    2976 | SAME(+0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pad3D_AIE2_int8                                              |       9595 |    9598 | SAME(+0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_DW_bf16_1                                             |       3894 |    3895 | SAME(+0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DivAttributeBroadcasting_aie2_bf16_0                         |       5372 |    5373 | SAME(+0.02%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_7_aie2_bf16                                   |       6263 |    6264 | SAME(+0.02%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_1_aie2_int8                                   |       7064 |    7065 | SAME(+0.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_4_aie2_int8                                   |       7091 |    7092 | SAME(+0.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_2_aie2_int8                                   |       7124 |    7125 | SAME(+0.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| InterpolateLinear1D_AIE2_bfloat16                            |      14464 |   14466 | SAME(+0.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Abs_bf16_0                                                   |        376 |     376 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Abs_int8_0                                                   |        510 |     510 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add2D_0                                                      |        217 |     217 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add2D_1                                                      |        435 |     435 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AddAttributeBroadcasting_aie2_int8                           |        807 |     807 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AddBf16_aie2_0                                               |        673 |     673 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AddBroadcastingBf16_aie2_0                                   |        728 |     728 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AddBroadcasting_aie2_0                                       |        776 |     776 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add_aie2_0                                                   |        726 |     726 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AvgPool2D_0                                                  |       1068 |    1068 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AvgPool2D_1                                                  |        780 |     780 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AvgPool2D_aie2_bfloat16_0                                    |       3247 |    3247 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AvgPool2D_aie2_bfloat16_1                                    |       2247 |    2247 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AvgPool2D_aie2_int8_0                                        |       1068 |    1068 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| AvgPool2D_aie2_int8_1                                        |        780 |     780 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BilinearInterpolation_0                                      |        667 |     667 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BilinearInterpolation_1                                      |        361 |     361 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BitwiseAnd_int8_0                                            |        467 |     467 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BitwiseNot_aie2_0                                            |        135 |     135 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BitwiseOr_int8_0                                             |        467 |     467 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BitwiseXor_aie2_int8                                         |        710 |     710 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Cast_aie2_bfloat16                                           |        974 |     974 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Cast_aie2_bfloat16_1                                         |        974 |     974 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Cast_aie2_int8                                               |        725 |     725 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Cast_aie2_int8_1                                             |        725 |     725 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Ceil_AIE2_bfloat16                                           |       1412 |    1412 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Ceil_AIE2_int8                                               |        446 |     446 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ChannelsFirstFlatten_bf16_0                                  |      13604 |   13604 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ChannelsFirstFlatten_int8_0                                  |      11932 |   11932 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Clip_aie2_bf16                                               |        227 |     227 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv1D_DW_AIE2_bf16_0                                        |       3358 |    3358 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv1D_DW_AIE2_bf16_1                                        |       3902 |    3902 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv1D_DW_AIE2_int8_0                                        |       1539 |    1539 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv1D_DW_AIE2_int8_1                                        |       1773 |    1773 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_2x8_0                                                 |       1817 |    1817 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_2x8_1                                                 |       3822 |    3822 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_ReLU_int8_0                                           |      10139 |   10139 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_ReLU_int8_1                                           |        927 |     927 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_edge_mode_0                                           |      30301 |   30301 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_edge_mode_1                                           |      18719 |   18719 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG4_aie2_bf16_0                                        |        603 |     603 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG4_aie2_bf16_1                                        |        990 |     990 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG4_aie2_int8_0                                        |        364 |     364 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG4_aie2_int8_1                                        |        558 |     558 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG8_aie2_bf16_0                                        |        747 |     747 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG8_aie2_bf16_1                                        |       1149 |    1149 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG8_aie2_int8_0                                        |        436 |     436 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DegroupG8_aie2_int8_1                                        |        637 |     637 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DivAttributeBroadcasting_aie2_int8_0                         |       7802 |    7802 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DivBroadcasting_aie2_0                                       |       2059 |    2059 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DivBroadcasting_aie2_1                                       |       1450 |    1450 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| EleMax_aie2_bfloat16                                         |        227 |     227 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| EleMax_aie2_int8                                             |        164 |     164 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| EleMin_aie2_bfloat16                                         |        227 |     227 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| EleMin_aie2_int8                                             |        164 |     164 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ElemDiv_aie2_0                                               |       2001 |    2001 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ElemDiv_aie2_1                                               |       1388 |    1388 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Erf_aie2_bf16_0                                              |       2770 |    2770 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Erf_aie2_int8_0                                              |       2554 |    2554 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Erf_aie2_int8_0_ptr_interface                                |       2533 |    2533 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Floor_aie2_1                                                 |        881 |     881 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GELU_0                                                       |       2144 |    2144 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GELU_1                                                       |       2811 |    2811 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GeluTemplated_aie2_bf16                                      |       1388 |    1388 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GeluTemplated_aie2_int8                                      |       1214 |    1214 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG4_aie2_bf16_0                                          |        495 |     495 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG4_aie2_int8_0                                          |        312 |     312 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG8_aie2_bf16_0                                          |       1026 |    1026 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG8_aie2_int8_0                                          |        555 |     555 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardSigmoidTemplated_int8_0                                  |        284 |     284 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardSigmoid_bf16_0                                           |        937 |     937 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardSigmoid_bf16_1                                           |        649 |     649 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardSigmoid_int8_0                                           |        417 |     417 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardSigmoid_int8_1                                           |        427 |     427 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardswishAsHardsigmoid_aie2_0                                |       1368 |    1368 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| HardswishAsHardsigmoid_aie2_1                                |       1527 |    1527 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Hardswish_aie2_0                                             |       1368 |    1368 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Hardswish_aie2_1                                             |       1522 |    1522 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| InstanceNormPart1_aie2_bf16_0                                |       2916 |    2916 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| InstanceNormPart1_aie2_int8_0                                |      11387 |   11387 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| InterpolateLinear1D_AIE2_int8                                |      11967 |   11967 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LayerNormC8Part1_aie2_bf16_0                                 |       8962 |    8962 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LayerNormC8Part1_aie2_int8_0                                 |       7830 |    7830 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LayerNormC8Part2_aie2_int8_0                                 |      11222 |   11222 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Log_bf16_0                                                   |       4149 |    4149 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Log_int8_0                                                   |       1329 |    1329 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LogicalNot_aie2_0                                            |        225 |     225 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| LogicalXor_aie2_int8                                         |        528 |     528 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MaxPool2D_bf16_0                                             |       1797 |    1797 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MaxPool2D_bf16_1                                             |       1269 |    1269 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mod_aie2_bf16                                                |       5246 |    5246 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mul2d_bf16_0                                                 |        519 |     519 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mul2d_bf16_1                                                 |        327 |     327 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MulAttributeBroadcasting_aie2_int8_0                         |        517 |     517 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MulBf16_aie2_0                                               |        697 |     697 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MulBroadcastingBf16_aie2_0                                   |        752 |     752 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| MulBroadcasting_aie2_0                                       |        294 |     294 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mul_aie2_0                                                   |        231 |     231 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Neg_aie2_0                                                   |        779 |     779 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Neg_aie2_1                                                   |        455 |     455 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pad2D_0                                                      |        568 |     568 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pad2D_1                                                      |       1684 |    1684 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pad2D_bf16_0                                                 |       2394 |    2394 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| PixelShuffle_aie2_bf16                                       |       8566 |    8566 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| PixelShuffle_aie2_int8                                       |       7280 |    7280 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| PixelUnshuffle_bf16_0                                        |      17143 |   17143 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| PixelUnshuffle_int8_0                                        |      14571 |   14571 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Range_int8_aie2_0                                            |       1224 |    1224 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Range_int8_aie2_1                                            |       1846 |    1846 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Reciprocal_aie2_0                                            |       1231 |    1231 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Reciprocal_aie2_1                                            |       2155 |    2155 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMeanAxis_7_aie2_int8                                   |       2255 |    2255 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMin1D_aie2_bf16                                        |        188 |     188 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMin1D_aie2_int8                                        |        164 |     164 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_1_aie2_int8                                    |       6903 |    6903 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_2_aie2_int8                                    |       6943 |    6943 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_3_aie2_int8                                    |       2921 |    2921 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_4_aie2_int8                                    |       6966 |    6966 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_5_aie2_int8                                    |       2924 |    2924 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_6_aie2_int8                                    |       2897 |    2897 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_7_aie2_bf16                                    |       6212 |    6212 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Requantize_0                                                 |       1421 |    1421 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Requantize_1                                                 |        781 |     781 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Rescale_aie2_int8_0                                          |        233 |     233 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Round_aie2_0                                                 |        367 |     367 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Round_aie2_1                                                 |       1092 |    1092 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Rsqrt_aie2_bf16_0                                            |       3602 |    3602 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Rsqrt_aie2_int8_0                                            |       2376 |    2376 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Select_aie2_bf16                                             |        299 |     299 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Select_aie2_int8                                             |        206 |     206 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Shrink_aie2_0                                                |        658 |     658 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Shrink_aie2_1                                                |        759 |     759 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SiLU_aie2_bf16                                               |       2908 |    2908 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SiLU_aie2_int8                                               |       2969 |    2969 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SiLU_aie2_int8_1                                             |       2967 |    2967 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SigmoidTemplated_bf16_0                                      |       1633 |    1633 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SigmoidTemplated_int8_0                                      |       1276 |    1276 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SigmoidTemplated_int8_1                                      |       1276 |    1276 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sigmoid_bf16_0                                               |       2627 |    2627 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sigmoid_bf16_1                                               |       1727 |    1727 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sigmoid_int8_0                                               |         91 |      91 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sigmoid_int8_1                                               |        110 |     110 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sign_bf16_0                                                  |       1078 |    1078 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sign_bf16_1                                                  |        210 |     210 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sign_int8_0                                                  |        416 |     416 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sign_int8_1                                                  |        122 |     122 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sin_aie2_bf16                                                |       3014 |    3014 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sin_aie2_int8                                                |        842 |     842 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sqrt_bf16_0                                                  |      29777 |   29777 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sqrt_bf16_1                                                  |       3793 |    3793 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sqrt_int8_0                                                  |      19162 |   19162 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sqrt_int8_1                                                  |      19162 |   19162 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Squeeze_bfloat16_0                                           |        207 |     207 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Squeeze_int8_0                                               |        207 |     207 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SubAttributeBroadcasting_aie2_int8_0                         |        807 |     807 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SubBroadcasting_aie2_bf16_0                                  |        706 |     706 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SubBroadcasting_aie2_int8_0                                  |        754 |     754 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| SubBroadcasting_aie2_int8_0_ptr_interface                    |        754 |     754 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sub_aie2_bf16_0                                              |        651 |     651 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sub_aie2_int8_0                                              |        704 |     704 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Sub_aie2_int8_0_ptr_interface                                |        704 |     704 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| TanhTemplated_aie2_bfloat16                                  |       1049 |    1049 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| TanhTemplated_aie2_int8                                      |        300 |     300 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Tanh_0                                                       |       1970 |    1970 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Tanh_1                                                       |       2578 |    2578 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Tanh_int8_0                                                  |        339 |     339 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Tanh_int8_1                                                  |        407 |     407 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ThresholdedRelu_aie2_bfloat16                                |        514 |     514 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ThresholdedRelu_aie2_int8                                    |        865 |     865 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk1D_bf16_0                                                |       1217 |    1217 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk1D_bf16_1                                                |        169 |     169 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk1D_int8_0                                                |        766 |     766 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk1D_int8_1                                                |        118 |     118 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk2D_bf16_0                                                |      34469 |   34469 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk2D_bf16_1                                                |        303 |     303 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk2D_int8_0                                                |      28803 |   28803 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Topk2D_int8_1                                                |        248 |     248 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_021                                      |       1856 |    1856 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_021_pad                                  |       2338 |    2338 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_102                                      |       1155 |    1155 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_102_pad                                  |       1140 |    1140 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_120                                      |       1856 |    1856 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_120_pad                                  |       1752 |    1752 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_201                                      |       1871 |    1871 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_201_pad                                  |       1767 |    1767 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_210                                      |       1868 |    1868 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_bf16_210_pad                                  |       1868 |    1868 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_021                                      |       2685 |    2685 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_021_pad                                  |       3612 |    3612 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_102                                      |       1149 |    1149 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_102_pad                                  |       1089 |    1089 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_120                                      |       2686 |    2686 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_120_pad                                  |       2686 |    2686 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_201                                      |       2700 |    2700 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_201_pad                                  |       2544 |    2544 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_210                                      |       2694 |    2694 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Transpose_aie2_int8_210_pad                                  |       2538 |    2538 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| bfloat16                                                     |       1217 |    1217 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| int8                                                         |        847 |     847 | SAME(+0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_Transpose_AIE2_0                                      |      53845 |   53844 | SAME(-0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pow_bf16_1                                                   |      34196 |   34195 | SAME(-0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pow_bf16_0                                                   |      34190 |   34189 | SAME(-0.00%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_Transpose_AIE2_1                                      |      14441 |   14440 | SAME(-0.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GEMM_int8_1                                                  |      32931 |   32928 | SAME(-0.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_Transpose_bf16_AIE2_1                                 |       6292 |    6291 | SAME(-0.02%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Range_bfloat16_aie2_0                                        |       4065 |    4064 | SAME(-0.02%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_7x7s2_Layer1_0                                        |       5885 |    5883 | SAME(-0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| InstanceNormPart2_aie2_int8_0                                |      11508 |   11504 | SAME(-0.03%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_mixed_batch_0                                         |      11094 |   11090 | SAME(-0.04%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Range_bfloat16_aie2_1                                        |       2669 |    2668 | SAME(-0.04%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSumAxis_7_aie2_int8                                    |       2215 |    2214 | SAME(-0.05%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_11x11s4_Layer1_0                                      |       4274 |    4272 | SAME(-0.05%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_11x11s4_1                                             |       5418 |    5415 | SAME(-0.06%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceProdAxis_7_aie2_bf16                                   |       1795 |    1794 | SAME(-0.06%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_11x11s4_Layer1_1                                      |       2979 |    2977 | SAME(-0.07%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_ReLU_Standalone_1                                     |       2533 |    2531 | SAME(-0.08%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_FC_1                                                  |       1144 |    1143 | SAME(-0.09%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSum_bf16_0                                             |      12199 |   12187 | SAME(-0.10%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSum_bf16_1                                             |      12199 |   12187 | SAME(-0.10%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_11x11s4_0                                             |       5785 |    5779 | IMPR(-0.10%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSum_int8_1                                             |      11387 |   11375 | IMPR(-0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GEMM_int8_0                                                  |       2797 |    2794 | IMPR(-0.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceSum_int8_0                                             |      19670 |   19646 | IMPR(-0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_7x7s2_Layer1_1                                        |       1613 |    1611 | IMPR(-0.12%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMax_bf16_1                                             |       9421 |    9409 | IMPR(-0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMin_bf16_1                                             |      18445 |   18421 | IMPR(-0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_ReLU_1                                                |      27510 |   27473 | IMPR(-0.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_LReLU_1                                               |       5263 |    5255 | IMPR(-0.15%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_ReLU_0                                                |       1275 |    1273 | IMPR(-0.16%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_ReLU_Standalone_0                                     |       1275 |    1273 | IMPR(-0.16%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMax_bf16_0                                             |       7193 |    7181 | IMPR(-0.17%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMin_bf16_0                                             |       7193 |    7181 | IMPR(-0.17%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mul2D_0                                                      |        533 |     532 | IMPR(-0.19%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mul2D_1                                                      |        533 |     532 | IMPR(-0.19%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mish_aie2_int8                                               |       9516 |    9494 | IMPR(-0.23%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_SV60                                                  |        857 |     855 | IMPR(-0.23%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMax_int8_1                                             |      19315 |   19267 | IMPR(-0.25%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMin_int8_1                                             |      19069 |   19021 | IMPR(-0.25%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Slice_int8_0                                                 |       1545 |    1541 | IMPR(-0.26%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| PowAttributeBroadcasting_aie2_bf16_0                         |      40590 |   40462 | IMPR(-0.32%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMax_int8_0                                             |      14509 |   14461 | IMPR(-0.33%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_mixed_batch_1                                         |      21518 |   21444 | IMPR(-0.34%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_LReLU_0                                               |       2175 |    2167 | IMPR(-0.37%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Slice_bfloat16_0                                             |        945 |     941 | IMPR(-0.42%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Conv2D_0                                                     |       7694 |    7657 | IMPR(-0.48%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| ReduceMin_int8_0                                             |       8797 |    8749 | IMPR(-0.55%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| FullyConnect_aie2_bf16                                       |       1090 |    1083 | IMPR(-0.64%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BatchNorm2D_1                                                |        416 |     413 | IMPR(-0.72%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BatchNorm1d_aie2_bfloat16                                    |        390 |     387 | IMPR(-0.77%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| DilatedConv2D_1                                              |       5390 |    5347 | IMPR(-0.80%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add2D_Standalone_1                                           |        482 |     478 | IMPR(-0.83%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BatchNorm1d_aie2_int8                                        |        408 |     404 | IMPR(-0.98%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Scale_Add_bf16_0                                             |       1709 |    1690 | IMPR(-1.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Scale_Add_bf16_1                                             |       1709 |    1690 | IMPR(-1.11%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add2D_Standalone_0                                           |        322 |     318 | IMPR(-1.24%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| BatchNorm2D_0                                                |        308 |     304 | IMPR(-1.30%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| FullyConnect_aie2_int8                                       |        829 |     817 | IMPR(-1.45%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GEMV_0                                                       |        469 |     461 | IMPR(-1.71%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG8_aie2_int8_1                                          |        907 |     891 | IMPR(-1.76%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG8_aie2_bf16_1                                          |       1691 |    1659 | IMPR(-1.89%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Elu_aie2_int8_0                                              |        589 |     577 | IMPR(-2.04%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GEMV_1                                                       |        387 |     379 | IMPR(-2.07%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GEMM_bf16_0                                                  |       3622 |    3545 | IMPR(-2.13%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| PowAttributeBroadcasting_aie2_int8_0                         |       4309 |    4210 | IMPR(-2.30%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Pow_int8_0                                                   |       4309 |    4210 | IMPR(-2.30%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Expand_aie2_bfloat16                                         |       1944 |    1881 | IMPR(-3.24%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GEMM_bf16_1                                                  |       7669 |    7408 | IMPR(-3.40%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG4_aie2_int8_1                                          |        860 |     828 | IMPR(-3.72%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Elu_aie2_bf16_0                                              |       2709 |    2603 | IMPR(-3.91%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| GroupG4_aie2_bf16_1                                          |       1596 |    1532 | IMPR(-4.01%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Mish_aie2_bfloat16                                           |       5475 |    5224 | IMPR(-4.58%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Softmax_bf16_1                                               |       1583 |    1510 | IMPR(-4.61%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Exp_bf16_1                                                   |       1227 |    1156 | IMPR(-5.79%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Scale_Add_0                                                  |        374 |     351 | IMPR(-6.15%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Scale_Add_1                                                  |        374 |     351 | IMPR(-6.15%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add2D_bf16_1                                                 |        298 |     274 | IMPR(-8.05%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Tile_aie2_bf16_0                                             |       4248 |    3897 | IMPR(-8.26%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Softmax_bf16_0                                               |       6350 |    5784 | IMPR(-8.91%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Exp_bf16_0                                                   |       6047 |    5480 | IMPR(-9.38%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Add2D_bf16_0                                                 |        254 |     230 | IMPR(-9.45%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Softmax_1                                                    |        425 |     384 | IMPR(-9.65%)  |
|--------------------------------------------------------------|------------|---------|---------------|
| Expand_aie2_int8                                             |       1891 |    1570 | IMPR(-16.98%) |
|--------------------------------------------------------------|------------|---------|---------------|
| Tile_aie2_int8_1                                             |       2579 |    1906 | IMPR(-26.10%) |
|--------------------------------------------------------------|------------|---------|---------------|
| Average diff                                                 |            | -0.39%  | -0.39%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Diff stdev                                                   |            |    2.48 |          2.48 |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #1                                                  |            | -0.79%  | -0.79%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #2                                                  |            | -0.07%  | -0.07%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #3                                                  |            | +0.00%  | +0.00%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #4                                                  |            | +0.00%  | +0.00%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #5                                                  |            | +0.00%  | +0.00%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #6                                                  |            | +0.00%  | +0.00%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #7                                                  |            | +0.00%  | +0.00%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #8                                                  |            | +0.00%  | +0.00%        |
|--------------------------------------------------------------|------------|---------|---------------|
| Quantile #9                                                  |            | +0.12%  | +0.12%        |
|--------------------------------------------------------------|------------|---------|---------------|
  • For Conv2D_FC_0, ACQ was moved (in _main) to the delay slot of the function call to conv2d_wrapper. As a result, the wait cycles are no longer attributed to _main, but to conv2d_wrapper. Apart from this effect, we improve the cycle count for this benchmark as well.

  • For Floor_aie2_0, we increase final II (pre-swp), but we should disable unrolling and let post-swp do the job.

  • GEMM_bf16_0/1 is performing in 17 cycles (pre-swp for now).

PM Size effect: -0.09% (basically unaffected).
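
For reference, the "Total diff" column in the table above appears to be the relative change in cycle count with respect to the aie-public baseline. The sketch below is only an illustration of that arithmetic; the names `percentDiff` and `roundTo2` are hypothetical helpers and are not part of the benchmarking scripts.

```cpp
#include <cassert>
#include <cmath>

// Relative cycle-count change in percent.
// Negative values mean fewer cycles (IMPR), positive values mean more (REGR).
double percentDiff(double baseline, double candidate) {
  assert(baseline > 0 && "baseline cycle count must be positive");
  return (candidate - baseline) / baseline * 100.0;
}

// Round to two decimals, the precision used in the table.
double roundTo2(double value) { return std::round(value * 100.0) / 100.0; }
```

For Tile_aie2_int8_1, `roundTo2(percentDiff(2579, 1906))` gives -26.10, and for Expand_aie2_int8, `roundTo2(percentDiff(1891, 1570))` gives -16.98, which matches the reported rows.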

for (auto RegSet : RegPtrUseMap) {

SmallVector<MachineInstr *, 8> &Instrs = RegSet.second;
// Chaining aceptance criteria.
Collaborator

nit acceptance

return true;

// If the base reg is used in any of the successive MBBs, then we don't
// want to chain the corresponding ptr adds. Since that would introduce a
Collaborator

nit: ", since" ("since" is a conjunction joining two clauses, so it should follow a comma rather than start a new sentence)

MINext->getOperand(2).getReg(), *MRI);

// Evaluate if we should restart the chain from the base
// pointer. This is necessary whenwe deal with unknonw offsets
Collaborator Author

@andcarminati andcarminati Nov 18, 2024

nit: whenwe, unknonw

@andcarminati andcarminati force-pushed the andreu.cluster.baseaddress.change branch from 8a8b895 to ef3b592 Compare November 19, 2024 09:43
@@ -71,6 +71,10 @@ static cl::opt<unsigned> StackAddrSpace(
cl::desc("Specify the addrspace where the stack is allocated "
"(5: Bank A, 6: Bank B, 7: Bank C, 8: Bank D)"));

static cl::opt<bool> EnableAddressChaining("aie-address-chaining", cl::Hidden,
cl::init(true),
cl::desc("Enable ptradd chaining."));
Collaborator

🥳

@gbossu
Collaborator

gbossu commented Nov 20, 2024

Regarding the Conv2D_FC_0 regression, @mludevid and @katerynamuts are looking into refining our modelling of semaphores. That might get solved. I think we do not keep enough distance between semaphores and the end of regions.

bool processBasicBlock(MachineBasicBlock &MBB, MachineRegisterInfo &MRI,
MachineIRBuilder &MIB,
MachineRegisterInfo *MRI = nullptr;
MachineDominatorTree *DT = nullptr;
Collaborator

Nit: Could those be const?

// Get all candidates, i.e. groups of G_PTR_ADDs in the same
// basic block that shares the same input pointer.
void getChainingCandidates(MachineBasicBlock &MBB,
RegToPtrAddMap &RegPtrUseMap) {
Collaborator

How about changing this to RegToPtrAddMap getChainingCandidates(MachineBasicBlock &MBB)? This makes it clearer what comes out of this function.
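
A minimal sketch of the return-by-value shape suggested here, written against toy types so that it compiles without LLVM. The `PtrAdd` struct and the use of `unsigned` for registers are assumptions made for illustration; the real pass works on `MachineInstr` and `Register`.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for a G_PTR_ADD: Dst = Base + Offset.
struct PtrAdd {
  unsigned Dst;
  unsigned Base;
  long Offset;
};

// Toy counterpart of RegToPtrAddMap: base register -> pointer adds using it.
using RegToPtrAddMap = std::map<unsigned, std::vector<const PtrAdd *>>;

// Group the pointer adds of one block by their base register and return
// the map by value, so the signature shows what the function produces.
RegToPtrAddMap getChainingCandidates(const std::vector<PtrAdd> &Block) {
  RegToPtrAddMap Uses;
  for (const PtrAdd &PA : Block)
    Uses[PA.Base].push_back(&PA); // pointers stay valid while Block lives
  return Uses;
}
```

Returning the map by value costs nothing extra in practice, since the local is moved or elided on return, and the call site reads as `auto Uses = getChainingCandidates(Block);`.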

@@ -136,242 +117,152 @@ class AIEClusterBaseAddress : public MachineFunctionPass {

StringRef getPassName() const override { return AIE_CLUSTER_BASE_ADDRESS; }

using RegToPtrAddMap = std::map<Register, SmallVector<MachineInstr *, 8>>;
Collaborator

Nit: Just RegUseMap?

bool Changed = false;

// Get all G_PTR_ADDs that use the same pointer.
getChainingCandidates(MBB, RegPtrUseMap);
Collaborator

Super nit: I would rename this to e.g. collectPtrUses(), because we just collect registers used in ptr_adds; there is no extra processing.

// case, we do not want to chain the addresses, because this would
// introduce a COPY that increases the pressure on PTR registers.
// Create chains, when profitable.
for (auto RegSet : RegPtrUseMap) {
Collaborator

auto RegAndUses?

}

// Find if a register is used in reachable MBBs.
bool isRegUsedInReachableMBBs(MachineBasicBlock *MBB, Register Reg) {
Collaborator

I would prefer to keep the previous code in that specific case: std::set<MachineBasicBlock *> ReachableMBBs = findReachableMBBs(MBB);. Reachability is different from dominance.
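
To illustrate why reachability and dominance are not interchangeable, here is a small sketch on a toy CFG. It is not the code from the pass: the `CFG` alias and the `reachableFrom` and `dominates` helpers are hypothetical, and blocks are plain integer indices.

```cpp
#include <queue>
#include <set>
#include <vector>

// Toy CFG: G[b] lists the successors of block b.
using CFG = std::vector<std::vector<int>>;

// Blocks reachable from Start via successor edges (Start included).
// Block Removed is treated as deleted; pass -1 to keep all blocks.
std::set<int> reachableFrom(const CFG &G, int Start, int Removed = -1) {
  std::set<int> Seen;
  if (Start == Removed)
    return Seen;
  std::queue<int> Work;
  Work.push(Start);
  Seen.insert(Start);
  while (!Work.empty()) {
    int B = Work.front();
    Work.pop();
    for (int S : G[B])
      if (S != Removed && Seen.insert(S).second)
        Work.push(S);
  }
  return Seen;
}

// A dominates B iff every path from Entry to B goes through A, i.e. B is
// no longer reachable from Entry once A is removed.
bool dominates(const CFG &G, int Entry, int A, int B) {
  if (A == B)
    return true;
  return reachableFrom(G, Entry, A).count(B) == 0;
}
```

On the diamond `CFG G = {{1, 2}, {3}, {3}, {}}`, block 3 is reachable from block 1, yet block 1 does not dominate block 3 because the path through block 2 bypasses it. A register use in block 3 would therefore be missed by a dominance-based query starting at block 1.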

auto Entry = ChainedPtrAdds.find(&MI);
if (Entry == ChainedPtrAdds.end())
// Build a chain (or set of chains) of G_PTR_ADDs. We consider as
// chain a linear sequence of linked G_PTR_ADDs, tied no output and
Collaborator

Nit: tied to?
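
As a rough illustration of what chaining means, the sketch below rewrites pointer adds that share one base into a linear sequence in which each add uses the result of the previous one. It assumes constant offsets and a single base register, and it ignores the profitability and unknown-offset handling discussed in this PR; `PtrAdd` and `chainPtrAdds` are hypothetical names, not the pass's implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a G_PTR_ADD: Dst = Base + Offset.
struct PtrAdd {
  int Dst;
  int Base;
  long Offset;
};

// Rewrite adds that all use the same base (given in program order) into a
// chain: the first add is kept, each later add takes the previous result
// as its base and the difference of the original offsets as its offset.
std::vector<PtrAdd> chainPtrAdds(const std::vector<PtrAdd> &Adds) {
  std::vector<PtrAdd> Chained;
  Chained.reserve(Adds.size());
  for (std::size_t I = 0; I < Adds.size(); ++I) {
    if (I == 0) {
      Chained.push_back(Adds[I]);
      continue;
    }
    int PrevDst = Adds[I - 1].Dst;
    long Delta = Adds[I].Offset - Adds[I - 1].Offset;
    Chained.push_back({Adds[I].Dst, PrevDst, Delta});
  }
  return Chained;
}
```

With the input `{10 = r1 + 4, 11 = r1 + 8, 12 = r1 + 20}`, the result is `{10 = r1 + 4, 11 = 10 + 4, 12 = 11 + 12}`: every destination still holds the same address, but only the first add reads the original base register.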

@andcarminati andcarminati force-pushed the andreu.cluster.baseaddress.change branch from ef3b592 to 8f2e71d Compare November 22, 2024 09:12
@andcarminati
Collaborator Author

Hi @gbossu, all your comments were addressed. Thank you very much!

* Including a more generic chaining algorithm.
@andcarminati andcarminati force-pushed the andreu.cluster.baseaddress.change branch from 8f2e71d to d4115b7 Compare November 22, 2024 09:16
Collaborator

@gbossu gbossu left a comment

LGTM, nice simplification!
I'm not concerned about the Floor_aie2_0 regression, we'll recover that by using the post-pipeliner.

@andcarminati andcarminati merged commit 5796f62 into aie-public Nov 22, 2024
8 checks passed
@andcarminati andcarminati deleted the andreu.cluster.baseaddress.change branch November 22, 2024 13:49