Release rocm-7.0.1 · ROCm/rocMLIR

What's Changed

Add E2E test for the OCP Fp8 fused kernel with QuantizeLinear and DeQuantizeLinear by @umangyadav in #1747
[TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1745
Remove scheduling barrier hack for LDS barrier lowering by @dhernandez0 in #1749
Fixes for group conv emit-key by @dhernandez0 in #1748
Fix performance for non-standard layouts by @dhernandez0 in #1741
[6.4] Fix bug when both A and B are broadcasted (FoldBroadcast pass) by @dhernandez0 in #1744
[TOSA] Fix accType for the Quant Convolutions as well by @umangyadav in #1752
[6.4] Update gfx12 target in AmdArchDB by @TedThemistokleous in #1746
Add Fp8 to quick-tuning by @djramic in #1753
Add bf16 to tuning runner by @djramic in #1739
Enable output swizzle for multiple outputs by @dhernandez0 in #1750
Use AddDim for unit input dimensions to help getMaxVectorization() by @dhernandez0 in #1755
[DO NOT SQUASH] Enable atomic add bf16 reduction and split-k for Navi4x by @dhernandez0 in #1732
Enable bf16 atomic add for gfx950 by @dhernandez0 in #1734
Add test from SWDEV-518130 by @dhernandez0 in #1757
[6.4] Fix compilation with HIP SDK 6.3 for Windows by @apwojcik in #1742
Add lookup for more layouts in PerfRunner and Add an option for verifying each perfConfig with tuningRunner by @umangyadav in #1758
rocMLIR tuning driver datatype fix by @dorde-antic in #1761
[CI] Added gfx942 architecture to the 'Tune MLIR kernels' stage by @stefankoncarevic in #1733
Fix dependency graph creation in RockPipeline and not generate loops with negative iterations by @umangyadav in #1760
Fix GlobalLoad 4b lowering by @dhernandez0 in #1764
Improve performance of quantizelinear for int4 by @dhernandez0 in #1706
Add fp8 convolution to the tuning runner by @djramic in #1738
Introduce perfConfig V3 with param to select different schedule by @umangyadav in #1767
Support for causal attention and more strict checks for KV-Cache by @dhernandez0 in #1770
Fix generateMlirDriverCommandLine for attention in perfRunner by @dhernandez0 in #1773
Remove hasValidChip() from ConvGenerator by @dorde-antic in #1771
Use MLIR based kernels for verification in MIGraphX stage by @umangyadav in #1766
[DO NOT SQUASH] March LLVM upstream merge by @dhernandez0 in #1763
Add requirements.txt file and modify Dockerfile by @dorde-antic in #1776
Add checks for uid and devices by @causten in #1777
Fix Dockerfile URL for requirements.txt by @stefankoncarevic in #1778
Adjust Dockerfile for Separate hip-python Installation by @stefankoncarevic in #1781
Skip unsupported datatypes in perfRunner by @djramic in #1780
Fix initialization for split-k by @dhernandez0 in #1784
Use hip-python API instead of rocm_agent_enumerator by @dorde-antic in #1762
Recover split-k fusion tests removed in last upstream merge by @dhernandez0 in #1785
Add hip-python to requirements.txt and update LLVM version by @dhernandez0 in #1787
Fix split-k fusion when there are two or more consecutive linalg.generic ops by @dhernandez0 in #1782
Remove Machine Names Due to Security Team Advisory by @stefankoncarevic in #1788
Remove fp8 check on nightly CI by @stefankoncarevic in #1789
[DO NOT SQUASH] upstream merge for sprint 48 by @dhernandez0 in #1786
Move requirements.txt -> pip_requirements.txt due to issues with cget by @dhernandez0 in #1792
Python script for testing metrics and plotting correlations by @dorde-antic in #1769
Fix attention bugs (swap thread and iter when Q LDS is bypassed and bf16 tests) by @dhernandez0 in #1797
Sort Dimensions based on Layout in case of input fusion by @umangyadav in #1793
Fix kernel generation when kernelRepeats are more than 1 by @umangyadav in #1799
Work around issue 1802 by @dhernandez0 in #1800
Add Gemm+Elementwise+Gemm support by @dhernandez0 in #1774
Add dependencies for rocprofv3 by @djramic in #1801
Remove perfTest from Jenkins by @dhernandez0 in #1803
Add Tier1 model configs to rocMLIR by @dorde-antic in #1794
GEMM+GEMM migraphx integration by @dhernandez0 in #1791
Fix for issue 1802 workaround by @dhernandez0 in #1806
Update MI300 quick-tuning list by @mirza-halilcevic in #1765
gemm+gemm: extend allowed types by @dhernandez0 in #1795
Bump Dockerfiles to rocm-6.4 by @dorde-antic in #1808
Disable code coverage on nightly and weekly CI, and expand it to run on WMMA by @mirza-halilcevic in #1813
Fix grep ROCM_VERSION in Docker image build by @djramic in #1814
Add GEMM scheduleV2 by @umangyadav in #1772
Modify Tier1 models tuning problems by @dorde-antic in #1810
Prepare Jenkinsfiles for rocm-6.4 by @dorde-antic in #1809
Remove unused files by @dhernandez0 in #1804
Update AmdArchDb.cpp with gfx950 target info by @mirza-halilcevic in #1802
Add pybind11 to pip_requirements.txt by @mirza-halilcevic in #1816
Use migraphx.greater instead of migraphx.greater_or_equal by @dhernandez0 in #1827
Change rounding mode for FP32 to Fp16 truncation by @umangyadav in #1833
Implement with_attn_bias in AttentionConfiguration by @dorde-antic in #1834
Add rocprofv3 to perfRunner by @djramic in #1779
Fix rocm version in migraphx CI docker image by @djramic in #1837
Upstream merge sprint 50 by @djramic in #1815
[CI] Set 3600s test timeout and update LIT worker configuration by @stefankoncarevic in #1832
Remove hardcoded value for render group id in Dockerfile by @umangyadav in #1839
Add back render group but do not assign GID by @umangyadav in #1843
Causal attention by @dhernandez0 in #1829
Correct rocprof invocation in fusion benchmarking path by @stefankoncarevic in #1841
conv+gemm support by @dhernandez0 in #1820
Problem config for tier 1 models by @aarushjain29 in #1836
conv+gemm migraphx integration by @dhernandez0 in #1823
Separate new Tier1 tuning problems by @dorde-antic in #1849
Disable test temporarily to pass CI by @umangyadav in #1850
Implement GQA in AttentionConfiguration by @dorde-antic in #1847
Correct layout map access in MLIROnlyConfig by @stefankoncarevic in #1855
Add missing LDS barriers to attention by @dhernandez0 in #1853
Causal masking: migraphx integration by @dhernandez0 in #1831
Updated ATTN_TEST_PARAMETERS in reportUtils.py by @stefankoncarevic in #1858
[CLONE] Add CI node checks and retries. Refactored the pipeline to resolve compilation errors and address incorrect syntax by @umangyadav in #1835
Modify CI to use Tier1 and rotate through configs by @dorde-antic in #1840
Allow retries for failing tests / Remove failing tests by @dorde-antic in #1819
Print rocm version and permissions for /dev/dri /dev/kfd by @umangyadav in #1860
Added fix for environment variables being overwritten in parallel and matrix runs and more diagnostic info by @leo-amd in #1867
Remove deprecated use of find_package by @umangyadav in #1861
Add i8 in DATA_TYPES_ATTENTION by @dorde-antic in #1838
Fix docker arguments for Public CI by @umangyadav in #1868
Adjust lit parallelism for MFMA architectures based on GPU type by @stefankoncarevic in #1865
Remove ALLOW_RETRIES by @dhernandez0 in #1852
Make number of LIT workers 8 by default by @dhernandez0 in #1871
Make sure arch can't be empty in AmdArchDb by @dhernandez0 in #1862
Added cleanWs to the post in internal rocMLIR pipeline by @leo-amd in #1872
Added 5-minute timeout to the resetGPUs() function by @leo-amd in #1864
Install rocMLIR requirements in MITuna venv by @dorde-antic in #1879
Fix Azure builds by @umangyadav in #1884
Fix output file path for rocprofv3 by @djramic in #1883
Add missing space for perfRunner and some other minor issues by @dhernandez0 in #1885
Attention: return LSE (log-sum-exp) by @dhernandez0 in #1882
Refactor splitConfigFile to use sed-based slicing by @dorde-antic in #1870
[DO NOT SQUASH] LLVM Upstream Merge Sprint 53 by @mirza-halilcevic in #1863
Fix arch in Fusion test by @umangyadav in #1888
Use migraphx-ci image for migraphx stage by @umangyadav in #1892
Increase timeout on resetGPUs to allow resetting more than 8 GPUs by @umangyadav in #1893
Upstream merge Sprint 54 with Navi3x fixes by @umangyadav in #1891
Define datatypes for attention dynamically based on chip type by @dorde-antic in #1894
Define DATA_TYPES_ATTENTION if class AttentionConfiguration is reused outside by @dorde-antic in #1897
Parameter Sweep for Attention by @dorde-antic in #1830
[DO NOT SQUASH] Add double rate MFMA instruction for gfx950 by @umangyadav in #1896
Attention LSE migraphx integration by @dhernandez0 in #1887
Weekly CI: Tuning on Navi3x/Navi4x by @dorde-antic in #1890
Allow user to choose softmax type by @dhernandez0 in #1900
perfRunner: use regex for exact flag detection in config generation by @dorde-antic in #1903
perfRunner: skip f32 attention kernels on Navi by @dorde-antic in #1902
Upstream merge 55 by @umangyadav in #1899
Add acc_type to MatMulOp by @djramic in #1880
Refactor perfRunner: repr implementation in PerfConfiguration by @dorde-antic in #1901
[DO NOT SQUASH] Attention: let migraphx decide softmax type and LSE return type by @dhernandez0 in #1904
[BACKPORT] Allow softmax type conversion to happen before or after elementwise by @umangyadav in #1920
[7.0][BACKPORT] Fix multi buffer test on gfx950 by @djramic in #1915
[7.0][BACKPORT] Find first gemm index after fusing linalg.generic ops by @dhernandez0 in #1923
[7.0][BACKPORT] Update minCU count for MI308 by @causten in #1928
[BACKPORT] Introduce new quick tune lists based on Tier1 configs and separated b… by @umangyadav in #1938
[BACKPORT] Cherry pick fix for const folding of immediate args by @umangyadav in #1941
[Backport] Update tests to exclude unsupported tests on Navi2x (#1943) by @umangyadav in #1945
[Backport] Add regularization for multiple linalgs in preSoftmaxBody by @umangyadav in #1958
[Backport] Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops by @umangyadav in #1965

New Contributors

@TedThemistokleous made their first contribution in #1746

Full Changelog: rocm-6.4.3...rocm-7.0.1