What's Changed
- Add E2E test for the OCP Fp8 fused kernel with QuantizeLinear and DeQuantizeLinear by @umangyadav in #1747
- [TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1745
- Remove scheduling barrier hack for LDS barrier lowering by @dhernandez0 in #1749
- Fixes for group conv emit-key by @dhernandez0 in #1748
- Fix performance for non-standard layouts by @dhernandez0 in #1741
- [6.4]Fix bug when both A and B are broadcasted (FoldBroadcast pass) by @dhernandez0 in #1744
- [TOSA] Fix accType for the Quant Convolutions as well by @umangyadav in #1752
- [6.4] Update gfx12 target in AmdArchDB by @TedThemistokleous in #1746
- Add Fp8 to quick-tuning by @djramic in #1753
- Add bf16 to tuning runner by @djramic in #1739
- Enable output swizzle for multiple outputs by @dhernandez0 in #1750
- Use AddDim for unit input dimensions to help getMaxVectorization() by @dhernandez0 in #1755
- [DO NOT SQUASH] Enable atomic add bf16 reduction and split-k for Navi4x by @dhernandez0 in #1732
- Enable bf16 atomic add for gfx950 by @dhernandez0 in #1734
- Add test from SWDEV-518130 by @dhernandez0 in #1757
- [6.4]fix compilation with HIP SDK 6.3 for Windows by @apwojcik in #1742
- Add lookup for more layouts in PerfRunner and Add an option for verifying each perfConfig with tuningRunner by @umangyadav in #1758
- Rocmlir tuning driver datatype fix by @dorde-antic in #1761
- [CI] Added gfx942 architecture to the 'Tune MLIR kernels' stage by @stefankoncarevic in #1733
- Fix dependency graph creation in RockPipeline and not generate loops with negative iterations by @umangyadav in #1760
- Fix GlobalLoad 4b lowering by @dhernandez0 in #1764
- Improve performance of quantizelinear for int4 by @dhernandez0 in #1706
- Add fp8 convolution to the tuning runner by @djramic in #1738
- Introduce perfConfig V3 with param to select different schedule by @umangyadav in #1767
- Support for causal attention and more strict checks for KV-Cache by @dhernandez0 in #1770
- Fix generateMlirDriverCommandLine for attention in perfRunner by @dhernandez0 in #1773
- Remove hasValidChip() from ConvGenerator by @dorde-antic in #1771
- Use MLIR based kernels for verification in MIGraphX stage by @umangyadav in #1766
- [DO NOT SQUASH] March LLVM upstream merge by @dhernandez0 in #1763
- Add requirements.txt file and modify Dockerfile by @dorde-antic in #1776
- Add checks for uid and devices by @causten in #1777
- Fix Dockerfile URL for requirements.txt by @stefankoncarevic in #1778
- Adjust Dockerfile for Separate hip-python Installation by @stefankoncarevic in #1781
- Skip unsupported datatypes in perfRunner by @djramic in #1780
- Fix initialization for split-k by @dhernandez0 in #1784
- Use hip-python API instead of rocm_agent_enumerator by @dorde-antic in #1762
- Recover split-k fusion tests removed in last upstream merge by @dhernandez0 in #1785
- Add hip-python to requirements.txt and update LLVM version by @dhernandez0 in #1787
- Fix split-k fusion when there are two or more consecutive linalg.generic ops by @dhernandez0 in #1782
- Remove Machine Names Due to Security Team Advisory by @stefankoncarevic in #1788
- Remove fp8 check on nightly CI. by @stefankoncarevic in #1789
- [DO NOT SQUASH] upstream merge for sprint 48 by @dhernandez0 in #1786
- Move requirements.txt -> pip_requirements.txt due to issues with cget by @dhernandez0 in #1792
- Python script for testing metrics and plotting correlations by @dorde-antic in #1769
- Fix attention bugs (swap thread and iter when Q LDS is bypassed and bf16 tests) by @dhernandez0 in #1797
- Sort Dimensions based on Layout in case of input fusion by @umangyadav in #1793
- Fix kernel generation when kernelRepeats are more than 1 by @umangyadav in #1799
- Workaround issue 1802 by @dhernandez0 in #1800
- Add Gemm+Elementwise+Gemm support by @dhernandez0 in #1774
- Add dependencies for rocprofv3 by @djramic in #1801
- Remove perfTest from Jenkins by @dhernandez0 in #1803
- Add Tier1 model configs to rocMLIR by @dorde-antic in #1794
- GEMM+GEMM migraphx integration by @dhernandez0 in #1791
- Fix for issue 1802 workaround by @dhernandez0 in #1806
- Update MI300 quick-tuning list by @mirza-halilcevic in #1765
- gemm+gemm: extend allowed types by @dhernandez0 in #1795
- Bump Dockerfiles to rocm-6.4 by @dorde-antic in #1808
- Disable code coverage on nightly and weekly CI, and expand it to run on WMMA by @mirza-halilcevic in #1813
- Fix grep ROCM_VERSION in Docker image build by @djramic in #1814
- Add GEMM scheduleV2 by @umangyadav in #1772
- Modify Tier1 models tuning problems by @dorde-antic in #1810
- Prepare Jenkinsfiles for rocm-6.4 by @dorde-antic in #1809
- Remove unused files by @dhernandez0 in #1804
- Update AmdArchDb.cpp with gfx950 target info by @mirza-halilcevic in #1802
- Add pybind11 to pip_requirements.txt by @mirza-halilcevic in #1816
- Use migraphx.greater instead of migraphx.greater_or_equal by @dhernandez0 in #1827
- Change rounding mode for FP32 to Fp16 truncation by @umangyadav in #1833
- Implement with_attn_bias in AttentionConfiguration by @dorde-antic in #1834
- Add rocprofv3 to perfRunner by @djramic in #1779
- Fix rocm version in migraphx CI docker image by @djramic in #1837
- Upstream merge sprint 50 by @djramic in #1815
- [CI] Set 3600s test timeout and update LIT worker configuration by @stefankoncarevic in #1832
- Remove hardcoded value for render group id in Dockerfile by @umangyadav in #1839
- add back render group but do not assign GID by @umangyadav in #1843
- Causal attention by @dhernandez0 in #1829
- Correct rocprof invocation in fusion benchmarking path. by @stefankoncarevic in #1841
- conv+gemm support by @dhernandez0 in #1820
- Problem config for tier 1 models by @aarushjain29 in #1836
- conv+gemm migraphx integration by @dhernandez0 in #1823
- Separate new Tier1 tuning problems by @dorde-antic in #1849
- Disable test temporarily to pass CI by @umangyadav in #1850
- Implement GQA in AttentionConfiguration by @dorde-antic in #1847
- Correct layout map access in MLIROnlyConfig by @stefankoncarevic in #1855
- Add missing LDS barriers to attention by @dhernandez0 in #1853
- Causal masking: migraphx integration by @dhernandez0 in #1831
- Updated ATTN_TEST_PARAMETERS in reportUtils.py by @stefankoncarevic in #1858
- [CLONE] Add CI node checks and retries. Refactored the pipeline to resolve compilation errors and address incorrect syntax by @umangyadav in #1835
- Modify CI to use Tier1 and rotate through configs by @dorde-antic in #1840
- Allow retries for failing tests / Remove failing tests by @dorde-antic in #1819
- Print rocm version and permissions for /dev/dri /dev/kfd by @umangyadav in #1860
- Added fix for environment variables being overwritten in parallel and matrix runs and more diagnostic info by @leo-amd in #1867
- Remove deprecated use of find_package by @umangyadav in #1861
- Add i8 in DATA_TYPES_ATTENTION by @dorde-antic in #1838
- Fix docker arguments for Public CI by @umangyadav in #1868
- Adjust lit parallelism for MFMA architectures based on GPU type by @stefankoncarevic in #1865
- Remove ALLOW_RETRIES by @dhernandez0 in #1852
- Make number of LIT workers 8 by default by @dhernandez0 in #1871
- Make sure arch can't be empty in AmdArchDb by @dhernandez0 in #1862
- Added cleanWs to the post in internal rocMLIR pipeline by @leo-amd in #1872
- Added 5min timeout to the resetGPUs() func by @leo-amd in #1864
- Install rocMLIR requirements in MITuna venv by @dorde-antic in #1879
- Fix Azure builds by @umangyadav in #1884
- Fix output file path for rocprofv3 by @djramic in #1883
- Add missing space for perfRunner and some other minor issues by @dhernandez0 in #1885
- Attention: return LSE (log-sum-exp) by @dhernandez0 in #1882
- Refactor splitConfigFile to use sed-based slicing by @dorde-antic in #1870
- [DO NOT SQUASH] LLVM Upstream Merge Sprint 53 by @mirza-halilcevic in #1863
- Fix arch in Fusion test by @umangyadav in #1888
- Use migraphx-ci image for migraphx stage by @umangyadav in #1892
- Increase time out on resetGPUs to allow resetting more than 8 GPUs by @umangyadav in #1893
- Upstream merge Sprint 54 with Navi3x fixes by @umangyadav in #1891
- Define datatypes for attention dynamically based on chip type by @dorde-antic in #1894
- Define DATA_TYPES_ATTENTION if class AttentionConfiguration is reused outside by @dorde-antic in #1897
- Parameter Sweep for Attention by @dorde-antic in #1830
- [DO NOT SQUASH] Add double rate MFMA instruction for gfx950 by @umangyadav in #1896
- Attention LSE migraphx integration by @dhernandez0 in #1887
- Weekly CI: Tuning on Navi3x/Navi4x by @dorde-antic in #1890
- Allow user to choose softmax type by @dhernandez0 in #1900
- perfRunner: use regex for exact flag detection in config generation by @dorde-antic in #1903
- perfRunner: skip f32 attention kernels on Navi by @dorde-antic in #1902
- Upstream merge 55 by @umangyadav in #1899
- Add acc_type to MatMulOp by @djramic in #1880
- Refactor perfRunner: repr implementation in PerfConfiguration by @dorde-antic in #1901
- [DO NOT SQUASH] Attention: let migraphx decide softmax type and LSE return type by @dhernandez0 in #1904
- [BACKPORT] Allow softmax type conversion to happen before or after elementwise by @umangyadav in #1920
- [7.0][BACKPORT] Fix multi buffer test on gfx950 by @djramic in #1915
- [7.0][BACKPORT] Find first gemm index after fusing linalg.generic ops by @dhernandez0 in #1923
- [7.0][BACKPORT] Update minCU count for MI308 by @causten in #1928
- [BACKPORT] Introduce new quick tune lists based on Tier1 configs and separated b… by @umangyadav in #1938
- [BACKPORT] Cherry pick fix for const folding of immediate args by @umangyadav in #1941
- [Backport] Update tests to excludes unsupported tests on Navi2x (#1943) by @umangyadav in #1945
- [Backport] Add regularization for multiple linalgs in preSoftmaxBody by @umangyadav in #1958
- [Backport] Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops by @umangyadav in #1965
New Contributors
- @TedThemistokleous made their first contribution in #1746
Full Changelog: rocm-6.4.3...rocm-7.0.1