Hi,
first, congrats on getting the Orochi 2.0 release out!
Comments:
- I tested the new Orochi 2.0 release on a Zen4 Raphael iGPU and got some errors:
1a) The first RadixSort test fails on Zen4 because number_of_blocks becomes 0: the APU has only 2 compute units, so rounding down to a multiple of base yields 0.
fix (ParallelPrimitives/RadixSort.cpp):
258c258,259
< number_of_blocks = ( number_of_blocks / base ) * base;
---
> number_of_blocks = ( number_of_blocks / base ) * base;
> if( number_of_blocks == 0 ) number_of_blocks = 4;
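The rounding problem in 1a can be reproduced in isolation. Below is a minimal sketch, not the actual RadixSort.cpp code: `alignBlockCount` and its parameter names are hypothetical, and the fallback of 4 is simply the value used in the patch above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Orochi): round rawBlocks down to a
// multiple of base, but never return 0. On a device with very few compute
// units, rawBlocks can be smaller than base, so the integer division
// truncates to 0 and no blocks would be launched.
inline std::uint32_t alignBlockCount(std::uint32_t rawBlocks,
                                     std::uint32_t base,
                                     std::uint32_t fallback = 4)
{
    assert(base > 0); // division by zero guard
    const std::uint32_t aligned = (rawBlocks / base) * base;
    return aligned == 0 ? fallback : aligned;
}
```

For example, `alignBlockCount(2, 4)` falls back to 4, while `alignBlockCount(10, 4)` rounds down to 8 as before. Whether the fallback should be a fixed 4 or `base` itself depends on what the kernels assume, which I have not checked.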
1b) The pciBusID check in VulkanComputeSimple is incorrect for APUs, as Vulkan reports 0 and ROCm 0x6b, or vice versa. I fixed it in main.cpp with:
< if (physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
< physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
< physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID) {
< physicalDeviceIndex = i;
< break;
---
> if( physicalDevicePCIBusInfoProperties.pciDomain == props.pciDomainID &&
> // physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID &&
> physicalDevicePCIBusInfoProperties.pciDevice == props.pciDeviceID )
> {
> if( physicalDevicePCIBusInfoProperties.pciBus > 16 ) // integrated!!
> {
> if( props.pciBusID == 0 )
> {
> physicalDeviceIndex = i;
> break;
> }
> }
> else
> {
> if( physicalDevicePCIBusInfoProperties.pciBus == props.pciBusID )
> {
> physicalDeviceIndex = i;
> break;
> }
> }
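The matching rule from the patch above can be written as a standalone predicate. This is only a sketch with plain structs instead of the Vulkan and HIP property types: `PciAddress`, `samePciDevice` and `findPhysicalDeviceIndex` are hypothetical names, and the "bus > 16 means integrated" threshold is the heuristic from my patch, not something guaranteed by either API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-data stand-in for the PCI fields reported by
// Vulkan (VkPhysicalDevicePCIBusInfoPropertiesEXT) and by HIP/CUDA props.
struct PciAddress
{
    std::uint32_t domain;
    std::uint32_t bus;
    std::uint32_t device;
};

// Heuristic from the patch: a Vulkan bus number above this is treated as an
// integrated GPU, for which the runtime side reported bus 0 on my machine.
constexpr std::uint32_t kIntegratedBusThreshold = 16;

inline bool samePciDevice(const PciAddress& vk, const PciAddress& hip)
{
    if (vk.domain != hip.domain || vk.device != hip.device)
        return false;
    if (vk.bus > kIntegratedBusThreshold)
        return hip.bus == 0;  // integrated: bus numbers disagree, accept 0
    return vk.bus == hip.bus; // discrete: require an exact bus match
}

// Returns the index of the first matching Vulkan device, or -1 if none.
inline int findPhysicalDeviceIndex(const std::vector<PciAddress>& vulkanDevices,
                                   const PciAddress& hip)
{
    for (std::size_t i = 0; i < vulkanDevices.size(); ++i)
        if (samePciDevice(vulkanDevices[i], hip))
            return static_cast<int>(i);
    return -1;
}
```

A more robust approach would probably be to compare device UUIDs where both APIs expose them, but I have not tried that here.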
- Can you make the samples run correctly by default after a premake5 build?
I mean running the executables directly from the Orochi-2.00/dist/bin/Debug or Orochi-2.00/dist/bin/Release folders.
They seem to expect to find kernels and sample data such as textures relative to the "../" folder, so basically you have two options:
2a) copy or move the Debug or Release folder into the Orochi-2.00/ folder (out of dist/bin) so the samples find the required data, or
2b) recreate/copy the needed files from the Orochi-2.00 folder into Orochi-2.00/dist/bin, for example copying:
Orochi-2.00/Test/Texture/texture_test_kernel.hpp
to:
Orochi-2.00/dist/bin/Test/Texture/texture_test_kernel.hpp
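One way the samples could avoid depending on the working directory is to search upwards from the executable's folder for the resource. The following is a rough sketch assuming C++17 `std::filesystem` is acceptable; `findUpwards` is a hypothetical helper, not something Orochi provides, and the default of 4 levels is arbitrary.

```cpp
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: starting at `start`, test `start / relative`, then the
// same relative path in each parent directory, up to maxLevels parents.
// Returns the first existing candidate, or std::nullopt if none is found.
inline std::optional<fs::path> findUpwards(fs::path start,
                                           const fs::path& relative,
                                           int maxLevels = 4)
{
    for (int level = 0; level <= maxLevels; ++level)
    {
        const fs::path candidate = start / relative;
        if (fs::exists(candidate))
            return candidate;
        // Stop at the filesystem root or when there is no parent left.
        if (!start.has_parent_path() || start.parent_path() == start)
            break;
        start = start.parent_path();
    }
    return std::nullopt;
}
```

Called with Orochi-2.00/dist/bin/Release as `start` and Test/Texture/texture_test_kernel.hpp as `relative`, this would find the file in the Orochi-2.00 folder three levels up, without copying anything.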
3) Add Linux support to VulkanComputeSimple and RadixSort (Orochi-2.00/Test/VulkanComputeSimple and Orochi-2.00/Test/RadixSort).
In fact the support is already there; it only needs the minor fixes I added, which are very simple.
For the Vulkan premake5.lua:
-- buildoptions { "/wd4244" }
buildoptions { "--std=c++14" }
-- links{ "Pop" }
-- links{ "kernel32", "user32", "gdi32", "winspool", "comdlg32", "advapi32", "shell32", "ole32", "oleaut32", "uuid", "odbc32", "odbccp32", "version" }
- Finally, it would be nice if the WMMA sample got NVIDIA support; it seems to be AMD-only at the moment.
I started by adding "#ifdef __CUDA_ARCH__" code to the kernel:
#ifdef __CUDA_ARCH__
#include <mma.h>
//https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
using namespace nvcuda;
...
// NVIDIA path: WMMA fragments for one 16x16x16 tile
wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::col_major> a_frag2;
wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag2;
wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag2;
wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag2;
#endif
..
#ifdef __CUDA_ARCH__
// NVIDIA path: c_frag2 = a_frag2 * b_frag2 + c_frag2
nvcuda::wmma::mma_sync( c_frag2, a_frag2, b_frag2, c_frag2 );
#else
// AMD path: must sit outside the CUDA-only block above to be compiled at all
c_frag = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32( a_frag, b_frag, c_frag, false );
#endif
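To check that both paths produce the same result, a host-side reference for one tile would help. `wmma::mma_sync( d, a, b, c )` computes D = A * B + C for the tile, so a plain CPU version is enough to compare against. The sketch below is an assumption-laden reference, not part of the sample: `referenceMma` and `Tile` are hypothetical names, it uses float throughout (the GPU fragments hold half inputs, so compare with a tolerance), and it assumes column-major storage for all four matrices.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kTile = 16;                // 16x16x16 tile, as in the kernel
using Tile = std::array<float, kTile * kTile>;   // element (row, col) at [row + col * kTile]

// CPU reference for one tile: D = A * B + C, all matrices column-major.
inline Tile referenceMma(const Tile& a, const Tile& b, const Tile& c)
{
    Tile d{};
    for (std::size_t col = 0; col < kTile; ++col)
    {
        for (std::size_t row = 0; row < kTile; ++row)
        {
            float acc = c[row + col * kTile];    // start from the accumulator input
            for (std::size_t k = 0; k < kTile; ++k)
                acc += a[row + k * kTile] * b[k + col * kTile];
            d[row + col * kTile] = acc;
        }
    }
    return d;
}
```

Feeding the same A, B and C to the kernel and to `referenceMma` and comparing element-wise should show quickly whether the NVIDIA branch uses the layouts the AMD branch expects.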