Fix trademark issues #515

Open · wants to merge 6 commits into base: develop
Changes from 1 commit
Remove example output and make README more concise
nmishra31 committed Jun 17, 2024
commit 8bbb9d2279c27da891013d5b903572f08c8b8225
563 changes: 3 additions & 560 deletions examples/README.md
@@ -17,582 +17,25 @@ The example executable naming convention follows `example_<$domain>_<$routine>_<$backends>` for compile-time dispatching examples
or `example_<$domain>_<$routine>` for run-time dispatching examples.
E.g. `example_blas_gemm_usm_mklcpu_cublas` or `example_blas_gemm_usm`.

## Example outputs (blas, rng, lapack, dft, sparse_blas)
## Running examples (blas)

## blas

The following transcripts show how to run the examples with different backends, using the BLAS domain as an illustration.

Run-time dispatching examples with mklcpu backend
```
$ export ONEAPI_DEVICE_SELECTOR="opencl:cpu"
$ ./bin/example_blas_gemm_usm
########################################################################
# General Matrix-Matrix Multiplication using Unified Shared Memory Example:
#
# C = alpha * A * B + beta * C
#
# where A, B and C are general dense matrices and alpha, beta are
# floating point type precision scalars.
#
# Using apis:
# gemm
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running BLAS GEMM USM example on CPU device.
Device name is: Intel(R) Core(TM) i7-6770HQ Processor @ 2.60GHz
Running with single precision real data type:
GEMM parameters:
transA = trans, transB = nontrans
m = 45, n = 98, k = 67
lda = 103, ldB = 105, ldC = 106
alpha = 2, beta = 3
Outputting 2x2 block of A,B,C matrices:
A = [ 0.340188, 0.260249, ...
[ -0.105617, 0.0125354, ...
[ ...
B = [ -0.326421, -0.192968, ...
[ 0.363891, 0.251295, ...
[ ...
C = [ 0.00698781, 0.525862, ...
[ 0.585167, 1.59017, ...
[ ...
BLAS GEMM USM example ran OK.
```
Run-time dispatching examples with mklgpu backend
```
$ export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ ./bin/example_blas_gemm_usm
########################################################################
# General Matrix-Matrix Multiplication using Unified Shared Memory Example:
#
# C = alpha * A * B + beta * C
#
# where A, B and C are general dense matrices and alpha, beta are
# floating point type precision scalars.
#
# Using apis:
# gemm
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running BLAS GEMM USM example on GPU device.
Device name is: Intel(R) Iris(R) Pro Graphics 580 [0x193b]
Running with single precision real data type:
GEMM parameters:
transA = trans, transB = nontrans
m = 45, n = 98, k = 67
lda = 103, ldB = 105, ldC = 106
alpha = 2, beta = 3
Outputting 2x2 block of A,B,C matrices:
A = [ 0.340188, 0.260249, ...
[ -0.105617, 0.0125354, ...
[ ...
B = [ -0.326421, -0.192968, ...
[ 0.363891, 0.251295, ...
[ ...
C = [ 0.00698781, 0.525862, ...
[ 0.585167, 1.59017, ...
[ ...
BLAS GEMM USM example ran OK.
```
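
For orientation, the run-time dispatching examples above reduce to creating a `sycl::queue` on the desired device and calling the generic oneMKL entry point; the backend library is picked from the queue's device when the call executes. Below is a minimal USM sketch using the GEMM parameters from the transcripts (data initialization elided; this is an illustration, not the example's exact source):

```cpp
#include <sycl/sycl.hpp>
#include "oneapi/mkl.hpp"

int main() {
    // ONEAPI_DEVICE_SELECTOR filters which device the default selector returns.
    sycl::queue q{sycl::default_selector_v};

    const std::int64_t m = 45, n = 98, k = 67;
    const std::int64_t lda = 103, ldb = 105, ldc = 106;
    const float alpha = 2.0f, beta = 3.0f;

    // Column-major storage: A is k x m (used transposed), B is k x n, C is m x n.
    float* A = sycl::malloc_shared<float>(lda * m, q);
    float* B = sycl::malloc_shared<float>(ldb * n, q);
    float* C = sycl::malloc_shared<float>(ldc * n, q);
    // ... fill A, B and C ...

    // Run-time dispatch: the backend is selected from q's device at this call.
    auto done = oneapi::mkl::blas::column_major::gemm(
        q, oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans,
        m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
    done.wait();

    sycl::free(A, q);
    sycl::free(B, q);
    sycl::free(C, q);
    return 0;
}
```
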
Compile-time dispatching example with both mklcpu and cublas backends

(Note that the mklcpu and cublas result matrices differ slightly. This is expected given the limited precision of `float`.)
```
./bin/example_blas_gemm_usm_mklcpu_cublas
########################################################################
# General Matrix-Matrix Multiplication using Unified Shared Memory Example:
#
# C = alpha * A * B + beta * C
#
# where A, B and C are general dense matrices and alpha, beta are
# floating point type precision scalars.
#
# Using apis:
# gemm
#
# Using single precision (float) data type
#
# Running on both Intel CPU and Nvidia GPU devices
#
########################################################################
Running BLAS GEMM USM example
Running with single precision real data type on:
CPU device: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz
GPU device: TITAN RTX
GEMM parameters:
transA = trans, transB = nontrans
m = 45, n = 98, k = 67
lda = 103, ldB = 105, ldC = 106
alpha = 2, beta = 3
Outputting 2x2 block of A,B,C matrices:
A = [ 0.340188, 0.260249, ...
[ -0.105617, 0.0125354, ...
[ ...
B = [ -0.326421, -0.192968, ...
[ 0.363891, 0.251295, ...
[ ...
(CPU) C = [ 0.00698781, 0.525862, ...
[ 0.585167, 1.59017, ...
[ ...
(GPU) C = [ 0.00698793, 0.525862, ...
[ 0.585168, 1.59017, ...
[ ...
BLAS GEMM USM example ran OK on MKLCPU and CUBLAS
```
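
The compile-time dispatching variant differs only in how each call is dispatched: the call goes through a `backend_selector` that wraps a queue, so the backend is fixed when the executable is built. A hedged sketch of the dispatch part only (allocations on each queue elided; variable names are illustrative):

```cpp
// One executable, two fixed backends: mklcpu for the CPU queue, cublas for the CUDA GPU queue.
sycl::queue cpu_queue{sycl::cpu_selector_v};
sycl::queue gpu_queue{sycl::gpu_selector_v};

auto cpu_done = oneapi::mkl::blas::column_major::gemm(
    oneapi::mkl::backend_selector<oneapi::mkl::backend::mklcpu>{cpu_queue},
    oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans,
    m, n, k, alpha, A_cpu, lda, B_cpu, ldb, beta, C_cpu, ldc);

auto gpu_done = oneapi::mkl::blas::column_major::gemm(
    oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas>{gpu_queue},
    oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans,
    m, n, k, alpha, A_gpu, lda, B_gpu, ldb, beta, C_gpu, ldc);

cpu_done.wait();
gpu_done.wait();
```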

## lapack
Run-time dispatching example with mklgpu backend:
```
$ export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ ./bin/example_lapack_getrs_usm
########################################################################
# LU Factorization and Solve Example:
#
# Computes LU Factorization A = P * L * U
# and uses it to solve for X in a system of linear equations:
# AX = B
# where A is a general dense matrix and B is a matrix whose columns
# are the right-hand sides for the systems of equations.
#
# Using apis:
# getrf and getrs
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running LAPACK getrs example on GPU device.
Device name is: Intel(R) Iris(R) Pro Graphics 580 [0x193b]
Running with single precision real data type:
GETRF and GETRS parameters:
trans = nontrans
m = 23, n = 23, nrhs = 23
lda = 32, ldb = 32
Outputting 2x2 block of A and X matrices:
A = [ 0.340188, 0.304177, ...
[ -0.105617, -0.343321, ...
[ ...
X = [ -1.1748, 1.84793, ...
[ 1.47856, 0.189481, ...
[ ...
LAPACK GETRS USM example ran OK
```

Compile-time dispatching example with both mklcpu and cusolver backends
```
$ ./bin/example_lapack_getrs_usm_mklcpu_cusolver
########################################################################
# LU Factorization and Solve Example:
#
# Computes LU Factorization A = P * L * U
# and uses it to solve for X in a system of linear equations:
# AX = B
# where A is a general dense matrix and B is a matrix whose columns
# are the right-hand sides for the systems of equations.
#
# Using apis:
# getrf and getrs
#
# Using single precision (float) data type
#
# Running on both Intel CPU and NVIDIA GPU devices
#
########################################################################
Running LAPACK GETRS USM example
Running with single precision real data type on:
CPU device :Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz
GPU device :TITAN RTX
GETRF and GETRS parameters:
trans = nontrans
m = 23, n = 23, nrhs = 23
lda = 32, ldb = 32
Outputting 2x2 block of A,B,X matrices:
A = [ 0.340188, 0.304177, ...
[ -0.105617, -0.343321, ...
[ ...
(CPU) X = [ -1.1748, 1.84793, ...
[ 1.47856, 0.189481, ...
[ ...
(GPU) X = [ -1.1748, 1.84793, ...
[ 1.47856, 0.189481, ...
[ ...
LAPACK GETRS USM example ran OK on MKLCPU and CUSOLVER
```
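
Under the hood, the getrs examples factorize A with `getrf` and then solve with `getrs` through the scratchpad-based USM LAPACK API. A rough run-time dispatch sketch with the parameters above (assuming the same includes and `sycl::queue q` as in the GEMM sketch; matrix data initialization elided):

```cpp
namespace lapack = oneapi::mkl::lapack;

const std::int64_t n = 23, nrhs = 23, lda = 32, ldb = 32;

float* A = sycl::malloc_shared<float>(lda * n, q);      // system matrix
float* B = sycl::malloc_shared<float>(ldb * nrhs, q);   // right-hand sides, overwritten with X
std::int64_t* ipiv = sycl::malloc_device<std::int64_t>(n, q);
// ... fill A and B ...

// Each routine needs a device scratchpad whose size is queried first.
const std::int64_t getrf_size = lapack::getrf_scratchpad_size<float>(q, n, n, lda);
const std::int64_t getrs_size = lapack::getrs_scratchpad_size<float>(
    q, oneapi::mkl::transpose::nontrans, n, nrhs, lda, ldb);
float* getrf_scratch = sycl::malloc_device<float>(getrf_size, q);
float* getrs_scratch = sycl::malloc_device<float>(getrs_size, q);

// A = P * L * U, then solve A * X = B.
auto factor = lapack::getrf(q, n, n, A, lda, ipiv, getrf_scratch, getrf_size);
auto solve = lapack::getrs(q, oneapi::mkl::transpose::nontrans, n, nrhs,
                           A, lda, ipiv, B, ldb, getrs_scratch, getrs_size, {factor});
solve.wait();
```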

## rng
Run-time dispatching example with mklgpu backend:
```
$ export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ ./bin/example_rng_uniform_usm
########################################################################
# Generate uniformly distributed random numbers with philox4x32x10
# generator example:
#
# Using APIs:
# default_engine uniform
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running RNG uniform usm example on GPU device
Device name is: Intel(R) Iris(R) Pro Graphics 580 [0x193b]
Running with single precision real data type:
generation parameters:
seed = 777, a = 0, b = 10
Output of generator:
first 10 numbers of 1000:
8.52971 1.76033 6.04753 3.68079 9.04039 2.61014 3.75788 3.94859 7.93444 8.60436
Random number generator with uniform distribution ran OK
```

Compile-time dispatching example with both mklcpu and curand backends
```
$ ./bin/example_rng_uniform_usm_mklcpu_curand
########################################################################
# Generate uniformly distributed random numbers with philox4x32x10
# generator example:
#
# Using APIs:
# default_engine uniform
#
# Using single precision (float) data type
#
# Running on both Intel CPU and Nvidia GPU devices
#
########################################################################
Running RNG uniform usm example
Running with single precision real data type:
CPU device: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz
GPU device: TITAN RTX
generation parameters:
seed = 777, a = 0, b = 10
Output of generator on CPU device:
first 10 numbers of 1000:
8.52971 1.76033 6.04753 3.68079 9.04039 2.61014 3.75788 3.94859 7.93444 8.60436
Output of generator on GPU device:
first 10 numbers of 1000:
3.52971 6.76033 1.04753 8.68079 4.48229 0.501966 6.78265 8.99091 6.39516 9.67955
Random number generator example with uniform distribution ran OK on MKLCPU and CURAND
```
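
The RNG examples construct an engine on the queue and generate directly into a USM buffer; a minimal run-time dispatch sketch with the parameters shown above (seed 777, range [0, 10); same assumed includes and queue `q` as in the GEMM sketch):

```cpp
namespace rng = oneapi::mkl::rng;

const std::int64_t n = 1000;
const std::uint64_t seed = 777;

float* r = sycl::malloc_shared<float>(n, q);

// default_engine is philox4x32x10; uniform single-precision numbers in [0, 10).
rng::default_engine engine(q, seed);
rng::uniform<float> distribution(0.0f, 10.0f);

auto done = rng::generate(distribution, engine, n, r);
done.wait();
```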

## dft

Compile-time dispatching example with MKLGPU backend

```none
$ ONEAPI_DEVICE_SELECTOR="level_zero:gpu" ./bin/example_dft_complex_fwd_buffer_mklgpu
########################################################################
# Complex out-of-place forward transform for Buffer API's example:
#
# Using APIs:
# Compile-time dispatch API
# Buffer forward complex out-of-place
#
# Using single precision (float) data type
#
# For Intel GPU with Intel MKLGPU backend.
#
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
########################################################################
Running DFT Complex forward out-of-place buffer example
Using compile-time dispatch API with MKLGPU.
Running with single precision real data type on:
GPU device :Intel(R) UHD Graphics 750 [0x4c8a]
DFT Complex USM example ran OK on MKLGPU
```

Run-time dispatching example with the MKLGPU, cuFFT, rocFFT and portFFT backends:

```none
$ ONEAPI_DEVICE_SELECTOR="level_zero:gpu" ./bin/example_dft_real_fwd_usm
########################################################################
# DFT complex in-place forward transform with USM API example:
#
# Using APIs:
# USM forward complex in-place
# Run-time dispatch
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running DFT complex forward example on GPU device
Device name is: Intel(R) UHD Graphics 750 [0x4c8a]
Running with single precision real data type:
DFT example run_time dispatch
DFT example ran OK
```

```none
$ ONEAPI_DEVICE_SELECTOR="level_zero:gpu" ./bin/example_dft_real_fwd_usm
########################################################################
# DFT complex in-place forward transform with USM API example:
#
# Using APIs:
# USM forward complex in-place
# Run-time dispatch
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running DFT complex forward example on GPU device
Device name is: NVIDIA A100-PCIE-40GB
Running with single precision real data type:
DFT example run_time dispatch
DFT example ran OK
```

```none
$ ./bin/example_dft_real_fwd_usm
########################################################################
# DFT complex in-place forward transform with USM API example:
#
# Using APIs:
# USM forward complex in-place
# Run-time dispatch
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running DFT complex forward example on GPU device
Device name is: AMD Radeon PRO W6800
Running with single precision real data type:
DFT example run_time dispatch
DFT example ran OK
```

```none
$ LD_LIBRARY_PATH=lib/:$LD_LIBRARY_PATH ./bin/example_dft_real_fwd_usm
########################################################################
# DFT complex in-place forward transform with USM API example:
#
# Using APIs:
# USM forward complex in-place
# Run-time dispatch
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running DFT complex forward example on GPU device
Device name is: Intel(R) UHD Graphics 750
Running with single precision real data type:
DFT example run_time dispatch
Unsupported Configuration:
oneMKL: dft/backends/portfft/commit: function is not implemented portFFT only supports complex to complex transforms
```
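
All of the DFT runs above go through the same descriptor-based API: create a descriptor for the chosen precision and domain, commit it to a queue, and call a compute function. A minimal complex in-place forward sketch for the run-time dispatching case (transform length and data are placeholders; same assumed includes and queue `q`):

```cpp
namespace dft = oneapi::mkl::dft;

const std::int64_t N = 16;   // placeholder transform length

// Interleaved complex input/output in USM.
std::complex<float>* inout = sycl::malloc_shared<std::complex<float>>(N, q);
// ... fill inout ...

dft::descriptor<dft::precision::SINGLE, dft::domain::COMPLEX> desc(N);
desc.commit(q);   // run-time dispatch: backend chosen from q's device

auto done = dft::compute_forward(desc, inout);   // in-place forward transform
done.wait();
```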

## sparse_blas

Run-time dispatching examples with mklcpu backend
```
$ export ONEAPI_DEVICE_SELECTOR="opencl:cpu"
$ ./bin/example_sparse_blas_gemv_usm
########################################################################
# Sparse Matrix-Vector Multiply Example:
#
# y = alpha * op(A) * x + beta * y
#
# where A is a sparse matrix in CSR format, x and y are dense vectors
# and alpha, beta are floating point type precision scalars.
#
# Using apis:
# sparse::gemv
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running Sparse BLAS GEMV USM example on CPU device.
Device name is: Intel(R) Core(TM) i7-6700K Processor @ 4.00GHz
Running with single precision real data type:
sparse::gemv parameters:
transA = nontrans
nrows = 64
alpha = 1, beta = 0
sparse::gemv example passed
Finished
Sparse BLAS GEMV USM example ran OK.
```

Run-time dispatching examples with mklgpu backend
```
$ export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ ./bin/example_sparse_blas_gemv_usm
########################################################################
# Sparse Matrix-Vector Multiply Example:
#
# y = alpha * op(A) * x + beta * y
#
# where A is a sparse matrix in CSR format, x and y are dense vectors
# and alpha, beta are floating point type precision scalars.
#
# Using apis:
# sparse::gemv
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################
Running Sparse BLAS GEMV USM example on GPU device.
Device name is: Intel(R) HD Graphics 530 [0x1912]
Running with single precision real data type:
sparse::gemv parameters:
transA = nontrans
nrows = 64
alpha = 1, beta = 0
sparse::gemv example passed
Finished
Sparse BLAS GEMV USM example ran OK.
```

Compile-time dispatching example with mklcpu backend
```
$ export ONEAPI_DEVICE_SELECTOR="opencl:cpu"
$ ./bin/example_sparse_blas_gemv_usm_mklcpu
########################################################################
# Sparse Matrix-Vector Multiply Example:
#
# y = alpha * op(A) * x + beta * y
#
# where A is a sparse matrix in CSR format, x and y are dense vectors
# and alpha, beta are floating point type precision scalars.
#
# Using apis:
# sparse::gemv
#
# Using single precision (float) data type
#
# Running on Intel CPU device
#
########################################################################
Running Sparse BLAS GEMV USM example on CPU device.
Device name is: Intel(R) Core(TM) i7-6700K Processor @ 4.00GHz
Running with single precision real data type:
sparse::gemv parameters:
transA = nontrans
nrows = 64
alpha = 1, beta = 0
sparse::gemv example passed
Finished
Sparse BLAS GEMV USM example ran OK.
```
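
For orientation, the sparse gemv examples build a CSR matrix handle and hand it to `sparse::gemv`. A rough sketch of the handle-based USM flow follows; exact signatures vary between oneMKL interface releases, so treat this as an assumption rather than the example's source (CSR arrays and vectors are assumed to be pre-filled USM allocations, with the same includes and queue `q` as above):

```cpp
namespace sparse = oneapi::mkl::sparse;

// nrows = 64, alpha = 1, beta = 0 as in the transcripts above; row_ptr, col_ind,
// val, x and y are USM arrays that have already been allocated and filled.
sparse::matrix_handle_t A_handle = nullptr;
sparse::init_matrix_handle(&A_handle);

sparse::set_csr_data(q, A_handle, nrows, nrows, oneapi::mkl::index_base::zero,
                     row_ptr, col_ind, val);

auto done = sparse::gemv(q, oneapi::mkl::transpose::nontrans,
                         alpha, A_handle, x, beta, y);
done.wait();

sparse::release_matrix_handle(q, &A_handle);
```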