
Update speed tests to measure GPU performance for cuPQC code #2160

Open
@dstebila

Description


Discussed in https://github.com/orgs/open-quantum-safe/discussions/2076

Originally posted by lakshya-chopra February 11, 2025
In the current version of liboqs, the speed_kem.c test for ML-KEM reports CPU cycle counts as its benchmark even when the operations run on the GPU through cuPQC (on platforms with a GPU and where OQS_USE_CUPQC=ON). To verify this, I added debug statements in the following file to check which function gets called. To my surprise, the speed test always invoked cuPQC's function, yet the reported benchmark results were still based on CPU cycle counts.

image

Build command:

cmake -DBUILD_SHARED_LIBS=ON -DOQS_USE_OPENSSL=OFF -DCMAKE_BUILD_TYPE=Release -DOQS_DIST_BUILD=ON \
-DOQS_USE_CUPQC=ON -DCMAKE_PREFIX_PATH=/home/master/cupqc/cupqc-pkg-0.2.0/cmake \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc -DCMAKE_CUDA_ARCHITECTURES=86 \
-DOQS_ENABLE_KEM_ml_kem_768_cuda=ON ..

Speed comparisons

To further confirm this, I compared the speed results of Kyber768 & ML-KEM-768 (which should be similar) and got these results:

$ ./speed_kem Kyber768
Configuration info
==================
Target platform:  x86_64-Linux-5.15.0-131-generic
Compiler:         gcc (11.4.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       5afca642057faa54878cf6937b46fe6f00b45646
OpenSSL enabled:  No
AES:              NI
SHA-2:            C
SHA-3:            C
OQS build flags:  BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-02-12 18:37:02
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Kyber768                             |            |                |                 |            |                           |
keygen                               |     376913 |          3.000 |           7.959 |      0.736 |                     19219 |       1532
encaps                               |     295155 |          3.000 |          10.164 |      0.486 |                     24552 |        923
decaps                               |     377094 |          3.000 |           7.956 |      0.527 |                     19211 |        891

For ML-KEM-768:

OQS build flags:  BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-02-12 18:36:45
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-768                           |            |                |                 |            |                           |
keygen                               |      18847 |          3.000 |         159.178 |    539.811 |                    385029 |    1305897
encaps                               |      19025 |          3.000 |         157.695 |      5.361 |                    381451 |      12921
decaps                               |      18271 |          3.000 |         164.196 |      5.137 |                    397182 |      12384

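Clearly, these results are far off and do not give an accurate picture: by these numbers, ML-KEM-768 appears roughly 15 to 20 times slower than Kyber768 in mean time per operation.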

Feature Request

It would be beneficial if the speed test could accurately measure GPU performance when cuPQC is used.
As an example,
image

If this is an actual issue, I’d be happy to help :)
