@dkjung (Collaborator) commented Nov 18, 2025

This PR enhances CUDA support and adds CUDA unit tests.

Key Changes:

  1. CUDA Unit Test Addition (unittest_cuda.cpp):
    • Comprehensive unit tests for RMSNorm CUDA kernel
    • Test cases for various dimension sizes and epsilon values
    • CUDA kernel execution time measurement functionality

  2. CUDA Build Configuration Improvements:
    • Reorganized CUDA operations subdirectory inclusion from nntrainer/meson.build to nntrainer/tensor/meson.build
    • Added CUDA test target to test/unittest/meson.build
    • Fixed CUDA linking issues by adding proper link arguments (-NOIMPLIB, -NOEXP)
    • Added CUDA dependencies handling in unit test build configuration

  3. Prevent Unnecessary File Generation:
    • Modified linker options to prevent .lib and .exp file generation during unittest_cuda.exe build on Windows

Expected Benefits:

  • Improved test coverage for CUDA functionality
  • Enhanced CUDA build system stability
  • Cleaner CUDA builds on Windows: no stray .lib and .exp files from the unittest_cuda.exe link

These changes contribute to strengthening CUDA support in the build system and improving reliability of CUDA operations.
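
For readers who want a sense of what the RMSNorm tests check, here is a minimal CPU reference sketch that a test could compare kernel output against for different dimension sizes and epsilon values. The function names `rmsnorm_reference` and `all_close` are hypothetical and are not taken from unittest_cuda.cpp; the formula assumed is the usual y[i] = gamma[i] * x[i] / sqrt(mean(x^2) + epsilon) over one row.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for one row of RMSNorm:
//   y[i] = gamma[i] * x[i] / sqrt(mean(x^2) + epsilon)
// The name and signature are assumptions, not nntrainer's API.
std::vector<float> rmsnorm_reference(const std::vector<float> &x,
                                     const std::vector<float> &gamma,
                                     float epsilon) {
  // Accumulate in double so the reference is more accurate than a float kernel.
  double sum_sq = 0.0;
  for (float v : x)
    sum_sq += static_cast<double>(v) * v;
  const double inv_rms =
    1.0 / std::sqrt(sum_sq / static_cast<double>(x.size()) + epsilon);

  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] = static_cast<float>(x[i] * inv_rms) * gamma[i];
  return y;
}

// Element-wise comparison with an absolute tolerance, as a test might use
// when checking device output against the reference.
bool all_close(const std::vector<float> &a, const std::vector<float> &b,
               float tol) {
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (std::fabs(a[i] - b[i]) > tol)
      return false;
  return true;
}
```

A test would typically copy the kernel result back to the host and call `all_close` with a tolerance chosen for float arithmetic; the tolerance itself is a judgment call and is not specified in this PR.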

Adds CUDA context management files (cuda_context.h
and cuda_context.cpp) that provide similar
functionality to the existing OpenCL context.

The changes include:

- CudaContext class inheriting from Context and Singleton
- CUDA kernel management and execution interfaces
- Build system updates to support CUDA with enable-cuda option
- Conditional linking of CUDA runtime library for both Windows and Linux
- Addition of enable-cuda option in meson_options.txt

Signed-off-by: Daekyoung Jung <[email protected]>

This commit adds CUDA context management files (cuda_context.h and cuda_context.cpp)
that provide similar functionality to the existing OpenCL context.

The changes include:

- Implementation of CudaContext class inheriting from Context and Singleton
- CUDA kernel management and execution interface
- Build system updates to support CUDA with the enable-cuda Meson option
- Conditional linking of CUDA runtime library for both Windows and Linux
- Addition of enable-cuda option in meson_options.txt
- Implementation of RMSNorm CUDA kernel and build configuration

Signed-off-by: Daekyoung Jung <[email protected]>

This commit includes the following changes:
1. Add new CUDA unit test file (unittest_cuda.cpp) with RMSNorm CUDA kernel
   tests
2. Reorganize CUDA operations directory structure by moving subdir inclusion
   from nntrainer/meson.build to nntrainer/tensor/meson.build
3. Add CUDA test target in test/unittest/meson.build
4. Fix CUDA linking issues by adding proper link arguments (-NOIMPLIB, -NOEXP)
   to prevent generation of unnecessary .lib and .exp files
5. Add CUDA dependencies handling in unit test build configuration

The changes ensure proper CUDA support in the build system and add
comprehensive unit tests for CUDA operations.

Signed-off-by: Daekyoung Jung <[email protected]>

/**
 * @brief Get the name of the context
 */
std::string getName() override { return "cuda"; }
Member

Even if you register CudaContext, base_properties.h is not ready for it:

/**
 * @brief Enumeration of Run Engine type
 */
struct ComputeEngineTypeInfo {
  using Enum = ml::train::LayerComputeEngine;
  static constexpr std::initializer_list<Enum> EnumList = {Enum::CPU, Enum::GPU,
                                                           Enum::QNN};
  static constexpr const char *EnumStr[] = {"cpu", "gpu", "qnn"};
};
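
To illustrate the concern, here is a simplified, self-contained sketch of how a string-to-enum table like the one above behaves. `ComputeEngine`, `kEnumList`, `kEnumStr` and `parse_engine` are hypothetical stand-ins, not nntrainer code; the point is only that a name missing from the table cannot be resolved, however the context was registered.

```cpp
#include <cstddef>
#include <cstring>

// Simplified stand-in for ml::train::LayerComputeEngine with the three
// values listed in the excerpt above.
enum class ComputeEngine { CPU, GPU, QNN };

// Parallel tables: kEnumStr[i] is the textual name of kEnumList[i].
static const ComputeEngine kEnumList[] = {ComputeEngine::CPU,
                                          ComputeEngine::GPU,
                                          ComputeEngine::QNN};
static const char *const kEnumStr[] = {"cpu", "gpu", "qnn"};

// Hypothetical parser: writes the matching value to `out` and returns true,
// or returns false (leaving `out` untouched) when the name is not listed.
bool parse_engine(const char *name, ComputeEngine &out) {
  const std::size_t n = sizeof(kEnumStr) / sizeof(kEnumStr[0]);
  for (std::size_t i = 0; i < n; ++i) {
    if (std::strcmp(kEnumStr[i], name) == 0) {
      out = kEnumList[i];
      return true;
    }
  }
  return false;
}
```

With these tables, `parse_engine("gpu", e)` succeeds while `parse_engine("cuda", e)` returns false, so a "cuda" entry (and a matching enum value) would be needed before a property could select the new context.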

if get_option('enable-cuda')
  nntrainer_headers += meson.current_source_dir() / 'cuda_context.h'
  nntrainer_common_sources += 'cuda_context.cpp'
  extra_defines += '-DENABLE_CUDA=1'
Member

This code is ignored.

In /meson.build

nntrainer/meson.build

Lines 704 to 707 in ba47d7f

message('extra defines are:' + ' '.join(extra_defines))
foreach defs: extra_defines
  add_project_arguments(defs, language: ['c', 'cpp'])
endforeach

happens before subdir('nntrainer').

You need to do this in /meson.build, before line 704.

auto &cuda_context = nntrainer::CudaContext::Global();

registerContext("cuda", &cuda_context);
#endif
Member

Add #include "cuda_context.h" in this engine.cpp file.
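
As a rough picture of how the guarded registration fits together, here is a self-contained sketch. `Context`, `Singleton` and `Engine` below are simplified, hypothetical stand-ins for the nntrainer classes, and `hasContext` is invented for the example. In the real tree `CudaContext` would come from `#include "cuda_context.h"` rather than being defined inline.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins; the real Context, Singleton and registerContext
// live in nntrainer and have richer interfaces.
class Context {
public:
  virtual ~Context() = default;
  virtual std::string getName() = 0;
};

template <typename T> class Singleton {
public:
  static T &Global() {
    static T instance; // constructed once, on first use
    return instance;
  }
};

#if defined(ENABLE_CUDA) && ENABLE_CUDA == 1
// In the real tree this class would be provided by cuda_context.h.
class CudaContext : public Context, public Singleton<CudaContext> {
public:
  std::string getName() override { return "cuda"; }
};
#endif

class Engine {
public:
  void registerContext(const std::string &name, Context *ctx) {
    contexts_[name] = ctx;
  }
  bool hasContext(const std::string &name) const {
    return contexts_.count(name) != 0;
  }
  void add_default_object() {
#if defined(ENABLE_CUDA) && ENABLE_CUDA == 1
    // Only compiled when -DENABLE_CUDA=1 actually reaches the compiler.
    auto &cuda_context = CudaContext::Global();
    registerContext("cuda", &cuda_context);
#endif
  }

private:
  std::map<std::string, Context *> contexts_;
};
```

Built without the define, `add_default_object()` registers nothing under "cuda"; built with it, the block needs the `CudaContext` declaration to be visible, which is why the include matters.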

cuda_deps = []
cuda_link_args = []
if target[0] == 'unittest_cuda' and get_option('enable-cuda')
  cuda_deps = [cuda_dep]
Member

Is `cuda_dep` here in the scope of the `cuda_dep` defined in /nntrainer/tensor/meson.build? It doesn't appear so.

foreach kernel : cuda_sources
  obj_name = kernel.replace('.cu', '.o')
  obj = custom_target(obj_name,
    command: [nvcc, '-c', '-Xcompiler', '/MD', '@INPUT@', '-o', '@OUTPUT@'],
Member

Do not use /MD unconditionally. It should depend on the compiler.

@@ -0,0 +1,34 @@
# Find CUDA compiler
dep = dependency('cuda', version : '>=13', modules : ['cublas'])
Member

Even if CUDA is available, this might fail (at least on most Linux distros). Check whether this works on developer machines.


void CudaContext::add_default_object() {
  // Register default layers that support CUDA
  registerFactory(nntrainer::createLayer<FullyConnectedLayer>,
Collaborator

Is this just for testing, or is it intentional? It seems to register the CPU FullyConnectedLayer, not a CUDA layer.
If it is just for testing, please leave a comment saying so.
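
To make the question concrete, here is a small sketch of a factory registry, assuming a simplified model where each context maps a type key to a creator function. `Layer`, `ContextSketch`, `backend()` and `FullyConnectedLayerCuda` are hypothetical and do not exist in this PR; the sketch only shows that the context hands back whatever class was passed to `createLayer`, so registering the CPU class yields a CPU layer.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal layer interface; backend() exists only for this sketch.
struct Layer {
  virtual ~Layer() = default;
  virtual std::string backend() const = 0;
};

// Stand-in for the existing CPU layer.
struct FullyConnectedLayer : Layer {
  std::string backend() const override { return "cpu"; }
};

// Hypothetical CUDA counterpart, not part of this PR.
struct FullyConnectedLayerCuda : Layer {
  std::string backend() const override { return "cuda"; }
};

// Creator template: builds exactly the class it is instantiated with.
template <typename T> std::unique_ptr<Layer> createLayer() {
  return std::unique_ptr<Layer>(new T());
}

class ContextSketch {
public:
  using Factory = std::function<std::unique_ptr<Layer>()>;

  // Later registrations under the same key replace earlier ones.
  void registerFactory(const std::string &key, Factory factory) {
    factories_[key] = std::move(factory);
  }
  // Throws std::out_of_range if the key was never registered.
  std::unique_ptr<Layer> create(const std::string &key) const {
    return factories_.at(key)();
  }

private:
  std::map<std::string, Factory> factories_;
};
```

In this model, `registerFactory("fully_connected", createLayer<FullyConnectedLayer>)` on a CUDA context would still create a layer whose `backend()` is "cpu".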

@jijoongmoon (Collaborator)

Please add [ Wait for #3567 ] to the title.
