Skip to content

Conversation

@Anna-Szal
Copy link

@Anna-Szal Anna-Szal commented Nov 27, 2025

Dependency of the PR

Commits to be reviewed in this PR

l2norm test fix

test: blas_kernels --> l2norm
problem: the output was compared against CPU version of the algorithm which turned out to be less precise
solution: using fp64 version of the CPU algorithm as ground truth

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Anna Szal [email protected]

rotary_emb test fix

test: attention_kernels --> rotary_emb
problem: InBufferC is used as third input buffer which

  • has "unused_buffer_bytes" size = 1 float
  • is set as "read-only" but is being altered inside the kernel
    solution: switch to outBufferB which is nor read-only and change its size to "buffer_size_bytes"

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Anna Szal [email protected]

concatGPU test fix

test: layers--> ConcatGPU/LayerGoldenTest
problem: improper passing of buffers as arguments to SetKernelArguments()
solution: retrieve and store buffer pointer in a variable, then pass its address as the argument

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Anna Szal [email protected]

Summary

  • bug fixes for several tests on GPU when built with OpenCL

Signed-off-by: Anna Szal [email protected]

Copy link
Contributor

@djeong20 djeong20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. Please take a moment to review the comments below. :)


result = kernel_concat_ptr->SetKernelArguments(
0, clbuffInstance.getInBufferA()->GetBuffer(), sizeof(cl_mem));
result = kernel_concat_ptr->SetKernelArguments(0, &bufferInA, sizeof(cl_mem));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this also work in the same way?

Suggested change
result = kernel_concat_ptr->SetKernelArguments(0, &bufferInA, sizeof(cl_mem));
result = kernel_concat_ptr->SetKernelArguments(0, &clbuffInstance.getInBufferA()->GetBuffer(), sizeof(cl_mem));

void ClBufferManager::initBuffers() {
inBufferA = new opencl::Buffer(context_inst_, buffer_size_bytes, true);
inBufferB = new opencl::Buffer(context_inst_, buffer_size_bytes, true);
inBufferC = new opencl::Buffer(context_inst_, unused_buffer_bytes, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also revert inBufferC since it is used for the attention_kernel, which is also one of the failing tests

Suggested change
inBufferC = new opencl::Buffer(context_inst_, unused_buffer_bytes, true);
inBufferC = new opencl::Buffer(context_inst_, buffer_size_bytes, true);

Comment on lines +31 to +32
#include <cblas.h>
#include <cblas_interface.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not recommend directly including headers related to CBLAS. The reason is that there is an option called enable-blas, which only includes OpenBLAS when it is set to true. If you include the headers directly and enable-blas is set to false, it will result in an error. I personally suggest modifying the code accordingly.

float snrm2(const unsigned int N, const float *X, const unsigned int incX) {
#ifdef USE_BLAS
return __cblas_snrm2(N, X, incX);
#else
return __fallback_snrm2(N, X, incX);
#endif
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants