CUDA: always create events for split buffers #10185

JohannesGaessler · 2024-11-05T16:51:27Z

I think the correct way to fix it is to just create the events unconditionally. Regardless of how the data is split, you always need the events on the currently active device for the other devices to wait on. You could maybe reduce the number of events by only initializing those that are actually needed, but I don't think that would be worthwhile, since in the vast majority of use cases all events are already being created and used anyway.

slaren · 2024-11-05T17:09:12Z

Qwen2.5-0.5B does not work with this change alone; it still crashes later in the memcpy:

CUDA error: invalid argument
  current device: 1, in function ggml_cuda_op_mul_mat at ggml/src/ggml-cuda.cu:1583
  cudaMemcpyPeerAsync( src1_ddq_i, id, src1_ddq_i_source, ctx.device, src1_ncols*src1_padded_col_size*q8_1_ts/q8_1_bs, stream)

slaren · 2024-11-05T17:12:29Z

It would also be possible to avoid using a split buffer entirely when the matrix is too small, by returning false in the supports_op check.

JohannesGaessler added the "Review Complexity : Medium" label (generally requires more time to grok but manageable by beginner to medium expertise level) on Nov 5, 2024

CUDA: always create events for split buffers

38d11f5

JohannesGaessler force-pushed the cuda-fix-event-initialization branch from bde4116 to 38d11f5 on November 5, 2024 17:03
