OpenCL: add tiled mul_mat_f16_f32 #14535
base: master
@rmatif Thank you for the PR. I will try it, together with the direct convolution PR, over the next few days. For matmul, image1d_buffer is probably the easiest way to use the L1 cache: it wraps a normal cl buffer and is accessed with read_image, so the indexing should stay the same as for the cl buffer. The Q4_0 matmul already does this. It is also possible to use a normal cl buffer for one input matrix and an image1d_buffer for the other, so that both load paths are used.
@rmatif, sorry to bother you, and congratulations on another excellent ggml-opencl PR.
What do you think of this plan? I look forward to your reply and advice. Thanks.
@lhez You're right, using
@zhouwg Please reach out to me via email, and I'll send you the build scripts and we can discuss further, as this seems off-topic here.
@rmatif, thanks so much for your help. I'm excited: this is the first time I have run the ggml-opencl backend on my Snapdragon 8 Elite based phone.
llama-bench with qwen1_5-1_8b-chat-q4_0.gguf on master:
llama-cli with qwen1_5-1_8b-chat-q4_0.gguf on master:
llama-bench with Llama-3.2-1B-Instruct-f16.gguf on this PR:
llama-bench with Llama-3.2-1B-Instruct-f16.gguf on master:
BTW, I wrote a simple shell script that builds the ggml-opencl backend on Linux, to simplify the workflow: https://github.com/zhouwg/ggml-hexagon/blob/self-build/scripts/build-run-ggmlopencl-android.sh
Can I add this script to this PR, or submit it as a standalone PR, so that other developers can help verify ggml-opencl PRs or learn about OpenCL programming on Android phones? The script is technically simple, but I think it could be very useful to other developers.
This PR introduces a new mul_mat_f16_f32 kernel that leverages tiling and vectorization. I believe this will serve as a strong baseline for future improvements.

In a future PR, I may explore using image2d_t to utilize the L1 cache for mul_mat and conv2d operations. This is a bit tricky as it requires some data preprocessing on the host side.

Results on Adreno 830:
Master:
This PR:
@lhez @max-krasnyansky
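For readers unfamiliar with the tiling mentioned in the PR description, here is a minimal CPU-side sketch in plain C. It is not the PR's OpenCL kernel. The function name matmul_tiled, the tile parameter, and the row-major layout are all hypothetical, and both inputs are plain float here (the real kernel takes f16 for one operand) so that the sketch stays portable. It only illustrates the general idea: work on small blocks so that each loaded value is reused several times before moving on.

```c
#include <stddef.h>

/* Hypothetical CPU reference for a tiled matrix multiply.
 * Computes C[M x N] = A[M x K] * B[K x N], all row-major.
 * Assumption: both inputs are float; the PR's kernel uses f16 x f32. */
static void matmul_tiled(const float *A, const float *B, float *C,
                         size_t M, size_t N, size_t K, size_t tile)
{
    /* Start from a zeroed output so partial tile products can accumulate. */
    for (size_t i = 0; i < M * N; ++i) {
        C[i] = 0.0f;
    }

    /* Walk the output in tile x tile blocks, and the K dimension in chunks. */
    for (size_t i0 = 0; i0 < M; i0 += tile) {
        size_t i_end = (i0 + tile < M) ? i0 + tile : M;
        for (size_t j0 = 0; j0 < N; j0 += tile) {
            size_t j_end = (j0 + tile < N) ? j0 + tile : N;
            for (size_t k0 = 0; k0 < K; k0 += tile) {
                size_t k_end = (k0 + tile < K) ? k0 + tile : K;

                /* Multiply one block of A by one block of B. */
                for (size_t i = i0; i < i_end; ++i) {
                    for (size_t k = k0; k < k_end; ++k) {
                        float a = A[i * K + k]; /* reused across the j loop */
                        for (size_t j = j0; j < j_end; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
}
```

On a GPU the same blocking is typically mapped onto work-groups, with each tile staged in local memory or registers and the innermost loop replaced by vector loads. The tile size would be tuned per device rather than passed in as it is here.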