Prerequisites
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
This is mostly related to ggml, but I was advised to report the issue here.
Basically, this would require implementing quantization shaders for Vulkan (that's the easy part) and supporting them in the C++ code.
Motivation
With stable-diffusion.cpp compiled with the Vulkan backend, attempting to load a LoRA on a quantized model (any non-float type) makes the program print `Missing CPY op for types: f32 q8_0` (for example) and crash at this line.
Having more ops implemented is a good thing, especially if it fixes a crash downstream.
Possible Implementation
I'm guessing something like this for the shaders (q8_0):
```glsl
#version 450

#include "quant_head.comp" // does not exist yet

layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

layout (binding = 0) readonly buffer A {float data_a[];};
layout (binding = 1) writeonly buffer D {block_q8_0 data_b[];};

void main() {
    const uint i = gl_WorkGroupID.x * 4 + gl_LocalInvocationID.x / 64;
    const uint tid = gl_LocalInvocationID.x % 64;
    const uint il = tid / 32;  // which 16-element half of the block this thread handles
    const uint ir = tid % 32;
    const uint ib = 32 * i + ir;

    if (ib >= p.nel / 32) {
        return;
    }

    const uint a_idx = 1024 * i + 32 * ir;  // start of block ib in data_a
    const uint b_idx = a_idx + 16 * il;     // start of this thread's half

    // Both threads of the pair scan the whole block so they agree on the scale.
    float absmax = 0.0;
    [[unroll]] for (uint j = 0; j < 32; ++j) {
        absmax = max(absmax, abs(data_a[a_idx + j]));
    }

    const float d = absmax / 127.0;
    const float id = d != 0.0 ? 1.0 / d : d;

    data_b[ib].d = float16_t(d);
    [[unroll]] for (uint j = 0; j < 16; ++j) {
        data_b[ib].qs[16 * il + j] = int8_t(round(clamp(data_a[b_idx + j] * id, -128.0, 127.0)));
    }
}
```
I don't know how to proceed further in the implementation.