-
Download the prebuilt binaries with IPEX for your system, then launch the server (a sketch of the launch command follows). As for the model, there are currently better options available.
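A minimal sketch of what launching the server typically looks like, assuming the standard llama-server flags; the model filename, layer count, and port below are placeholders, not the exact command from the original reply:

```bash
# Set up the oneAPI runtime first (needed for SYCL/IPEX builds).
source /opt/intel/oneapi/setvars.sh
# Launch the OpenAI-compatible server; -ngl 99 offloads all layers to the GPU.
# Replace the model path with the GGUF file you actually downloaded.
./llama-server -m ./models/your-model-Q4_0.gguf -ngl 99 --host 127.0.0.1 --port 8080
```

The server then exposes an OpenAI-compatible API at http://127.0.0.1:8080, which tools like the continue extension can point at.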
-
You can use Vulkan as well, which should be very simple to set up. The Ubuntu binaries should just work, and building it yourself is also not complex. For Intel GPUs, I would currently recommend the legacy quants (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0), as they have the best performance; the others are not yet as optimized. It won't be as fast as SYCL, but it should be usable.
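For reference, a build-from-source sketch for the Vulkan backend; the cmake flag follows llama.cpp's documented GGML_VULKAN option, and the assumption is that the Vulkan SDK and drivers are already installed:

```bash
# Build llama.cpp with the Vulkan backend (requires Vulkan headers/loader,
# e.g. the vulkan-devel or libvulkan-dev packages on most distributions).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
# Binaries end up in build/bin/; run build/bin/llama-server as usual.
```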
-
llama-b4040-bin-win-sycl-x64.zip is a binary for Windows, but your OS is Arch, so that package can't run on your system. In your case, please follow the SYCL guide; you will need to adapt its commands to Arch Linux. I know someone has built llama.cpp with the SYCL backend on Arch Linux.
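A sketch of what adapting the SYCL guide to Arch Linux might look like; the oneAPI install location and the icx/icpx compiler names follow Intel's defaults and are assumptions, not commands from the original reply:

```bash
# Requires the Intel oneAPI Base Toolkit (provides the icx/icpx compilers).
source /opt/intel/oneapi/setvars.sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# GGML_SYCL enables the SYCL backend; icx/icpx are Intel's oneAPI compilers.
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```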
-
Speaking as a fellow A750 user: SYCL is faster than Vulkan at the moment.
-
Can anyone give me a guide that explains how to self-host DeepSeek-Coder 6.7b with an Intel Arc A750? I have no previous experience with self-hosting or Docker. I want to use it in a VS Code extension named `continue` for auto-completion. I followed this guide for SYCL and was able to get this information, but by then I got really tired.

My system information:
CPU: Intel i5 13400F
GPU: Intel Arc A750, 8 GB VRAM
RAM: 16 GB
OS: Arch Linux
Kernel: Linux 6.12.21-1-lts

Purpose:
I want to self-host an AI that can help me with auto-completion in Vim or VS Code.

Thanks!