[llama.cpp] Latest build (June 5) now supports Apple Silicon GPUs! Apple users are advised to update #505

ymcui · 2023-06-05T00:52:39Z

ymcui
Jun 5, 2023
Maintainer

llama.cpp has added Metal-based inference, and Apple Silicon (M-series chip) users are encouraged to update. The change has already been merged into the main branch.

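In my own tests, the 7B and 13B models ran roughly 50% faster (46% for 7B and 57% for 13B in the speed test below). For details, see the original PR: ggml-org/llama.cpp#1642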

Note: this change currently supports only q4_0 models. The author says quantization at other bit widths will be added over time.

How to update?

If you have already installed llama.cpp, be sure to clean first:

make clean

Then build by following https://github.com/ggerganov/llama.cpp#metal-build, for example:

LLAMA_METAL=1 make

How to use?

Just add -ngl 1 to your existing inference command to offload the model to the Apple Silicon GPU. For example:

./main -m your-model -n 512 -t 8 -ngl 1
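If you share one launch script across machines, a small wrapper can add the flag only on Apple Silicon. This is a minimal sketch, not part of llama.cpp: `gpu_flags` is a hypothetical helper name, and it assumes the binary was built with LLAMA_METAL=1 and that the model is q4_0. It prints the command instead of running it.

```shell
#!/bin/sh
# gpu_flags [OS] [ARCH]: hypothetical helper, not part of llama.cpp.
# Prints "-ngl 1" on Apple Silicon (Darwin + arm64) and nothing elsewhere.
# OS and ARCH default to the values reported by uname on this machine.
gpu_flags() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        printf '%s\n' "-ngl 1"
    fi
}

# $(gpu_flags) is left unquoted on purpose so "-ngl 1" splits into two arguments.
# Drop the leading echo to actually run the command.
echo ./main -m your-model -n 512 -t 8 $(gpu_flags)
```

Note that a shell running under Rosetta reports x86_64 from `uname -m`, so this sketch would skip the flag there.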

Speed test

I ran preliminary speed tests on Chinese Alpaca-Plus-7B, Alpaca-Plus-13B, and LLaMA-33B (note that only q4_0 is accelerated for now). Test device: Apple M1 Max, 8 threads (-t 8), running macOS Ventura 13.4.

	Plus-7B	Plus-13B	33B
Original speed (without Metal)	41 ms/tok	77 ms/tok	179 ms/tok
New speed (with Metal)	28 ms/tok	49 ms/tok	failed
Speedup	46%	57%	n/a

After offloading to the GPU, 33B decodes very slowly; more tests will be added later.
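For anyone repeating the measurement, the speedup row seems to be a throughput gain, i.e. old ms/tok divided by new ms/tok, minus one. That reading is inferred from the numbers, not stated in the post. A small sketch of the arithmetic, with `speedup_pct` and `tok_per_sec` as illustrative helper names:

```shell
#!/bin/sh
# speedup_pct OLD_MS NEW_MS: throughput gain in percent, rounded to an integer.
# Assumes the post defines speedup as (old / new - 1) * 100.
speedup_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old / new - 1) * 100 }'
}

# tok_per_sec MS: convert milliseconds per token into tokens per second.
tok_per_sec() {
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'
}

speedup_pct 41 28   # Plus-7B:  prints 46
speedup_pct 77 49   # Plus-13B: prints 57
tok_per_sec 28      # Plus-7B with Metal: prints 35.7
```

Under this definition the per-token latency itself drops by less than the headline figure (for 7B, from 41 ms to 28 ms, about 32%).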

⚠️ Potential issues

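~~Multi-turn decoding on the GPU produced abnormal output~~ (fixed in the latest commit). This may have been an isolated case, so try it yourself before deciding whether to enable the GPU (-ngl 1).
Below are test results for Alpaca-Plus-7B, with the random seed fixed via --seed 42.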

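GPU disabled (prompts and replies translated from the original Chinese):

> Hello
Hello! Is there anything I can help you with?
> What can you do?
As an artificial intelligence, my main function is to answer users' questions and provide information and suggestions. My capabilities also include language translation, text analysis, speech recognition, and more. You can talk with me to get the information and help you need.
> What else?
Of course, I can also provide some entertainment content, such as jokes, stories, riddles, and so on. If you need any of these, I will do my best to meet your needs.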

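GPU enabled (translated from the original Chinese, except the final reply, which the model produced in English and is reproduced verbatim):

> Hello
Hello! Is there anything I can help you with?
> What can you do?
As a language model, I can answer your questions, provide translation, generate code, chat, and more. If you have any questions or need help, feel free to let me know.
> What else?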
Human compassion, not judgement (from Latin cognoscente for judgment, no justice of the people) for human rights and human rights (from Latin ius (for right of the right side with the side of the table of contents the table of goods containing the grain of 43D printing the 3D printers of the 3-dimensional in the 3-dimensions
