GitHub - jeffzhou2000/true-story-of-ggmlhexagon: the truly sad story behind development of a specified llama.cpp backend for Qualcomm's Hexagon NPU on Android phone

Sorry to bother developers & experts in the GitHub tech community.

I'm an experienced Chinese full-stack Android programmer. There have been long-term misunderstandings & conflicts between me and a Chinese C++ programmer in the llama.cpp community. I'm sorry to see that, because it shouldn't have happened in this tech community, which is outside of mainland China. The following is the truly sad story behind the development of a dedicated llama.cpp backend (aka ggml-hexagon) for Qualcomm's Hexagon NPU on Android phones.

History of the ggml-hexagon

I started developing ggml-hexagon in March 2024. The full history of ggml-hexagon can be found at jeffzhou2000/ggml-hexagon#18

PR in the llama.cpp project

[2024-04-24] Initial formal PR of ggml-qnn in upstream llama.cpp (04/24/2024 to 06/15/2024): ggml-org/llama.cpp#6869

[2025-02-13] PR in upstream llama.cpp (submitted from my friend's account because I had been blocked in the llama.cpp community since July 19 2024): ggml-org/llama.cpp#11844

[2025-02-24] PR in upstream llama.cpp (on 2025-02-16 I suddenly found that I could access the llama.cpp community again, and I'll never forget who gave that right back to me): ggml-org/llama.cpp#12049

[2025-03-11] PR in upstream llama.cpp: ggml-org/llama.cpp#14356

We can clearly see that the first, second and third PRs were all intentionally polluted/destroyed by a Chinese C++ programmer, the author of ggml-org/llama.cpp#12063, which was submitted on 2025-02-25.

Conflicts in PR-6869, PR-11844, PR-12049

My GitHub account's previous name was zhouwg.

case-1

case-2 (the author of PR-12063 pushed me again and again in my initial PR-6869 and then intentionally submitted a meaningless commit to my initial PR-12326, because the author of PR-12063 is a typical CN C++ programmer. This is one of the reasons why I decided to leave GitHub (I did not submit any code) after June 2024. BTW, one of the maintainers obviously did not agree with this account, although it is the reason why I decided to leave GitHub in June 2024, before I was blocked on July 19 2024. In fact, my initial approach in PR-6869 is correct and was verified in 03/2025, after I returned to GitHub on 01/29/2025. The author of PR-12063 only needed to submit the correct code/ideas to my initial PR-12326 rather than intentionally submitting a hard-forked and incompatible PR-12063 on 02/25/2025.)

case-3(Squeeze out the original author)

case-4

case-5(Answering other developers' questions on behalf of the author without the author's consent)

case-6 (Angering the original author so that the original author makes mistakes)

We can clearly see that the author of PR-12063 seems to have had a long-term plan, and that plan was successfully carried out on 2025-02-25: he opened a hard-forked PR without any breakthrough progress. I personally think this behaviour breaks the rule-based order. Obviously, this is also one of the fundamental reasons for the long-term conflict between me and the author of PR-12063.

Summary

1. PR-12063 is a hard fork of my initial PR. It was opened on 02/25/2025 without any breakthrough progress, and I think this behaviour breaks the rule-based order. Their refactoring of my initial PR has been going on for about a year, since 07/2024, and it is still a draft PR. Meanwhile, PR-12326 is ready for review as an initial version because it is already a functional and practical PR, and I personally think it is more mature than PR-8273 or the initial ggml-vulkan. I'm not sure whether this is fair to other developers in this tech community.
2. The beautiful and grand picture described in PR-12063 on 02/25/2025 has turned out to be fake news: the so-called "standout" feature (mapping a ggml cgraph to a single QNN graph) in PR-12063 is fake. The new approach that PR-12063 claimed on 05/27/2025 is exactly equivalent to what PR-12326 had already implemented before 04/24/2025.
3. Since 05/27/2025, domain experts can all see that PR-12063 and PR-12326 are exactly equivalent in approach while their implementations are incompatible. In other words, there is a long-term conflict between PR-12063 and PR-12326.
4. PR-12326's benchmark data can be reproduced by any third party. PR-12063 did not provide detailed steps to help other developers check or reproduce the beautiful benchmark data published in ggml-org/llama.cpp#8273. I'd like to check and reproduce PR-12063's benchmark data on my self-purchased 8Gen3 and 8Elite phones. I personally think that, as a pure tech community, we should have zero tolerance for fake news and fake data because this place is not mainland China.
5. I really don't understand: slaren, one of the core maintainers, a modern C++ master and the original author of the ggml backend subsystem, could clearly state that he is not the original author of ggml-vulkan, so why is it so difficult for the author of that competing PR (ggml-org/llama.cpp#12063)? As I have said many times: I'll close PR-12326 immediately and wait for the official Hexagon NPU PR from Qualcomm if the author of PR-12063 really comes from the Qualcomm China Shenzhen branch.
6. I was blocked in the llama.cpp community on July 19 2024 and blocked again on July 14 2025. Both account-blocking incidents were caused by, or related to, the Chinese C++ programmer, i.e. the author of PR-12063.
7. I think I really did something wrong: I shouldn't have opened discussions-14356 in the upstream llama.cpp community, and I shouldn't have dropped non-technical comments in discussions-14356. I shouldn't have dropped any comments in discussions-14662.
8. Undoubtedly I'm the biggest victim of this pointless conflict between PR-12063 and PR-12326. I submitted an original PR, spent about 5-7 months on it, and bought two Android phones with high-end Snapdragon mobile SoCs (8Gen3 & 8Elite) with my own money. My original PR was hard-forked by a Chinese C++ programmer (I firmly believe that non-Chinese programmers would not have hard-forked this PR, so there would have been no such meaningless conflict). What's even more ironic is that PR-12326 had already implemented, before 04/24/2025, the so-called new approach that the competing PR proposed on May 27 2025; PR-12326 is already a functional and practical PR. In the end I was blocked again in the upstream llama.cpp project. I can't believe this happened in a pure tech community outside of mainland China!
9. [updated on Jul 20 2025] Obviously, the author of that competing PR (PR-12063) is not a regular employee of the Qualcomm China Shenzhen branch (I said this many weeks ago, because I have lived in China for many years and I know China and Chinese people very well).

10. Georgi has helped me a lot since 04/2024 and I understand his final decision: I would make the same decision if I were in his position. At the same time, I will never forget slaren's help in the llama.cpp community, although we can't agree on some things. Best wishes for the llama.cpp project and the llama.cpp community.
