[DLIGHT][GEMV] Enable gemv schedule for adreno #319

Closed
krishnaraj36 wants to merge 7 commits from the mlc_adreno_decode_sch branch

krishnaraj36
Contributor

Enabled a new GEMV schedule for the OpenCL target, which improves decode performance of MLC-LLM models in the q4f16_0 format.

Decode performance of a few LLM models on Snapdragon Gen-3 Android:

Model            Baseline         With this change

Llama-2-7B       10 tok/sec       12.5 tok/sec
Qwen-7b          8.5 tok/sec      11 tok/sec
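
For readers who want the relative gain rather than raw throughput, the numbers in the table can be turned into percentages with a few lines of Python. This is only an illustrative sketch: the `results` dictionary and the `speedup_percent` helper are hypothetical names introduced here, and the values are copied from the table, not measured again.

```python
# Decode throughput in tok/sec, copied from the table: (baseline, with this change).
results = {
    "Llama-2-7B": (10.0, 12.5),
    "Qwen-7b": (8.5, 11.0),
}


def speedup_percent(baseline, improved):
    """Relative throughput gain over the baseline, in percent."""
    return (improved / baseline - 1.0) * 100.0


for model, (baseline, improved) in results.items():
    print(f"{model}: {speedup_percent(baseline, improved):.1f}% faster decode")
```

By this calculation the reported gain is roughly a quarter to just under a third of the baseline decode throughput on the two models listed.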

tqchen and others added 7 commits April 18, 2024 09:43
MLC local ci setup. Also CI for Windows and macOS building,
which may take 90-100 mins.

Co-authored-by: Siyuan Feng <[email protected]>
- Revert "[CMake][MSVC] Disable permissive mode for MSVC builds (#16343)"
- Skip MSC tests
- Disable NNPack and TFLite
- Tweak CMAKE_CUDA_ARCHITECTURES
@krishnaraj36
Contributor Author

@srkreddy1238: Can you please take a look at this PR?

@tqchen
Contributor

tqchen commented Apr 25, 2024

Thanks @krishnaraj36, can you send the PR to https://github.com/apache/tvm?

@krishnaraj36
Contributor Author

> Thanks @krishnaraj36, can you send the PR to https://github.com/apache/tvm?

Closing this PR in favor of apache/tvm#16932.

@krishnaraj36 krishnaraj36 deleted the mlc_adreno_decode_sch branch April 26, 2024 06:03