| InternLM-XComposer  | 7B          | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | Yes | Yes | Yes | Yes |
| QWen                | 1.8B - 72B  | Yes | Yes | Yes | Yes |
| QWen1.5             | 1.8B - 110B | Yes | Yes | Yes | Yes |
| QWen2               | 1.5B - 72B  | Yes | Yes | Yes | Yes |
| Mistral             | 7B          | Yes | Yes | Yes | No  |
| QWen-VL             | 7B          | Yes | Yes | Yes | Yes |
| DeepSeek-VL         | 7B          | Yes | Yes | Yes | Yes |

The TurboMind engine doesn't support window attention. Therefore, for models that use window attention, please choose the PyTorch engine for inference.

## Models supported by PyTorch

| Model               | Size        | FP16/BF16 | KV INT8 | W8A8 |
| :-----------------: | :---------: | :-------: | :-----: | :--: |
| Llama               | 7B - 65B    | Yes       | No      | Yes  |
| Llama2              | 7B - 70B    | Yes       | No      | Yes  |
| Llama3              | 8B, 70B     | Yes       | No      | Yes  |
| InternLM            | 7B - 20B    | Yes       | No      | Yes  |
| InternLM2           | 7B - 20B    | Yes       | No      | -    |
| InternLM2.5         | 7B          | Yes       | No      | -    |
| Baichuan2           | 7B - 13B    | Yes       | No      | Yes  |
| ChatGLM2            | 6B          | Yes       | No      | No   |
| Falcon              | 7B - 180B   | Yes       | No      | No   |
| YI                  | 6B - 34B    | Yes       | No      | No   |
| Mistral             | 7B          | Yes       | No      | No   |
| Mixtral             | 8x7B        | Yes       | No      | No   |
| QWen                | 1.8B - 72B  | Yes       | No      | No   |
| QWen1.5             | 0.5B - 110B | Yes       | No      | No   |
| QWen1.5-MoE         | A2.7B       | Yes       | No      | No   |
| QWen2               | 0.5B - 72B  | Yes       | No      | No   |
| DeepSeek-MoE        | 16B         | Yes       | No      | No   |
| DeepSeek-V2         | 16B, 236B   | Yes       | No      | No   |
| Gemma               | 2B-7B       | Yes       | No      | No   |
| Dbrx                | 132B        | Yes       | No      | No   |
| StarCoder2          | 3B-15B      | Yes       | No      | No   |
| Phi-3-mini          | 3.8B        | Yes       | No      | No   |
| CogVLM-Chat         | 17B         | Yes       | No      | No   |
| CogVLM2-Chat        | 19B         | Yes       | No      | No   |
| LLaVA(1.5,1.6)      | 7B-34B      | Yes       | No      | No   |
| InternVL-Chat(v1.5) | 2B-26B      | Yes       | No      | No   |
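For scripting, a support matrix like the one above can be captured as a small lookup table. The sketch below is purely illustrative — `PYTORCH_SUPPORT` and `supports_w8a8` are hypothetical names, not part of any lmdeploy API — and encodes only a few rows:

```python
# Hypothetical lookup table mirroring a few rows of the PyTorch-engine
# matrix above; each entry is (FP16/BF16, KV INT8, W8A8) support.
PYTORCH_SUPPORT = {
    "Llama2":      (True, False, True),
    "Mistral":     (True, False, False),
    "QWen2":       (True, False, False),
    "DeepSeek-V2": (True, False, False),
}

def supports_w8a8(model: str) -> bool:
    """Return True if the matrix lists W8A8 support for `model`."""
    entry = PYTORCH_SUPPORT.get(model)
    return entry is not None and entry[2]

print(supports_w8a8("Llama2"))   # True
print(supports_w8a8("Mistral"))  # False
```

A check like this is handy in CI or deployment scripts to fail fast before attempting a quantization mode the engine does not list for a given model.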