DOC: update hot topics and fix docs (xorbitsai#584)
UranusSeven authored Oct 27, 2023
1 parent 24b0056 commit 919ffaa
Showing 11 changed files with 62 additions and 28 deletions.
9 changes: 2 additions & 7 deletions README.md
@@ -26,16 +26,11 @@ potential of cutting-edge AI models.

## 🔥 Hot Topics
### Framework Enhancements
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Support grammar-based sampling for ggml models: [#525](https://github.com/xorbitsai/inference/pull/525)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
- Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
- LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
- Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
- Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
### New Models
- Built-in support for [internlm-20b](https://huggingface.co/internlm/internlm-20b/commits/main): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [internlm-chat-20b](https://huggingface.co/internlm/internlm-chat-20b): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [CodeLlama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
- Built-in support for [mistral-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [mistral-instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1): [#510](https://github.com/xorbitsai/inference/pull/510)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
12 changes: 2 additions & 10 deletions README_zh_CN.md
@@ -24,22 +24,14 @@ Xorbits Inference (Xinference) is a powerful and comprehensive distributed
## 🔥 Hot Topics
### Framework Enhancements
- Support grammar-based sampling: [#525](https://github.com/xorbitsai/inference/pull/525)
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
- Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
- LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
- Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
- Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
### New Models
- Built-in support for [internlm-20b](https://huggingface.co/internlm/internlm-20b/commits/main): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [internlm-chat-20b](https://huggingface.co/internlm/internlm-chat-20b): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [CodeLlama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
- Built-in support for [mistral-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [mistral-instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1): [#510](https://github.com/xorbitsai/inference/pull/510)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.
- [Chatbox](https://chatboxai.app/): a desktop client for cutting-edge LLMs, available on Windows, Mac, and Linux.

## Key Features
🌟 **Model inference made easy**: deployment of large language models, speech recognition models, and multimodal models is greatly simplified. A single command completes model deployment.

Binary file added doc/source/_static/speculative.gif
Binary file removed doc/source/_static/speculative_decoding.gif
Binary file not shown.
4 changes: 2 additions & 2 deletions doc/source/examples/chatbot.rst
@@ -1,8 +1,8 @@
.. _examples_chatbot:

=======================
========================
Example: CLI chatbot 🤖️
=======================
========================

**Description**:

4 changes: 2 additions & 2 deletions doc/source/examples/gradio_chatinterface.rst
@@ -1,8 +1,8 @@
.. _examples_gradio_chatinterface:

==============================
===============================
Example: Gradio ChatInterface🤗
==============================
===============================

**Description**:

9 changes: 4 additions & 5 deletions doc/source/index.rst
@@ -43,15 +43,14 @@ with popular third-party libraries like `LangChain <https://python.langchain.com

Framework Enhancements
~~~~~~~~~~~~~~~~~~~~~~
- Speculative decoding: `#509 <https://github.com/xorbitsai/inference/pull/509>`_
- Support grammar-based sampling for ggml models: `#525 <https://github.com/xorbitsai/inference/pull/525>`_
- Incorporate vLLM: `#445 <https://github.com/xorbitsai/inference/pull/445>`_
- Embedding model support: `#418 <https://github.com/xorbitsai/inference/pull/418>`_
- LoRA support: `#271 <https://github.com/xorbitsai/inference/issues/271>`_
- Multi-GPU support for PyTorch models: `#226 <https://github.com/xorbitsai/inference/issues/226>`_
- Xinference dashboard: `#93 <https://github.com/xorbitsai/inference/issues/93>`_


New Models
~~~~~~~~~~
- Built-in support for `CodeLlama <https://github.com/facebookresearch/codellama>`_: `#414 <https://github.com/xorbitsai/inference/pull/414>`_ `#402 <https://github.com/xorbitsai/inference/pull/402>`_
- Built-in support for `mistral-v0.1 <https://huggingface.co/mistralai/Mistral-7B-v0.1>`_ and `mistral-instruct-v0.1 <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1>`_: `#510 <https://github.com/xorbitsai/inference/pull/510>`_


Integrations
4 changes: 4 additions & 0 deletions doc/source/models/builtin/index.rst
@@ -14,6 +14,7 @@ Text Generation Models
- :ref:`Baichuan-2 <models_builtin_baichuan_2>`
- :ref:`Falcon <models_builtin_falcon>`
- :ref:`InternLM <models_builtin_internlm>`
- :ref:`InternLM 20B <models_builtin_internlm_20b>`
- :ref:`Llama-2 <models_builtin_llama_2>`
- :ref:`OPT <models_builtin_opt>`

@@ -29,6 +30,7 @@ Chat & Instruction-following Models
- :ref:`CodeLlama-Instruct <models_builtin_code_llama_instruct>`
- :ref:`Falcon Instruct <models_builtin_falcon_instruct>`
- :ref:`InternLM Chat <models_builtin_internlm_chat>`
- :ref:`InternLM Chat 20B <models_builtin_internlm_chat_20b>`
- :ref:`InternLM Chat 8K <models_builtin_internlm_chat_8k>`
- :ref:`Llama-2 Chat <models_builtin_llama_2_chat>`
- :ref:`Orca Mini <models_builtin_orca_mini>`
@@ -73,8 +75,10 @@ Code Assistant Models
falcon-instruct
falcon
internlm
internlm-20b
internlm-chat
internlm-chat-8k
internlm-chat-20b
llama-2-chat
llama-2
openbuddy
23 changes: 23 additions & 0 deletions doc/source/models/builtin/internlm-20b.rst
@@ -0,0 +1,23 @@
.. _models_builtin_internlm_20b:

==================
InternLM-20B Model
==================

- **Context Length:** 16384
- **Model Name:** internlm-20b
- **Languages:** en, zh
- **Abilities:** generate
- **Description:** Pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data.

Specifications
^^^^^^^^^^^^^^

Model Spec (pytorch, 20 Billion)
++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 20
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** internlm/internlm-20b
- **Model Revision:** f0433b0db933a9adfa169f756ab8547f67ccef1d
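
The snippet below is a minimal sketch of launching and querying this model through the Xinference Python client. It assumes a server is already running at the default local endpoint; the keyword arguments follow the client API at the time of writing and may differ between releases.

.. code-block:: python

   from xinference.client import Client

   # Assumes an Xinference server is running locally on the default port.
   client = Client("http://127.0.0.1:9997")

   # Launch the built-in internlm-20b model in pytorch format.
   model_uid = client.launch_model(
       model_name="internlm-20b",
       model_format="pytorch",
       quantization="none",
   )

   # internlm-20b has the "generate" ability, so use generate().
   model = client.get_model(model_uid)
   print(model.generate("The capital of France is"))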
22 changes: 22 additions & 0 deletions doc/source/models/builtin/internlm-chat-20b.rst
@@ -0,0 +1,22 @@
.. _models_builtin_internlm_chat_20b:

=================
InternLM-Chat-20B
=================

- **Context Length:** 16384
- **Model Name:** internlm-chat-20b
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** Pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data. The Chat version has additionally undergone SFT and RLHF training.

Specifications
^^^^^^^^^^^^^^

Model Spec (pytorch, 20 Billion)
++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 20
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** internlm/internlm-chat-20b
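
As with the base model, a rough usage sketch with the Python client follows; the endpoint and keyword arguments are assumptions and may need adjusting for your deployment.

.. code-block:: python

   from xinference.client import Client

   client = Client("http://127.0.0.1:9997")

   # 4-bit quantization trades some quality for a much smaller
   # memory footprint on a 20B-parameter model.
   model_uid = client.launch_model(
       model_name="internlm-chat-20b",
       model_format="pytorch",
       quantization="4-bit",
   )

   # Chat-ability models expose chat() rather than generate().
   model = client.get_model(model_uid)
   print(model.chat("Summarize speculative decoding in one sentence."))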
3 changes: 1 addition & 2 deletions doc/source/user_guide/spec_decoding.rst
@@ -4,7 +4,7 @@
Speculative Decoding (experimental)
===================================

.. image:: ../_static/speculative_decoding.gif
.. image:: ../_static/speculative.gif

Speculative decoding is a method designed to speed up inference for large language models (LLMs). It uses a smaller, faster "draft" model to propose several tokens ahead of time, which are then verified by a larger "target" model. When the target model confirms the draft tokens, the scheme saves significant memory bandwidth and per-token processing time; draft tokens that don't match the target model's predictions are discarded.
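
The accept/reject rule at the heart of this scheme is compact enough to sketch. The following toy NumPy illustration of the verification step from [1] is not Xinference's actual implementation; the vocabulary and the draft/target distributions are made up for the example.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   def verify(draft_tokens, q, p):
       """Accept draft tokens while the target model agrees.

       q[i] and p[i] are the draft and target distributions at
       position i. On rejection, resample from the residual
       distribution max(0, p - q) and stop, as in [1].
       """
       out = []
       for t, q_i, p_i in zip(draft_tokens, q, p):
           if rng.random() < min(1.0, p_i[t] / q_i[t]):
               out.append(t)  # target confirms the draft token
           else:
               residual = np.maximum(p_i - q_i, 0.0)
               out.append(rng.choice(len(p_i), p=residual / residual.sum()))
               break
       return out

   # Toy 4-token vocabulary, three draft positions.
   q = [np.array([.1, .2, .6, .1]),
        np.array([.3, .4, .2, .1]),
        np.array([.25, .25, .25, .25])]
   p = [np.array([.1, .1, .7, .1]),
        np.array([.5, .2, .2, .1]),
        np.array([.1, .1, .1, .7])]
   print(verify([2, 1, 3], q, p))

Sampling this way provably leaves the output distribution identical to sampling from the target model alone, which is why the speedup costs nothing in output quality.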

@@ -56,7 +56,6 @@ The effectiveness of speculative decoding relies on:
- The similarity between the logits produced by the draft model and the target model.

In the example above, the target model is about five times larger than the draft model, and the two models are well aligned. Approximately 86% of the draft tokens are accepted by the target model, resulting in a 25% increase in speed.
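
For intuition, the analysis in [1] gives the expected number of tokens produced per target-model pass when each draft token is accepted independently with rate :math:`\alpha` and :math:`\gamma` draft tokens are proposed per step:

.. math::

   \mathbb{E}[\text{tokens per step}] = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}

The end-to-end speedup is smaller than this ratio suggests, since each step also pays for :math:`\gamma` draft-model forward passes; that is why an 86% acceptance rate translates into a 25% wall-clock gain here rather than a several-fold one.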

References
~~~~~~~~~~
- [1] `Fast Inference from Transformers via Speculative Decoding <https://arxiv.org/abs/2211.17192>`_
