DOC: update hot topics and fix docs (xorbitsai#584)
UranusSeven authored Oct 27, 2023
1 parent 24b0056 commit 919ffaa
Showing 11 changed files with 62 additions and 28 deletions.
9 changes: 2 additions & 7 deletions README.md
@@ -26,16 +26,11 @@ potential of cutting-edge AI models.

## 🔥 Hot Topics
### Framework Enhancements
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Support grammar-based sampling for ggml models: [#525](https://github.com/xorbitsai/inference/pull/525)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
- Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
- LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
- Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
- Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
### New Models
- Built-in support for [internlm-20b](https://huggingface.co/internlm/internlm-20b/commits/main): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [internlm-chat-20b](https://huggingface.co/internlm/internlm-chat-20b): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [CodeLlama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
- Built-in support for [mistral-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [mistral-instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1): [#510](https://github.com/xorbitsai/inference/pull/510)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
12 changes: 2 additions & 10 deletions README_zh_CN.md
@@ -24,22 +24,14 @@ Xorbits Inference (Xinference) is a powerful and comprehensive distributed
## 🔥 Hot Topics
### Framework Enhancements
- Support grammar-based sampling: [#525](https://github.com/xorbitsai/inference/pull/525)
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
- Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
- LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
- Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
- Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
### New Models
- Built-in support for [internlm-20b](https://huggingface.co/internlm/internlm-20b/commits/main): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [internlm-chat-20b](https://huggingface.co/internlm/internlm-chat-20b): [#486](https://github.com/xorbitsai/inference/pull/486)
- Built-in support for [CodeLlama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
- Built-in support for [mistral-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [mistral-instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1): [#510](https://github.com/xorbitsai/inference/pull/510)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.
- [Chatbox](https://chatboxai.app/): a desktop client for cutting-edge LLMs, available on Windows, Mac, and Linux.

## Key Features
🌟 **Model inference made easy**: deployment of large language models, speech recognition models, and multimodal models is greatly simplified. A single command completes model deployment.

Binary file added doc/source/_static/speculative.gif
Binary file removed doc/source/_static/speculative_decoding.gif
Binary file not shown.
4 changes: 2 additions & 2 deletions doc/source/examples/chatbot.rst
@@ -1,8 +1,8 @@
.. _examples_chatbot:

=======================
========================
Example: CLI chatbot 🤖️
=======================
========================

**Description**:

4 changes: 2 additions & 2 deletions doc/source/examples/gradio_chatinterface.rst
@@ -1,8 +1,8 @@
.. _examples_gradio_chatinterface:

==============================
===============================
Example: Gradio ChatInterface🤗
==============================
===============================

**Description**:

9 changes: 4 additions & 5 deletions doc/source/index.rst
@@ -43,15 +43,14 @@ with popular third-party libraries like `LangChain <https://python.langchain.com

Framework Enhancements
~~~~~~~~~~~~~~~~~~~~~~
- Speculative decoding: `#509 <https://github.com/xorbitsai/inference/pull/509>`_
- Support grammar-based sampling for ggml models: `#525 <https://github.com/xorbitsai/inference/pull/525>`_
- Incorporate vLLM: `#445 <https://github.com/xorbitsai/inference/pull/445>`_
- Embedding model support: `#418 <https://github.com/xorbitsai/inference/pull/418>`_
- LoRA support: `#271 <https://github.com/xorbitsai/inference/issues/271>`_
- Multi-GPU support for PyTorch models: `#226 <https://github.com/xorbitsai/inference/issues/226>`_
- Xinference dashboard: `#93 <https://github.com/xorbitsai/inference/issues/93>`_


New Models
~~~~~~~~~~
- Built-in support for `CodeLlama <https://github.com/facebookresearch/codellama>`_: `#414 <https://github.com/xorbitsai/inference/pull/414>`_ `#402 <https://github.com/xorbitsai/inference/pull/402>`_
- Built-in support for `mistral-v0.1 <https://huggingface.co/mistralai/Mistral-7B-v0.1>`_ and `mistral-instruct-v0.1 <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1>`_: `#510 <https://github.com/xorbitsai/inference/pull/510>`_


Integrations
4 changes: 4 additions & 0 deletions doc/source/models/builtin/index.rst
@@ -14,6 +14,7 @@ Text Generation Models
- :ref:`Baichuan-2 <models_builtin_baichuan_2>`
- :ref:`Falcon <models_builtin_falcon>`
- :ref:`InternLM <models_builtin_internlm>`
- :ref:`InternLM 20B <models_builtin_internlm_20b>`
- :ref:`Llama-2 <models_builtin_llama_2>`
- :ref:`OPT <models_builtin_opt>`

@@ -29,6 +30,7 @@ Chat & Instruction-following Models
- :ref:`CodeLlama-Instruct <models_builtin_code_llama_instruct>`
- :ref:`Falcon Instruct <models_builtin_falcon_instruct>`
- :ref:`InternLM Chat <models_builtin_internlm_chat>`
- :ref:`InternLM Chat 20B <models_builtin_internlm_chat_20b>`
- :ref:`InternLM Chat 8K <models_builtin_internlm_chat_8k>`
- :ref:`Llama-2 Chat <models_builtin_llama_2_chat>`
- :ref:`Orca Mini <models_builtin_orca_mini>`
@@ -73,8 +75,10 @@ Code Assistant Models
falcon-instruct
falcon
internlm
internlm-20b
internlm-chat
internlm-chat-8k
internlm-chat-20b
llama-2-chat
llama-2
openbuddy
23 changes: 23 additions & 0 deletions doc/source/models/builtin/internlm-20b.rst
@@ -0,0 +1,23 @@
.. _models_builtin_internlm_20b:

==================
InternLM-20B Model
==================

- **Context Length:** 16384
- **Model Name:** internlm-20b
- **Languages:** en, zh
- **Abilities:** generate
- **Description:** Pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data.

Specifications
^^^^^^^^^^^^^^

Model Spec (pytorch, 20 Billion)
++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 20
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** internlm/internlm-20b
- **Model Revision:** f0433b0db933a9adfa169f756ab8547f67ccef1d
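
The snippet below is a minimal sketch of launching and querying this model through the Xinference Python client. It assumes a server is already running at the default local endpoint; the keyword arguments follow the client API at the time of writing and may differ between releases.

.. code-block:: python

   from xinference.client import Client

   # Assumes an Xinference server is running locally on the default port.
   client = Client("http://127.0.0.1:9997")

   # Launch the built-in internlm-20b model in pytorch format.
   model_uid = client.launch_model(
       model_name="internlm-20b",
       model_format="pytorch",
       quantization="none",
   )

   # internlm-20b has the "generate" ability, so use generate().
   model = client.get_model(model_uid)
   print(model.generate("The capital of France is"))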
22 changes: 22 additions & 0 deletions doc/source/models/builtin/internlm-chat-20b.rst
@@ -0,0 +1,22 @@
.. _models_builtin_internlm_chat_20b:

=================
InternLM-Chat-20B
=================

- **Context Length:** 16384
- **Model Name:** internlm-chat-20b
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** Pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data. The Chat version has additionally undergone SFT and RLHF training.

Specifications
^^^^^^^^^^^^^^

Model Spec (pytorch, 20 Billion)
++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 20
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** internlm/internlm-chat-20b
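
As with the base model, a rough usage sketch with the Python client follows; the endpoint and keyword arguments are assumptions and may need adjusting for your deployment.

.. code-block:: python

   from xinference.client import Client

   client = Client("http://127.0.0.1:9997")

   # 4-bit quantization trades some quality for a much smaller
   # memory footprint on a 20B-parameter model.
   model_uid = client.launch_model(
       model_name="internlm-chat-20b",
       model_format="pytorch",
       quantization="4-bit",
   )

   # Chat-ability models expose chat() rather than generate().
   model = client.get_model(model_uid)
   print(model.chat("Summarize speculative decoding in one sentence."))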
3 changes: 1 addition & 2 deletions doc/source/user_guide/spec_decoding.rst
@@ -4,7 +4,7 @@
Speculative Decoding (experimental)
===================================

.. image:: ../_static/speculative_decoding.gif
.. image:: ../_static/speculative.gif

Speculative decoding is a method designed to speed up inference for large language models (LLMs). It uses a smaller, faster "draft" model to propose several tokens ahead of time, which are then verified by a larger "target" model. When the target model confirms the draft tokens, the scheme saves significant memory bandwidth and per-token processing time; draft tokens that don't match the target model's predictions are discarded.
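
The accept/reject rule at the heart of this scheme is compact enough to sketch. The following toy NumPy illustration of the verification step from [1] is not Xinference's actual implementation; the vocabulary and the draft/target distributions are made up for the example.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   def verify(draft_tokens, q, p):
       """Accept draft tokens while the target model agrees.

       q[i] and p[i] are the draft and target distributions at
       position i. On rejection, resample from the residual
       distribution max(0, p - q) and stop, as in [1].
       """
       out = []
       for t, q_i, p_i in zip(draft_tokens, q, p):
           if rng.random() < min(1.0, p_i[t] / q_i[t]):
               out.append(t)  # target confirms the draft token
           else:
               residual = np.maximum(p_i - q_i, 0.0)
               out.append(rng.choice(len(p_i), p=residual / residual.sum()))
               break
       return out

   # Toy 4-token vocabulary, three draft positions.
   q = [np.array([.1, .2, .6, .1]),
        np.array([.3, .4, .2, .1]),
        np.array([.25, .25, .25, .25])]
   p = [np.array([.1, .1, .7, .1]),
        np.array([.5, .2, .2, .1]),
        np.array([.1, .1, .1, .7])]
   print(verify([2, 1, 3], q, p))

Sampling this way provably leaves the output distribution identical to sampling from the target model alone, which is why the speedup costs nothing in output quality.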

@@ -56,7 +56,6 @@ The effectiveness of speculative decoding relies on:
- The similarity between the logits produced by the draft model and the target model.

In the example above, the target model is about five times larger than the draft model, and the two models are well aligned. Approximately 86% of the draft tokens are accepted by the target model, resulting in a 25% increase in speed.
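
For intuition, the analysis in [1] gives the expected number of tokens produced per target-model pass when each draft token is accepted independently with rate :math:`\alpha` and :math:`\gamma` draft tokens are proposed per step:

.. math::

   \mathbb{E}[\text{tokens per step}] = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}

The end-to-end speedup is smaller than this ratio suggests, since each step also pays for :math:`\gamma` draft-model forward passes; that is why an 86% acceptance rate translates into a 25% wall-clock gain here rather than a several-fold one.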

References
~~~~~~~~~~
- [1] `Fast Inference from Transformers via Speculative Decoding <https://arxiv.org/abs/2211.17192>`_
