DOC: using discord instead of slack & updating model to qwen2.5 in getting started doc #2775

Merged 4 commits on Jan 22, 2025
5 changes: 3 additions & 2 deletions README.md
@@ -13,6 +13,7 @@
[![PyPI Latest Release](https://img.shields.io/pypi/v/xinference.svg?style=for-the-badge)](https://pypi.org/project/xinference/)
[![License](https://img.shields.io/pypi/l/xinference.svg?style=for-the-badge)](https://github.com/xorbitsai/inference/blob/main/LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/inference/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/inference/goto?ref=main)
[![Discord](https://img.shields.io/badge/join_Discord-5462eb.svg?logo=discord&style=for-the-badge&logoColor=%23f5f5f5)](https://discord.gg/Xw9tszSkr5)
[![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=x&style=for-the-badge)](https://twitter.com/xorbitsio)

@@ -33,7 +34,7 @@
researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full
potential of cutting-edge AI models.

<div align="center">
<i><a href="https://join.slack.com/t/xorbitsio/shared_invite/zt-1z3zsm9ep-87yI9YZ_B79HLB2ccTq4WA">👉 Join our Slack community!</a></i>
<i><a href="https://discord.gg/Xw9tszSkr5">👉 Join our Discord community!</a></i>
</div>

## 🔥 Hot Topics
@@ -47,14 +48,14 @@
- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
### New Models
- Built-in support for [MeloTTS](https://github.com/myshell-ai/MeloTTS): [#2760](https://github.com/xorbitsai/inference/pull/2760)
- Built-in support for [CogAgent](https://github.com/THUDM/CogAgent): [#2740](https://github.com/xorbitsai/inference/pull/2740)
- Built-in support for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo): [#2721](https://github.com/xorbitsai/inference/pull/2721)
- Built-in support for [HunyuanDiT](https://github.com/Tencent/HunyuanDiT): [#2727](https://github.com/xorbitsai/inference/pull/2727)
- Built-in support for [Marco-o1](https://github.com/AIDC-AI/Marco-o1): [#2749](https://github.com/xorbitsai/inference/pull/2749)
- Built-in support for [Stable Diffusion 3.5](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838): [#2706](https://github.com/xorbitsai/inference/pull/2706)
- Built-in support for [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B): [#2684](https://github.com/xorbitsai/inference/pull/2684)
- Built-in support for [Fish Speech V1.5](https://huggingface.co/fishaudio/fish-speech-1.5): [#2672](https://github.com/xorbitsai/inference/pull/2672)
- Built-in support for [F5-TTS](https://github.com/SWivid/F5-TTS): [#2626](https://github.com/xorbitsai/inference/pull/2626)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on LLMs that offers out-of-the-box data processing and model invocation capabilities, and allows workflow orchestration through Flow visualization.
2 changes: 1 addition & 1 deletion README_zh_CN.md
@@ -43,14 +43,14 @@
- Support speech recognition models: [#929](https://github.com/xorbitsai/inference/pull/929)
- Add Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
### New Models
- Built-in support for [MeloTTS](https://github.com/myshell-ai/MeloTTS): [#2760](https://github.com/xorbitsai/inference/pull/2760)
- Built-in support for [CogAgent](https://github.com/THUDM/CogAgent): [#2740](https://github.com/xorbitsai/inference/pull/2740)
- Built-in support for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo): [#2721](https://github.com/xorbitsai/inference/pull/2721)
- Built-in support for [HunyuanDiT](https://github.com/Tencent/HunyuanDiT): [#2727](https://github.com/xorbitsai/inference/pull/2727)
- Built-in support for [Marco-o1](https://github.com/AIDC-AI/Marco-o1): [#2749](https://github.com/xorbitsai/inference/pull/2749)
- Built-in support for [Stable Diffusion 3.5](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838): [#2706](https://github.com/xorbitsai/inference/pull/2706)
- Built-in support for [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B): [#2684](https://github.com/xorbitsai/inference/pull/2684)
- Built-in support for [Fish Speech V1.5](https://huggingface.co/fishaudio/fish-speech-1.5): [#2672](https://github.com/xorbitsai/inference/pull/2672)
- Built-in support for [F5-TTS](https://github.com/SWivid/F5-TTS): [#2626](https://github.com/xorbitsai/inference/pull/2626)
### Integrations
- [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/): an open-source AI knowledge-base platform built on LLMs, providing out-of-the-box data processing, model invocation, RAG retrieval, and visual AI workflow orchestration to help you easily handle complex Q&A scenarios.
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.
6 changes: 3 additions & 3 deletions doc/source/conf.py
@@ -96,9 +96,9 @@

if version_match != 'zh-cn':
html_theme_options['icon_links'].extend([{
"name": "Slack",
"url": "https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg",
"icon": "fa-brands fa-slack",
"name": "Discord",
"url": "https://discord.gg/Xw9tszSkr5",
"icon": "fa-brands fa-discord",
"type": "fontawesome",
},
{
72 changes: 49 additions & 23 deletions doc/source/getting_started/using_xinference.rst
@@ -8,7 +8,7 @@ Using Xinference
Run Xinference Locally
======================

Let's start by running Xinference on a local machine and running a classic LLM model: ``llama-2-chat``.
Let's start by running Xinference on a local machine and running a classic LLM model: ``qwen2.5-instruct``.

After this quickstart, you will move on to learning how to deploy Xinference in a cluster environment.

@@ -82,14 +82,21 @@
The command line tool is ``xinference``. You can list the commands that can be used:
--help Show this message and exit.

Commands:
cached
cal-model-mem
chat
engine
generate
launch
list
login
register
registrations
remove-cache
stop-cluster
terminate
unregister
vllm-models
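
Each subcommand accepts ``--help`` for its full flag list; for example:

.. code-block:: bash

   # show the options accepted by the launch subcommand
   xinference launch --help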


You can install the Xinference Python client with minimal dependencies using the following command.
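The command itself falls in a collapsed part of this hunk; for reference, the minimal client is published as its own PyPI package:

.. code-block:: bash

   # client-only install, without the server components
   pip install xinference-client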
@@ -110,6 +117,7 @@
Currently, xinference supports the following inference engines:
* ``sglang``
* ``llama.cpp``
* ``transformers``
* ``MLX``

For details about these inference engines, please refer to :ref:`here <inference_backend>`.

@@ -145,11 +153,31 @@
you need to additionally pass the ``model_engine`` parameter.
You can retrieve information about the supported inference engines and their related parameter combinations
through the ``xinference engine`` command.
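
For example, a query along these lines lists the engine, format, and quantization combinations available for a given model (the ``--model-name`` flag is an assumption here; check ``xinference engine --help`` for the exact options in your version):

.. code-block:: bash

   # inspect which engines can serve qwen2.5-instruct on this machine
   xinference engine --model-name qwen2.5-instruct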

.. note::

Here are some recommendations on when to use which engine:

- **Linux**

- When possible, prioritize using **vLLM** or **SGLang** for better performance.
- If resources are limited, consider using **llama.cpp**, as it offers more quantization options.
- For other cases, consider using **Transformers**, which supports nearly all models.

- **Windows**

- It is recommended to use **WSL**, and in this case, follow the same choices as Linux.
- Otherwise, prefer **llama.cpp**, and for unsupported models, opt for **Transformers**.

- **Mac**

- If supported by the model, use the **MLX engine**, as it delivers the best performance (see the launch sketch after this note).
- For other cases, prefer **llama.cpp**, and for unsupported models, choose **Transformers**.
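
On Apple silicon, for instance, the recommendation above would translate into a launch command roughly like the following (a sketch: the ``mlx`` engine and format identifiers are assumptions, and the model must ship MLX weights):

.. code-block:: bash

   # launch the 0.5B qwen2.5-instruct through the MLX engine on a Mac
   xinference launch --model-engine mlx -n qwen2.5-instruct -s 0_5 -f mlx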


Run Llama-2
-----------
Run qwen2.5-instruct
--------------------

Let's start by running a built-in model: ``llama-2-chat``. When you start a model for the first time, Xinference will
Let's start by running a built-in model: ``qwen2.5-instruct``. When you start a model for the first time, Xinference will
download the model parameters from HuggingFace, which might take a few minutes depending on the size of the model weights.
We cache the model files locally, so there's no need to redownload them for subsequent starts.

@@ -163,13 +191,13 @@
XINFERENCE_MODEL_SRC=modelscope xinference-local --host 0.0.0.0 --port 9997

We can specify the model's UID using the ``--model-uid`` or ``-u`` flag. If not specified, Xinference will generate a unique ID.
This creates a new model instance with the unique ID ``my-llama-2``:
The default unique ID will be identical to the model name.

.. tabs::

.. code-tab:: bash shell

xinference launch --model-engine <inference_engine> -u my-llama-2 -n llama-2-chat -s 13 -f pytorch
xinference launch --model-engine <inference_engine> -n qwen2.5-instruct -s 0_5 -f pytorch

.. code-tab:: bash cURL

@@ -179,10 +207,9 @@
-H 'Content-Type: application/json' \
-d '{
"model_engine": "<inference_engine>",
"model_uid": "my-llama-2",
"model_name": "llama-2-chat",
"model_name": "qwen2.5-instruct",
"model_format": "pytorch",
"size_in_billions": 13
"size_in_billions": "0_5"
}'

.. code-tab:: python
@@ -191,16 +218,15 @@
from xinference.client import RESTfulClient
client = RESTfulClient("http://127.0.0.1:9997")
model_uid = client.launch_model(
model_engine="<inference_engine>",
model_uid="my-llama-2",
model_name="llama-2-chat",
model_name="qwen2.5-instruct",
model_format="pytorch",
size_in_billions=13
size_in_billions="0_5"
)
print('Model uid: ' + model_uid)

.. code-tab:: bash output

Model uid: my-llama-2
Model uid: qwen2.5-instruct
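
To verify the launch, one option is to list the running models through the same client; a minimal sketch using the documented ``RESTfulClient``:

.. code-block:: python

   from xinference.client import RESTfulClient

   client = RESTfulClient("http://127.0.0.1:9997")
   # the returned mapping should include the "qwen2.5-instruct" uid
   print(client.list_models())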

.. note::
For some engines, such as vllm, users need to specify the engine-related parameters when
launching the model:
@@ -209,11 +235,11 @@

.. code-block:: bash

xinference launch --model-engine vllm -u my-llama-2 -n llama-2-chat -s 13 -f pytorch --gpu_memory_utilization 0.9
xinference launch --model-engine vllm -n qwen2.5-instruct -s 0_5 -f pytorch --gpu_memory_utilization 0.9

``gpu_memory_utilization=0.9`` will be passed to vLLM when launching the model.

Congrats! You now have ``llama-2-chat`` running with Xinference. Once the model is running, we can try it out either via cURL,
Congrats! You now have ``qwen2.5-instruct`` running with Xinference. Once the model is running, we can try it out either via cURL,
or via Xinference's python client:

.. tabs::
@@ -225,7 +251,7 @@
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "my-llama-2",
"model": "qwen2.5-instruct",
"messages": [
{
"role": "system",
@@ -242,7 +268,7 @@

from xinference.client import RESTfulClient
client = RESTfulClient("http://127.0.0.1:9997")
model = client.get_model("my-llama-2")
model = client.get_model("qwen2.5-instruct")
model.chat(
messages=[
{"role": "user", "content": "Who won the world series in 2020?"}
@@ -253,7 +279,7 @@

{
"id": "chatcmpl-8d76b65a-bad0-42ef-912d-4a0533d90d61",
"model": "my-llama-2",
"model": "qwen2.5-instruct",
"object": "chat.completion",
"created": 1688919187,
"choices": [
@@ -281,7 +307,7 @@
Xinference provides OpenAI-compatible APIs for its supported models, so you can use them directly with the official OpenAI Python client:

from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not used actually")

response = client.chat.completions.create(
model="my-llama-2",
model="qwen2.5-instruct",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the largest animal?"}
@@ -345,17 +371,17 @@
When you no longer need a model that is currently running, you can remove it in the following way:

.. code-tab:: bash shell

xinference terminate --model-uid "my-llama-2"
xinference terminate --model-uid "qwen2.5-instruct"

.. code-tab:: bash cURL

curl -X DELETE http://127.0.0.1:9997/v1/models/my-llama-2
curl -X DELETE http://127.0.0.1:9997/v1/models/qwen2.5-instruct

.. code-tab:: python

from xinference.client import RESTfulClient
client = RESTfulClient("http://127.0.0.1:9997")
client.terminate_model(model_uid="my-llama-2")
client.terminate_model(model_uid="qwen2.5-instruct")

Deploy Xinference In a Cluster
==============================
@@ -398,7 +424,7 @@
On each of the other servers where you want to run Xinference workers, run the following command:

.. code-block:: bash

xinference launch -n llama-2-chat -s 13 -f pytorch -e "http://${supervisor_host}:9997"
xinference launch -n qwen2.5-instruct -s 0_5 -f pytorch -e "http://${supervisor_host}:9997"
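
For context, starting the cluster itself is covered by the collapsed lines above; per the surrounding docs, the supervisor and workers are brought up roughly as follows (host values are placeholders):

.. code-block:: bash

   # on the supervisor server
   xinference-supervisor -H "${supervisor_host}"

   # on each worker server
   xinference-worker -e "http://${supervisor_host}:9997" -H "${worker_host}"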

Using Xinference With Docker
=============================
8 changes: 4 additions & 4 deletions doc/source/index.rst
@@ -250,10 +250,10 @@ Getting Involved

:fab:`weixin` Find community on WeChat

.. grid-item-card::
:link: https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg
:fab:`slack` Find community on Slack
.. grid-item-card::
:link: https://discord.gg/Xw9tszSkr5

:fab:`discord` Find community on Discord

.. grid-item-card::
:link: https://github.com/xorbitsai/inference/issues/new/choose