Aborting a single request is not supported #1998

bstr9 · 2024-08-01T09:52:00Z

System Info

xinference==v0.13.3

Running Xinference with Docker?

docker
pip install
installation from source

Version info

xinference==v0.13.3

The command used to start Xinference

opt/conda/bin/python /opt/conda/bin/xinference-worker --metrics-exporter-port 9998 -e http://10.6.208.95:9997/ -H 10.6.208.95

Reproduction

For a single request, if the client actively drops the connection, the request is not aborted properly; it keeps running until it completes.

Expected behavior

Once the client disconnects, engine.abort(request_id) should be called promptly to close the current request, so that GPU inference resources are not wasted.

qinxuye · 2024-08-01T10:44:03Z

Which engine?

bstr9 · 2024-08-02T03:13:08Z

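I read the xinference code. As far as I can tell, only the Transformers engine supports batching right now; its logic puts messages into a Queue and then manages them through scheduler_actor.
To abort an earlier request, you have to explicitly call the abort HTTP endpoint and pass in the request_id.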
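To make the queue-plus-explicit-abort flow described above concrete, here is a minimal sketch. It is not xinference's actual scheduler_actor: `ToyScheduler`, `Request`, and their methods are hypothetical names invented for illustration, and the "decoding" is simulated by appending placeholder tokens.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request record; not an xinference type."""
    request_id: str
    max_tokens: int
    generated: list = field(default_factory=list)
    aborted: bool = False


class ToyScheduler:
    """Illustrative batching scheduler with abort-by-request_id (an assumption, not xinference code)."""

    def __init__(self, batch_size=2):
        self.waiting = deque()  # requests not yet admitted to a batch
        self.running = []       # requests currently being decoded
        self.batch_size = batch_size

    def submit(self, req):
        self.waiting.append(req)

    def abort(self, request_id):
        """Mark a request as aborted. Returns True if the id was found."""
        for req in list(self.waiting) + self.running:
            if req.request_id == request_id:
                req.aborted = True
                return True
        return False

    def step(self):
        """Run one simulated decoding step. Returns False when idle."""
        # Drop requests that were aborted or have finished.
        self.running = [
            r for r in self.running
            if not r.aborted and len(r.generated) < r.max_tokens
        ]
        # Admit waiting requests until the batch is full.
        while self.waiting and len(self.running) < self.batch_size:
            req = self.waiting.popleft()
            if not req.aborted:
                self.running.append(req)
        # Simulate generating one token per running request.
        for r in self.running:
            r.generated.append(f"tok{len(r.generated)}")
        return bool(self.running or self.waiting)


if __name__ == "__main__":
    sched = ToyScheduler(batch_size=2)
    a, b = Request("a", 5), Request("b", 5)
    sched.submit(a)
    sched.submit(b)
    sched.step()
    sched.step()
    sched.abort("a")  # the caller must know the request_id
    while sched.step():
        pass
    print(len(a.generated), len(b.generated))
```

The point of the sketch is the limitation raised in this issue: nothing stops request "a" unless some caller still holds its request_id and calls `abort` explicitly.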

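Reading vllm's OpenAI-compatible API code, its implementation monitors the connection of the FastAPI request. If that connection drops, it calls engine.abort(request_id) on its own, so the caller does not need to invoke an abort endpoint externally.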
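The disconnect-driven abort described above can be sketched roughly as follows. This is a standard-library simulation, not vllm's or xinference's code: `FakeEngine`, `make_client`, and `generate_until_disconnect` are hypothetical names, and `is_disconnected` stands in for a connection check such as the `is_disconnected()` coroutine on a Starlette/FastAPI `Request`.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for an engine that exposes abort(request_id)."""

    def __init__(self):
        self.aborted = []
        self.steps_run = 0

    async def generate(self, request_id, max_tokens):
        for i in range(max_tokens):
            if request_id in self.aborted:
                return
            await asyncio.sleep(0)  # simulate one decoding step
            self.steps_run += 1
            yield f"tok{i}"

    async def abort(self, request_id):
        self.aborted.append(request_id)


def make_client(disconnect_after):
    """Build a fake connection check that reports a drop after N polls."""
    calls = {"n": 0}

    async def is_disconnected():
        calls["n"] += 1
        return calls["n"] > disconnect_after

    return is_disconnected


async def generate_until_disconnect(engine, request_id, is_disconnected, max_tokens):
    """Stream tokens, aborting the engine request once the client is gone."""
    tokens = []
    async for tok in engine.generate(request_id, max_tokens):
        if await is_disconnected():
            # No caller-supplied request_id is needed: the handler that
            # created the request aborts it itself.
            await engine.abort(request_id)
            break
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    engine = FakeEngine()
    out = asyncio.run(
        generate_until_disconnect(engine, "req-1", make_client(3), 50)
    )
    print(out, engine.aborted, engine.steps_run)
```

In this sketch the connection is polled once per generated token, so at most one extra decoding step runs after the client drops; a real server could instead watch the connection in a separate task.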

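vllm's implementation is very useful:
First: in real usage, not every client can keep track of the request_id.
Second: once the connection has dropped, there is no need for the engine to keep running inference; doing so only wastes resources.
Finally: vllm's OpenAI-compatible API supports batching, but when the vllm engine is used inside xinference, batched inference is not possible. (This is not what this issue is about, but it affects us quite a bit.) [I will submit a PR next week to fix vllm not supporting batching.]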

PS: vllm's FastAPI endpoint implementation is probably more efficient than xinference's. I think it is worth learning from.

qinxuye · 2024-08-02T03:23:49Z

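Currently in xinf, a request should already be aborted when its connection drops, but the transformers engine without batching enabled and the llama.cpp engine may not support this.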

XprobeBot added the gpu label Aug 1, 2024

XprobeBot added this to the v0.14.0 milestone Aug 1, 2024

XprobeBot modified the milestones: v0.14, v0.15 Sep 3, 2024

XprobeBot modified the milestones: v0.15, v0.16 Oct 30, 2024

XprobeBot modified the milestones: v0.16, v1.x Nov 25, 2024
