Feature Request: Way to kill an inference without killing the server #11173

paoletto · 2025-01-10T10:43:22Z

Prerequisites

I am running the latest code. Mention the version if possible as well.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Inference requests sent with curl occasionally get stuck, in either llama-server or whisper-server.
Because other requests are queued behind them, killing the server makes every queued request fail.
So I wonder: is there a way to kill one of the ongoing tasks (there could be several if the server is configured for more than one concurrent job) without killing the server itself, so that the queued tasks can move on?

Motivation

Requests occasionally get stuck and block the server, which holds up everything queued behind them.

Possible Implementation

An additional API endpoint that takes a task ID and cancels that task?
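
To make the idea concrete, here is a minimal sketch of the bookkeeping such an endpoint would need. It is not llama-server or whisper-server code: `TaskRegistry`, `run_inference`, and the `step` callback are hypothetical names for illustration, and the generation loop is a stand-in. The assumption is cooperative cancellation, where each task has a flag that the loop checks between steps.

```python
import threading
from typing import Callable, Dict, List


class TaskRegistry:
    """Hypothetical registry mapping task ids to cancellation flags."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._flags: Dict[int, threading.Event] = {}
        self._next_id = 0

    def register(self) -> int:
        """Create a new task entry and return its id."""
        with self._lock:
            task_id = self._next_id
            self._next_id += 1
            self._flags[task_id] = threading.Event()
            return task_id

    def cancel(self, task_id: int) -> bool:
        """What a cancel endpoint would call; False if the id is unknown."""
        with self._lock:
            flag = self._flags.get(task_id)
        if flag is None:
            return False
        flag.set()
        return True

    def is_cancelled(self, task_id: int) -> bool:
        with self._lock:
            flag = self._flags.get(task_id)
        return flag is not None and flag.is_set()

    def finish(self, task_id: int) -> None:
        """Drop the entry once the task has ended, cancelled or not."""
        with self._lock:
            self._flags.pop(task_id, None)


def run_inference(
    registry: TaskRegistry,
    task_id: int,
    max_tokens: int,
    step: Callable[[int], str],
) -> List[str]:
    """Stand-in for a generation loop that checks the flag before each step."""
    tokens: List[str] = []
    try:
        for i in range(max_tokens):
            if registry.is_cancelled(task_id):
                break  # stop early so the next queued task can run
            tokens.append(step(i))
    finally:
        registry.finish(task_id)
    return tokens
```

A cooperative flag like this only helps if the loop actually gets back to the check. A task that is blocked inside a single long-running step would not notice the cancellation until that step returns.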

paoletto · 2025-01-10T11:35:37Z

This could be a duplicate of #6421, although I believe a dedicated way to kill running jobs has merit on its own.

ngxson · 2025-01-10T11:37:24Z

I mentioned the same problem a while ago here: #9273

Will revisit it in a few days.

paoletto added the enhancement (New feature or request) label on Jan 10, 2025
