-
Notifications
You must be signed in to change notification settings - Fork 128
Open
Labels
curator-onlineRelated online request processorRelated online request processor
Description
I am in a situation that I need to issue a large number of request at a reasoning model. The long thinking process can easily cause the requests to timeout, leading to high failure rate.
Upon discussion with the platform engineers, they recommended that using streaming responses can make the extended reasoning requests more stable.
I played with the curator's source code for a bit and add the streaming support. I works quite well. I used to see a lot of timeout errors and now they are gone.
The proposed changes are here e0c63b9, and the main addition is the stream fetch_response_streamed function that fetches the streaming response and concatenate them together as if it was came from a non-stream request.
Metadata
Metadata
Assignees
Labels
curator-onlineRelated online request processorRelated online request processor