
Streaming support for OpenAI's online requests #692

@lyuwen

Description

I am in a situation where I need to issue a large number of requests to a reasoning model. The long thinking process can easily cause the requests to time out, leading to a high failure rate.

When I discussed this with the platform engineers, they recommended using streaming responses to make the extended reasoning requests more stable.

I played with curator's source code for a bit and added streaming support. It works quite well: I used to see a lot of timeout errors, and now they are gone.

The proposed changes are in e0c63b9. The main addition is the fetch_response_streamed function, which fetches the streaming response chunks and concatenates them so the result looks as if it came from a non-streaming request.

https://github.com/lyuwen/curator/blob/e0c63b9c40d45d60421afb24844842a8d2c411e2/src/bespokelabs/curator/request_processor/online/openai_online_request_processor.py#L71-L121
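For reference, here is a minimal sketch of the idea using the openai Python SDK rather than curator's internal request machinery. The function name mirrors the one in the commit; the client setup, model name, and messages are placeholders, not the actual code from the linked diff:

```python
import asyncio

from openai import AsyncOpenAI


async def fetch_response_streamed(
    client: AsyncOpenAI, model: str, messages: list[dict]
) -> str:
    """Stream a chat completion and join the deltas into one response.

    Sketch only: illustrates accumulating streamed chunks so callers
    receive a single string, as if from a non-streaming request.
    """
    stream = await client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    parts: list[str] = []
    async for chunk in stream:
        # Each chunk carries an incremental delta; content can be None
        # (e.g. role-only or final chunks), so guard before appending.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


async def main() -> None:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    text = await fetch_response_streamed(
        client, "gpt-4o", [{"role": "user", "content": "Hello"}]
    )
    print(text)


asyncio.run(main())
```

Because chunks keep arriving throughout generation, the connection never sits idle for the full duration of the model's reasoning, which is what avoids the read timeouts described above.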

Metadata

Labels

curator-online (Related online request processor)
