Streaming support for OpenAI online requests

I need to issue a large number of requests to a reasoning model. The long thinking process can easily cause requests to time out, leading to a high failure rate.

I discussed this with the platform engineers, and they recommended using streaming responses to make extended reasoning requests more stable.

I experimented with curator's source code and added streaming support. It works quite well: I used to see a lot of timeout errors, and now they are gone.

The proposed changes are in commit e0c63b9c40d45d60421afb24844842a8d2c411e2. The main addition is the `fetch_response_streamed` function, which fetches the streamed response chunks and concatenates them as if they had come from a non-streaming request.

https://github.com/lyuwen/curator/blob/e0c63b9c40d45d60421afb24844842a8d2c411e2/src/bespokelabs/curator/request_processor/online/openai_online_request_processor.py#L71-L121
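To illustrate the idea for readers who do not want to open the commit, here is a minimal sketch of the concatenation step. It is not the code from the commit above: `merge_stream_chunks` is a hypothetical helper name, and it assumes each chunk is a plain dict shaped like an OpenAI chat-completion stream chunk (`choices[0].delta.content`, `finish_reason`, and an optional `usage` field on the last chunk). It only handles a single choice and plain text content.

```python
from typing import Any, Dict, Iterable, List, Optional


def merge_stream_chunks(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine streamed chunks into one non-streaming-style response dict.

    Hypothetical helper; assumes OpenAI-style chat-completion chunk dicts.
    """
    content_parts: List[str] = []
    role = "assistant"
    finish_reason: Optional[str] = None
    response_id: Optional[str] = None
    model: Optional[str] = None
    usage: Optional[Dict[str, Any]] = None

    for chunk in chunks:
        # Identifiers repeat on every chunk; keep the first ones seen.
        response_id = response_id or chunk.get("id")
        model = model or chunk.get("model")
        # Usage, when requested, arrives on a final chunk with no choices.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            if choice.get("index", 0) != 0:
                continue  # this sketch only merges the first choice
            delta = choice.get("delta") or {}
            role = delta.get("role") or role
            if delta.get("content"):
                content_parts.append(delta["content"])
            finish_reason = choice.get("finish_reason") or finish_reason

    # Mimic the layout of a non-streaming chat completion response.
    return {
        "id": response_id,
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": role, "content": "".join(content_parts)},
                "finish_reason": finish_reason,
            }
        ],
        "usage": usage,
    }
```

Because the result has the same layout as a non-streaming response, downstream parsing code would not need to know that the request was streamed.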
