
[FEAT]: Add Support to Anthropic & OpenAI Batch APIs #2476

Open
MichaelYochpaz opened this issue Oct 15, 2024 · 2 comments
Labels: enhancement (New feature or request), feature request

Comments


MichaelYochpaz commented Oct 15, 2024

What would you like to see?

Hey there!
First off, thank you for working on this great project :)

Is it possible to add support for the Batch APIs provided by Anthropic and OpenAI?
These APIs offer a 50% discount on API calls in exchange for allowing responses to take up to 24 hours (so the providers can run them when their servers aren't overloaded).

This is useful for saving money on calls whose results aren't needed immediately (for example, a request to summarize a book, suggest a software design, etc.), especially when using the more expensive models, like Claude Opus and OpenAI o1, with a large context.

The way it works seems to be: once the request is sent, you can query it to check whether it's finished (so the client needs to poll at an interval, e.g. every minute; maybe make it configurable in the settings), and once the status says it's ready, you can query for the output.
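
For reference, here's a rough sketch of that submit/poll/fetch flow with the OpenAI Python SDK (Anthropic's Message Batches API follows the same pattern; the file name and 60-second interval below are just placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

# 1. Submit: upload a JSONL file of requests and create the batch job.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 2. Poll: re-query the batch at an interval until it reaches a terminal state.
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # e.g. once a minute; this is where a configurable interval would fit

# 3. Retrieve: once completed, download the output file containing the responses.
if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    print(output.text)
```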

This probably won't be simple, as it requires implementing a new mechanism for waiting on a response (polling) and a way to communicate that state in the UI (maybe a spinner showing the response hasn't been generated yet), but I do think it would be a great, very useful addition.
Plus, since OpenAI introduced it and Anthropic has now followed, we might see similar offerings for other APIs. This also means that if it's implemented, writing generic code that can support similar APIs in the future might be a good idea; something like the sketch below.
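
To make that concrete, a provider-agnostic layer might look something like this (purely a hypothetical sketch; `BatchProvider` and its method names are made up, not from the codebase):

```python
from abc import ABC, abstractmethod

class BatchProvider(ABC):
    """Common shape for any provider offering a deferred/batch API."""

    @abstractmethod
    def submit(self, requests: list[dict]) -> str:
        """Send a batch of requests and return the provider-side batch ID."""

    @abstractmethod
    def is_done(self, batch_id: str) -> bool:
        """Check whether the batch has finished processing."""

    @abstractmethod
    def fetch_results(self, batch_id: str) -> list[dict]:
        """Download the responses once the batch has completed."""
```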

timothycarambat (Member) commented Oct 30, 2024

Following up on this: since the natural expectation when using a UI for chatting is to send a message and receive a result within a reasonable timeframe, how could one use batching to help with our use case? Sending a request and getting a response of "we will return a response later" may be frustrating or useless to many.

Can you expand with a specific use case in mind? I am not seeing a clear value for users in getting responses minutes or possibly hours after the request.

MichaelYochpaz (Author) commented

Hey @timothycarambat, thank you for responding :)

This feature is not really intended for general chatting with simple, short questions (which are quite cheap anyway), but for more complex ones that include massive context and might be used with more expensive models (like o1-preview, for example), where a 50% discount is quite meaningful and could save a few bucks on a single request.

The specific use case I have in mind:

  • A prompt using repopack to add a large codebase (could be 100K+ tokens) as context, asking the AI to generate unit tests for the whole project, suggest a better architecture, etc.

This type of prompt includes a huge context that will cost a meaningful amount of money (especially for expensive models like o1-preview and o1-mini), and I wouldn't mind getting the results a few hours later in exchange for saving a decent amount.

Another possible case that comes to mind is asking it to write a summary of a topic while adding several relevant books / research papers as context.

Now, I don't think this should apply to the entire chat; there should be a per-message option. For example, if I have a follow-up question after getting the result and don't want to wait again, I could untick a "batch request" checkbox, and the next message would be sent as a regular API request, without the header setting it as a batch request.
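
As an illustration of that per-message toggle (a hypothetical sketch; both helper functions are made-up names, not from the project):

```python
def send_message(prompt: str, batch_requested: bool):
    if batch_requested:
        # Deferred: submit via the provider's Batch API and poll for the result.
        return send_batch_request(prompt)
    # Immediate: ordinary synchronous API call.
    return send_regular_request(prompt)
```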
