Hey there!
First off, thank you for working on this great project :)
Is it possible to add support for Batch APIs, provided by Anthropic and OpenAI?
This feature basically gives a 50% discount on API calls, in exchange for allowing the responses to take up to 24 hours (so the providers can run them when their servers aren't overloaded).
This is useful for saving money on calls that aren't needed immediately (for example, a request to summarize a book, suggest a software design, etc.), especially when using the more expensive models, like Claude Opus and OpenAI o1, with large contexts.
The way it works seems to be that once the request is submitted, you can query it to check whether it has finished (so you'll need to poll it at some interval, for example every minute; maybe make that configurable in the settings), and once the status says it's ready, you can fetch the output.
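For reference, here's roughly what that flow looks like against OpenAI's Batch API (a minimal sketch using the official `openai` Node SDK; the method names follow their batch docs as I understand them, so treat the exact calls as an assumption to verify). Anthropic's Message Batches API follows the same submit → poll → fetch pattern:

```typescript
// Minimal sketch of the submit -> poll -> fetch flow for OpenAI's Batch API.
// Assumes the official `openai` Node SDK; exact method names/fields should be
// double-checked against the current SDK docs.
import OpenAI, { toFile } from 'openai';

const openai = new OpenAI();

async function runBatchedRequest(prompt: string): Promise<string> {
  // 1. Batch requests are uploaded as a JSONL file; each line has a custom_id.
  const line = JSON.stringify({
    custom_id: 'msg-1',
    method: 'POST',
    url: '/v1/chat/completions',
    body: { model: 'gpt-4o-mini', messages: [{ role: 'user', content: prompt }] },
  });
  const inputFile = await openai.files.create({
    file: await toFile(Buffer.from(line), 'batch_input.jsonl'),
    purpose: 'batch',
  });

  // 2. Create the batch with the 24h completion window (this is what grants the discount).
  let batch = await openai.batches.create({
    input_file_id: inputFile.id,
    endpoint: '/v1/chat/completions',
    completion_window: '24h',
  });

  // 3. Poll until it finishes -- this interval is what could be made configurable.
  while (!['completed', 'failed', 'expired', 'cancelled'].includes(batch.status)) {
    await new Promise((r) => setTimeout(r, 60_000)); // e.g. every minute
    batch = await openai.batches.retrieve(batch.id);
  }
  if (batch.status !== 'completed' || !batch.output_file_id) {
    throw new Error(`Batch ended with status ${batch.status}`);
  }

  // 4. Download the output JSONL and pull the response back out by custom_id.
  const output = await openai.files.content(batch.output_file_id);
  const result = JSON.parse((await output.text()).trim());
  return result.response.body.choices[0].message.content;
}
```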
This probably won't be simple, as it requires implementing a new mechanism for waiting on a response (polling) and a way to communicate that in the UI (maybe a spinner showing the response hasn't been generated yet), but I do think it would be a great addition and very useful.
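To make the UI side a bit more concrete, here's one hypothetical way it could look (all names are made up for illustration, not taken from the existing code): each message sent as a batch keeps a pending entry that the renderer shows as a spinner, and a single timer polls them on the configurable interval:

```typescript
// Hypothetical sketch of the client-side polling mechanism: pending batch
// messages are tracked per chat message, and one timer checks them all.
// The callbacks stand in for whatever provider calls the app ends up using.
interface PendingBatchMessage {
  messageId: string;       // chat message currently rendered with a spinner
  providerBatchId: string; // id returned by the provider when the batch was submitted
}

function startBatchPolling(
  pending: Map<string, PendingBatchMessage>,
  checkStatus: (batchId: string) => Promise<'in_progress' | 'completed' | 'failed'>,
  fetchResult: (batchId: string) => Promise<string>,
  onResolved: (messageId: string, text: string | null) => void, // null => failed
  intervalMs = 60_000, // the interval that would be configurable in settings
): ReturnType<typeof setInterval> {
  return setInterval(async () => {
    for (const msg of [...pending.values()]) {
      const status = await checkStatus(msg.providerBatchId);
      if (status === 'completed') {
        onResolved(msg.messageId, await fetchResult(msg.providerBatchId));
        pending.delete(msg.messageId);
      } else if (status === 'failed') {
        onResolved(msg.messageId, null);
        pending.delete(msg.messageId);
      }
      // 'in_progress': keep the spinner and try again on the next tick.
    }
  }, intervalMs);
}
```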
Plus, since OpenAI introduced it and Anthropic followed, we might see similar offerings from other providers (which also means that if this is implemented, writing generic code that can support other similar APIs in the future might be a good idea).
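If it helps, here's one hypothetical shape for that generic layer (names are mine, not from any existing codebase): a small interface that both OpenAI's Batch API and Anthropic's Message Batches API could implement, since they both follow the same submit → poll → fetch pattern:

```typescript
// Hypothetical generic abstraction over "batch-style" provider APIs.
// Both OpenAI's Batch API and Anthropic's Message Batches API map onto this
// submit / check / fetch shape; all names here are illustrative only.
export type BatchStatus = 'in_progress' | 'completed' | 'failed' | 'expired';

export interface BatchRequest {
  customId: string; // lets results be matched back to chat messages
  model: string;
  prompt: string;
}

export interface BatchResult {
  customId: string;
  text: string;
}

export interface BatchProvider {
  /** Submit one or more requests; returns the provider's batch id. */
  submit(requests: BatchRequest[]): Promise<string>;
  /** Cheap status check, called on the polling interval. */
  checkStatus(batchId: string): Promise<BatchStatus>;
  /** Download and parse results once checkStatus reports 'completed'. */
  fetchResults(batchId: string): Promise<BatchResult[]>;
}

// A new provider (OpenAI, Anthropic, or a future one) would just supply its
// own implementation, e.g. `const provider: BatchProvider = new OpenAIBatchProvider(apiKey);`
```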
Following up on this: since the natural expectation when using a chat UI is to send a request and receive a result within a reasonable timeframe, how would batching help with our use case? Sending a request and getting back "we will return a response later" may be frustrating or useless for many users.
Can you expand with a specific use case in mind? I'm not seeing clear value for users in getting responses minutes or possibly hours after the request.
This feature isn't really intended for general chatting with simple short questions (which are quite cheap anyway), but for more complex prompts that include a massive context and might be used with the more expensive models (like o1-preview, for example), where a 50% discount is quite meaningful and could save a few bucks on a single request.
The specific use case I have in mind:
A prompt that uses repopack to add a large codebase (could be 100K+ tokens) as context, and asks the AI to generate unit tests for the whole project, suggest a better architecture, etc.
This type of prompt includes a huge context that costs a meaningful amount of money (especially for models like o1-preview and o1-mini, which are expensive), and I wouldn't mind getting the results a few hours later to save a decent amount of money.
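For a rough sense of scale (ballpark figures, assuming o1-preview's list price of around $15 per million input tokens): a 100K-token prompt costs roughly $1.50 in input alone per request, so the batch discount would save about $0.75 each time, before even counting the pricier output tokens, and it adds up quickly when iterating on a large codebase.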
Another possible case that comes to mind is asking it to write a summary about a topic while adding several relevant books / research papers as context.
Now, I don't think this should apply to the entire chat; there should be an option per message (for example, if I have a follow-up question after getting the result and don't want to wait again, I can untick a "batch request" checkbox, and the next message will go out as a regular API request instead of being flagged as a batch request).
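As a rough sketch of what I mean (purely hypothetical names, not tied to the existing code), the send path could just branch on a per-message flag:

```typescript
// Hypothetical per-message toggle: the checkbox state decides whether a
// message goes through the normal streaming path or the batch path.
interface SendOptions {
  batchRequest: boolean; // bound to the "batch request" checkbox on the message input
}

async function sendMessage(prompt: string, opts: SendOptions): Promise<void> {
  if (opts.batchRequest) {
    // Deferred: submit to the provider's batch endpoint and hand the message
    // to the polling loop; the UI shows a spinner until the result arrives.
    const batchId = await submitBatch(prompt);
    trackPendingBatchMessage(prompt, batchId);
  } else {
    // Immediate: the regular API request, exactly as today.
    await sendRegularRequest(prompt);
  }
}

// Placeholders standing in for the pieces sketched earlier / existing code.
declare function submitBatch(prompt: string): Promise<string>;
declare function trackPendingBatchMessage(prompt: string, batchId: string): void;
declare function sendRegularRequest(prompt: string): Promise<void>;
```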