fix: retry on anthropic overloaded #356

Merged 2 commits into master on Dec 21, 2024

Conversation

@ErikBjare (Owner) commented Dec 20, 2024

Attempt at fixing #345

Untested since it's hard to reproduce.


Important

Adds retry mechanisms with exponential backoff for handling Anthropic API overload errors in llm_anthropic.py.

  • Behavior:
    • Adds retry_on_overloaded and retry_generator_on_overloaded decorators in llm_anthropic.py for retrying on APIStatusError with status code 503 using exponential backoff (a sketch follows below this list).
    • Applies retry_on_overloaded to chat() and retry_generator_on_overloaded to stream() in llm_anthropic.py to handle Anthropic API overloads.
  • Imports:
    • Adds time and wraps imports in llm_anthropic.py for implementing retry logic.
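For reference, here is a minimal sketch of what such decorators could look like. This is a reconstruction from the description above, not the actual diff; the retry count, delay values, and constant names are assumptions:

```python
import time
from functools import wraps

from anthropic import APIStatusError

MAX_RETRIES = 5      # assumed values; the actual diff may differ
INITIAL_DELAY = 1.0  # seconds, doubled after every failed attempt


def retry_on_overloaded(func):
    """Retry a plain function when the Anthropic API reports overload (HTTP 503)."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        delay = INITIAL_DELAY
        for attempt in range(MAX_RETRIES):
            try:
                return func(*args, **kwargs)
            except APIStatusError as e:
                # Re-raise anything that isn't an overload, and give up on the last attempt
                if e.status_code != 503 or attempt == MAX_RETRIES - 1:
                    raise
                time.sleep(delay)
                delay *= 2  # exponential backoff

    return wrapper


def retry_generator_on_overloaded(func):
    """Same idea for generator functions: restart the stream from scratch on overload."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        delay = INITIAL_DELAY
        for attempt in range(MAX_RETRIES):
            try:
                yield from func(*args, **kwargs)
                return  # stream finished cleanly
            except APIStatusError as e:
                if e.status_code != 503 or attempt == MAX_RETRIES - 1:
                    raise
                time.sleep(delay)
                delay *= 2

    return wrapper
```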

This description was created by Ellipsis for 5b44061. It will automatically update as commits are pushed.

@ellipsis-dev (bot, Contributor) left a comment

👍 Looks good to me! Reviewed everything up to 915bd15 in 1 minute and 24 seconds

More details
  • Looked at 63 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. gptme/llm/llm_anthropic.py:32
  • Draft comment:
    The retry_generator_on_overloaded decorator does not handle partial results from the generator function. Consider implementing logic to handle cases where the generator yields some results before an error occurs.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment raises a valid technical point - the retry mechanism does restart the generator from scratch. However, in this specific context, this is being used for the Anthropic API streaming which is idempotent. The stream() function yields chunks of a response, and restarting from the beginning on a 503 error is actually the desired behavior since we want the complete response. The comment is technically correct but not practically relevant for this use case.
    I could be wrong about the idempotency assumptions. There might be edge cases where partial results need to be preserved that I haven't considered.
    The code is specifically handling Anthropic API streaming where restarting the stream from the beginning on a 503 error is appropriate. The current implementation matches the needs of this specific use case.
    While technically valid, the comment raises a concern that isn't relevant for this specific API streaming use case where restarting from the beginning is appropriate. The comment should be removed.
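To make the trade-off concrete, here is a hypothetical illustration (not code from the PR) of what restarting a generator mid-stream means for a consumer. It assumes the decorator sketch above; fake_overloaded() is an invented helper that fabricates a 503 APIStatusError via httpx:

```python
import httpx
from anthropic import APIStatusError


def fake_overloaded() -> APIStatusError:
    # Invented helper: fabricate a 503 "overloaded" error without a real API call
    req = httpx.Request("POST", "https://api.anthropic.com/v1/messages")
    return APIStatusError("Overloaded", response=httpx.Response(503, request=req), body=None)


attempts = 0


def flaky_chunks():
    global attempts
    attempts += 1
    yield "Hello, "
    if attempts == 1:
        raise fake_overloaded()  # die mid-stream on the first attempt only
    yield "world"


# Draining the wrapped generator (after one backoff sleep):
print("".join(retry_generator_on_overloaded(flaky_chunks)()))
# -> "Hello, Hello, world": the chunk yielded before the failure is replayed,
# which is the behavior the reasoning above deems acceptable for the
# Anthropic streaming case, where the complete response is wanted.
```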

Workflow ID: wflow_UlhJbe5RFIlwSlI6



@codecov-commenter commented Dec 20, 2024

❌ 1 test failed:

Tests completed  Failed  Passed  Skipped
149              1       148     3
Top failed test: tests.test_server::test_api_conversation_generate (0.536s run time)

Stack trace:
conv = 'test-server-220921', client = <FlaskClient <Flask 'gptme.server.api'>>

    @pytest.mark.slow
    def test_api_conversation_generate(conv: str, client: FlaskClient):
        # Ask the assistant to generate a test response
        response = client.post(
            f"/api/conversations/{conv}",
            json={"role": "user", "content": "hello, just testing"},
        )
        assert response.status_code == 200
    
        # Test regular (non-streaming) response
        response = client.post(
            f"/api/conversations/{conv}/generate",
            json={"model": get_model().model, "stream": False},
        )
        assert response.status_code == 200
        data = response.get_data(as_text=True)
        assert data  # Ensure we got some response
        msgs_resps = response.get_json()
        assert msgs_resps is not None  # Ensure we got valid JSON
        # Assistant message + possible tool output
        assert len(msgs_resps) >= 1
    
        # First message should be the assistant's response
>       assert msgs_resps[0]["role"] == "assistant"
E       KeyError: 0

tests/test_server.py:88: KeyError


@ErikBjare (Owner, Author) commented:

@gptme do you think this will fix #345? read the stacktrace and see if we handle it appropriately

gptme (Contributor) replied:

Based on the code changes and the description provided, this pull request has a good chance of addressing the issue described in #345. Here's an analysis of the changes:

  1. The new retry_generator_on_overloaded decorator is specifically designed to handle Anthropic API overload errors (status code 503), which appears to be the core issue in #345 ("Anthropic API Overloaded leads to crash").

  2. The implementation uses an exponential backoff strategy, which is a good practice for handling temporary service unavailability. This gives the API time to recover between retry attempts.

  3. The retry mechanism is applied to the stream() function, which should help mitigate issues with API overload during streaming operations.

However, there are a few points to consider:

  1. As mentioned in the PR description, this fix is untested since the issue is hard to reproduce. While the implementation looks solid, there's still a possibility that it might not fully resolve the problem in all scenarios.

  2. The Codecov report shows that there are 5 missing lines in the new code. It would be beneficial to add tests to cover these lines and ensure the retry mechanism works as expected.

  3. The fix only applies to the stream() function. If the issue occurs in other parts of the Anthropic API interaction, those might need similar treatment.

To fully confirm if this fixes #345, it would be helpful to:

  1. Review the original stacktrace from issue #345 ("Anthropic API Overloaded leads to crash") to ensure it matches the error being handled.
  2. Test the changes in an environment where the error can be reproduced, if possible.
  3. Add unit tests to cover the new retry mechanism (a sketch follows after this comment).

Despite these considerations, the implemented solution is a step in the right direction and should help mitigate issues with Anthropic API overload errors. It's a good improvement that addresses the described problem with a well-thought-out retry strategy.
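On point 3, a rough sketch of what such pytest tests could look like. The import path follows the file named in the PR description, but the bare-decorator form and the fake_overloaded() helper are assumptions carried over from the sketch earlier in this thread:

```python
import time

import httpx
import pytest
from anthropic import APIStatusError

from gptme.llm.llm_anthropic import retry_on_overloaded  # module path per the PR description


def fake_overloaded() -> APIStatusError:
    # Invented helper: fabricate the 503 "overloaded" error without a real API call
    req = httpx.Request("POST", "https://api.anthropic.com/v1/messages")
    return APIStatusError("Overloaded", response=httpx.Response(503, request=req), body=None)


def test_retries_until_success(monkeypatch):
    monkeypatch.setattr(time, "sleep", lambda _s: None)  # skip real backoff waits
    calls = {"n": 0}

    @retry_on_overloaded
    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise fake_overloaded()
        return "ok"

    assert flaky() == "ok"  # succeeded on the third attempt
    assert calls["n"] == 3


def test_gives_up_eventually(monkeypatch):
    monkeypatch.setattr(time, "sleep", lambda _s: None)

    @retry_on_overloaded
    def always_overloaded():
        raise fake_overloaded()

    # Once retries are exhausted, the last error should propagate
    with pytest.raises(APIStatusError):
        always_overloaded()
```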

@ErikBjare (Owner, Author) commented:

@gptme could you also add the decorator to def chat? if you think that is also needed

gptme (Contributor) replied:

Thank you for the suggestion. I've added the retry_on_overloaded decorator to the chat function as well. Here's a summary of the changes made:

  1. Created a new retry_on_overloaded decorator for non-generator functions, which implements the same retry logic with exponential backoff as the retry_generator_on_overloaded decorator.
  2. Applied the retry_on_overloaded decorator to the chat function.
  3. Kept the existing retry_generator_on_overloaded decorator on the stream function.

These changes should provide more robust handling of Anthropic API overload errors for both streaming and non-streaming API calls. This comprehensive approach should help address the issue described in #345 more effectively.
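Schematically, the result described above would look something like this in llm_anthropic.py (placement per the summary; the function signatures are placeholders, not the actual gptme code):

```python
from collections.abc import Generator

# Decorators as sketched earlier in this thread; in the actual PR they are
# defined in the same module, gptme/llm/llm_anthropic.py.

@retry_on_overloaded
def chat(messages, model) -> str:
    # Non-streaming completion call to the Anthropic API (body elided)
    ...

@retry_generator_on_overloaded
def stream(messages, model) -> Generator[str, None, None]:
    # Streaming completion call; on a 503 the whole stream is restarted
    ...
```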

To further improve the solution, consider:

  1. Adding unit tests to verify the behavior of both decorators.
  2. Monitoring the effectiveness of this change in production to ensure it resolves the issue.
  3. Considering adding similar retry mechanisms to other API calls if needed.

These changes should significantly improve the reliability of interactions with the Anthropic API, especially during periods of high load or temporary service issues.

gptme (Contributor) replied:

I'm sorry, I could not fulfill your request. Please check the log of this run for more information.

Conversation log

@ellipsis-dev (bot, Contributor) left a comment

👍 Looks good to me! Incremental review on 5b44061 in 53 seconds

More details
  • Looked at 52 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. gptme/llm/llm_anthropic.py:50
  • Draft comment:
    The retry_on_overloaded decorator is missing a return statement after the retry loop. If the function does not raise an exception but also does not return a value within the loop, it will return None. Consider adding a return statement after the loop to handle this case.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment suggests adding a return statement after the retry loop, but this is incorrect. The function will either return the wrapped function's result via 'return func(*args, **kwargs)', raise an exception on the last retry, or continue retrying. There is no path where it silently returns None.
    Could there be some edge case I'm missing? What if the function being wrapped returns None intentionally?
    Even if the wrapped function returns None, that's handled by the 'return func(*args, **kwargs)' line - it would properly return that None value. The decorator is transparent to the return value.
    The comment is incorrect - the decorator properly handles all return paths and cannot silently return None. The comment should be deleted.
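Tracing the wrapper body from the sketch earlier in the thread confirms this: every loop iteration ends in exactly one of three ways, so control can never fall off the end of the loop and implicitly return None:

```python
for attempt in range(MAX_RETRIES):
    try:
        return func(*args, **kwargs)  # 1: success; even an intentional None result is returned
    except APIStatusError as e:
        if e.status_code != 503 or attempt == MAX_RETRIES - 1:
            raise  # 2: non-overload error, or the final attempt failed
        time.sleep(delay)  # 3: back off, then loop for another attempt
        delay *= 2
# unreachable, so no trailing return statement is needed
```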

Workflow ID: wflow_OwAm6CzR60tFPIKe



@ErikBjare (Owner, Author) commented:

A single failing test:

anthropic.NotFoundError: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: gpt-o4'}}
...
=========================== short test summary info ============================
FAILED tests/test_server.py::test_api_conversation_generate - KeyError: 0
============= 1 failed, 147 passed, 1 skipped, 1 retried in 50.18s =============

and for the openai case:

openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model `gpt-o4` does not exist or you do not have access to it.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

wtf? where is it getting "gpt-o4" from?!

@ErikBjare (Owner, Author) commented:

@gptme why does it happen? i find no references to "o4" in the codebase or anything?

gptme (Contributor) replied:

I'm sorry, I could not fulfill your request. Please check the log of this run for more information.

@ErikBjare (Owner, Author) commented:

Fixed in dec7a60

@ErikBjare ErikBjare merged commit 5076376 into master Dec 21, 2024
7 checks passed