feat: Introduce centralized retryable HTTP client creation #273
…o re-use in `try_connect` method
…otation strategies
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #273 +/- ##
=======================================
- Coverage 96.3% 96.3% -0.1%
=======================================
Files 74 75 +1
Lines 24656 24642 -14
=======================================
- Hits 23759 23741 -18
- Misses 897 901 +4
|
LGTM, thanks @isSerge!
I'll check out the PR and test it locally some time today or early next week.
|
Looks like it's working! 🙌 However, whilst testing on My mainnet network config for testing: I increased the |
|
@shahnami I am trying to reproduce this, but so far my monitor has been running without issues for over an hour using your configuration. UPDATE: I can reproduce it now after running for a few hours, investigating |
|
@shahnami it seems the problem was caused by two tasks using the initial URL. At some point the RPC would return 429, and Task 1 would attempt rotation. Previously it would acquire the lock and remove the second URL from the fallbacks; at the same time Task 2 would also get a 429 and try to rotate, but there would be no fallbacks available - modification of |
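The two-task sequence described above can be pictured with a minimal, single-threaded sketch. It assumes a hypothetical `RotationState` struct with `take_candidate` and `peek_candidate` helpers; none of these names come from the PR, and the sketch is not the actual `EndpointManager` code or the fix that was applied. It only shows why removing the candidate from the fallback list before the connection check finishes can leave a second task with nothing to rotate to.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the rotation state (not the PR's `EndpointManager`).
struct RotationState {
    active: String,
    fallbacks: VecDeque<String>,
}

impl RotationState {
    /// Pop-style selection: the candidate leaves `fallbacks` immediately,
    /// before any connection check has completed.
    fn take_candidate(&mut self) -> Option<String> {
        self.fallbacks.pop_front()
    }

    /// Peek-style selection: the candidate stays visible to other tasks.
    fn peek_candidate(&self) -> Option<String> {
        self.fallbacks.front().cloned()
    }
}

fn main() {
    let mut state = RotationState {
        active: "url1".to_string(),
        fallbacks: VecDeque::from(vec!["url2".to_string()]),
    };

    // Task 1 gets a 429 on the active URL and takes the only fallback.
    let task1 = state.take_candidate();
    // Task 2 gets a 429 on the same active URL before Task 1 has finished.
    let task2 = state.take_candidate();
    assert_eq!(task1.as_deref(), Some("url2"));
    assert_eq!(task2, None); // Task 2 finds no fallbacks.

    // With peek-style selection both tasks see the same candidate.
    state.fallbacks.push_back("url2".to_string());
    assert_eq!(state.peek_candidate().as_deref(), Some("url2"));
    assert_eq!(state.peek_candidate().as_deref(), Some("url2"));
    println!("active = {}, fallbacks = {:?}", state.active, state.fallbacks);
}
```

In real code the state would sit behind a lock shared by the tasks; the interleaving is written out sequentially here only to keep the example deterministic.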
|
LGTM! thanks 👍
I have been trying to reproduce the error, but after 40 mins I could not see any errors.
I believe we should try to increase the test coverage a bit; we are going down from 96.3% to 95.7% :)
|
@NicoMolinaOZ coverage decreased due to additional |
|
Code looks good, but while testing I'm still seeing some issues with URL rotation. For example, url1 is hitting throttling errors, so we rotate to url2. Since requests are made in parallel, we still see a lot of throttling errors from requests that were made prior to the URL rotation (expected). However, at some point I'm getting "Failed to connect to new URL url1": we try to rotate from the currently active url2 to url1 and then fall back to url1, even though the error originated from that URL. It seems like a race condition, though I might be missing something.
2025-06-18T09:08:03.114855Z WARN process_new_blocks:get_blocks: openzeppelin_monitor::services::blockchain::transports::endpoint_manager: Request to https://eth.llamarpc.com failed with status 429 Too Many Requests: error code: 1015 network: "ethereum_mainnet"
2025-06-18T09:08:03.114997Z DEBUG process_new_blocks:get_blocks: openzeppelin_monitor::services::blockchain::transports::endpoint_manager: send_raw_request: HTTP status 429 Too Many Requests on 'https://eth.llamarpc.com' triggers URL rotation attempt network: "ethereum_mainnet"
2025-06-18T09:08:03.115124Z DEBUG process_new_blocks:get_blocks: openzeppelin_monitor::services::blockchain::transports::endpoint_manager: Trying to rotate URL: Current Active: 'https://eth.llamarpc.com', Fallbacks: ["https://eth.drpc.org"] network: "ethereum_mainnet"
2025-06-18T09:08:03.115238Z DEBUG process_new_blocks:get_blocks: openzeppelin_monitor::services::blockchain::transports::endpoint_manager: Attempting try_connect to new_url during rotation: 'https://eth.drpc.org' network: "ethereum_mainnet"
2025-06-18T09:08:03.115406Z DEBUG process_new_blocks:get_blocks: reqwest::connect: starting new connection: https://eth.drpc.org/ network: "ethereum_mainnet"
...
2025-06-18T09:08:05.835129Z ERROR process_new_blocks:get_blocks: openzeppelin_monitor::utils::logging::error: Error occurred, Failed to connect to new URL 'https://eth.llamarpc.com', trace_id: 6bec1588-db5b-49d0-a80f-4daa80043f3c, timestamp: 2025-06-18T09:08:05.835054+00:00, error.chain: Failed to connect to https://eth.llamarpc.com/: 429 network: "ethereum_mainnet"
2025-06-18T09:08:05.835839Z ERROR process_new_blocks:get_blocks: openzeppelin_monitor::utils::logging::error: Error occurred, HTTP error: status 429 Too Many Requests for URL https://eth.llamarpc.com, trace_id: fdd8e800-c902-45c8-8cc5-d50162aa745d, timestamp: 2025-06-18T09:08:05.835820+00:00, error.chain: URL rotation failed: Failed to connect to new URL 'https://eth.llamarpc.com' network: "ethereum_mainnet"
2025-06-18T09:08:05.836480Z DEBUG process_new_blocks:get_blocks: openzeppelin_monitor::services::blockchain::transports::endpoint_manager: Trying to rotate URL: Current Active: 'https://eth.drpc.org', Fallbacks: ["https://eth.llamarpc.com"] network: "ethereum_mainnet"
2025-06-18T09:08:05.836993Z DEBUG process_new_blocks:get_blocks: openzeppelin_monitor::services::blockchain::transports::endpoint_manager: Attempting try_connect to new_url during rotation: 'https://eth.llamarpc.com' network: "ethereum_mainnet"
Full logs here
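One possible way to keep a late 429 from an already-abandoned URL from triggering another rotation is to rotate only when the failing URL is still the active one. The following is a minimal sketch of that idea, assuming a hypothetical `Endpoints` struct and `rotate_if_active` method; these names are illustrative and are not taken from the PR, and the connection check (`try_connect`) is left out for brevity.

```rust
/// Hypothetical rotation state; names are illustrative, not the PR's API.
#[derive(Debug)]
struct Endpoints {
    active: String,
    fallbacks: Vec<String>,
}

impl Endpoints {
    /// Rotate only if `failed_url` is still the active URL.
    /// Returns `true` when a rotation actually happened.
    fn rotate_if_active(&mut self, failed_url: &str) -> bool {
        // A failure reported for a URL that is no longer active is stale:
        // some other task has already rotated away from it.
        if self.active != failed_url || self.fallbacks.is_empty() {
            return false;
        }
        let next = self.fallbacks.remove(0);
        let previous = std::mem::replace(&mut self.active, next);
        // The previous URL goes to the back of the fallback list.
        self.fallbacks.push(previous);
        true
    }
}

fn main() {
    let mut endpoints = Endpoints {
        active: "https://eth.llamarpc.com".to_string(),
        fallbacks: vec!["https://eth.drpc.org".to_string()],
    };

    // The first 429 from llamarpc rotates to drpc.
    assert!(endpoints.rotate_if_active("https://eth.llamarpc.com"));
    assert_eq!(endpoints.active, "https://eth.drpc.org");

    // A late 429 from an in-flight llamarpc request is ignored.
    assert!(!endpoints.rotate_if_active("https://eth.llamarpc.com"));
    assert_eq!(endpoints.active, "https://eth.drpc.org");
    println!("{:?}", endpoints);
}
```

With parallel requests, the struct would need to be guarded by a mutex so that the comparison and the swap happen atomically.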
|
Adding new information: I was able to see the same error during testing. I have a monitor with the following filter expression, so we are fetching transaction receipts (due to gas_used) |
|
Here is my understanding:
Currently we don't have a mechanism to mark an endpoint as problematic globally, so it is re-used according to the current rotation logic. What we can do to mitigate this:
|
shahnami left a comment
I agree, circuit breakers are probably the best solution here. We can tackle this in a new PR. Thanks for your hard work Serge!
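For reference, a per-endpoint circuit breaker of the kind mentioned here could look roughly like the sketch below. This is an illustration only, under stated assumptions: the `CircuitBreaker` type, its method names, the failure threshold, and the cooldown are all hypothetical and not part of this PR or of any follow-up design. The current time is passed in explicitly so the behaviour can be checked without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-endpoint circuit breaker (not part of this PR).
struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    // url -> (consecutive failures, time the circuit opened, if it is open)
    state: HashMap<String, (u32, Option<Instant>)>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, state: HashMap::new() }
    }

    /// Count a failure; open the circuit once the threshold is reached.
    fn record_failure(&mut self, url: &str, now: Instant) {
        let entry = self.state.entry(url.to_string()).or_insert((0, None));
        entry.0 += 1;
        if entry.0 >= self.failure_threshold {
            entry.1 = Some(now);
        }
    }

    /// A success clears the failure history for the endpoint.
    fn record_success(&mut self, url: &str) {
        self.state.remove(url);
    }

    /// An endpoint is usable unless its circuit is open and still cooling down.
    fn is_available(&self, url: &str, now: Instant) -> bool {
        match self.state.get(url) {
            Some((_, Some(opened_at))) => now.duration_since(*opened_at) >= self.cooldown,
            _ => true,
        }
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new(2, Duration::from_secs(30));
    let t0 = Instant::now();

    breaker.record_failure("url1", t0);
    assert!(breaker.is_available("url1", t0)); // below the threshold
    breaker.record_failure("url1", t0);
    assert!(!breaker.is_available("url1", t0)); // circuit is open
    assert!(breaker.is_available("url2", t0)); // other endpoints are unaffected

    // After the cooldown the endpoint may be tried again.
    assert!(breaker.is_available("url1", t0 + Duration::from_secs(30)));
    breaker.record_success("url1");
    assert!(breaker.is_available("url1", t0));
}
```

Rotation logic could then skip any fallback for which `is_available` returns `false`, which would give the "mark an endpoint as problematic globally" behaviour discussed in the thread.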

Summary
Related to #243
This PR includes the following changes:
- `create_retryable_http_client` utility
- `HttpTransportClient`: `try_connect` during rotation is now retryable
- `EndpointManager::set_retry_policy` since retry configuration is now being handled at client creation time via `HttpRetryConfig`.
Testing Process
Checklist