Description
Problem
Currently, when a notification fails to send (e.g., Slack message, Discord message, email, etc.), the system immediately errors out without any retry attempts. This can lead to missed notifications in cases of temporary failures like network issues or rate limiting.
Current Behavior
The notification services in src/services/notification/ fail immediately on any error, with no retry attempts. This makes them less resilient than our RPC endpoint management system, which has sophisticated retry and rotation mechanisms.
Proposed Solution
Implement a retry mechanism for notifications similar to the RPC endpoint management system, with the following components:
- Retry Policy Configuration
  - Configurable number of retry attempts
  - Exponential backoff strategy
  - Configurable retry conditions (e.g., network errors, rate limits)
- Notification Manager
  - Similar to EndpointManager, but for notifications
  - Handles retry logic and backoff
  - Manages notification-specific error handling
- Retry Strategy
  - Define which errors are retryable
  - Handle rate limiting specifically
  - Log retry attempts and failures
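As a rough illustration of the retry policy described above, the sketch below models a configurable retry count with capped exponential backoff. The `RetryPolicy` type, its field names, and the `backoff` method are all hypothetical and do not come from the existing codebase; the real configuration shape may differ.

```rust
use std::time::Duration;

/// Hypothetical retry policy; names are illustrative, not taken from the codebase.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    /// Maximum number of retries after the initial attempt.
    pub max_retries: u32,
    /// Delay before the first retry.
    pub initial_backoff: Duration,
    /// Upper bound on any single delay.
    pub max_backoff: Duration,
    /// Factor applied to the delay after each retry.
    pub multiplier: u32,
}

impl RetryPolicy {
    /// Delay before retry number `retry` (0-based), or `None` once retries are exhausted.
    pub fn backoff(&self, retry: u32) -> Option<Duration> {
        if retry >= self.max_retries {
            return None;
        }
        // Overflow in either step falls back to the configured cap.
        let factor = self.multiplier.checked_pow(retry).unwrap_or(u32::MAX);
        let delay = self
            .initial_backoff
            .checked_mul(factor)
            .unwrap_or(self.max_backoff);
        Some(delay.min(self.max_backoff))
    }
}
```

With `initial_backoff` of 100 ms and `multiplier` of 2, the delays would be 100 ms, 200 ms, 400 ms, and so on, until `max_backoff` caps them. Adding random jitter is a common refinement that this sketch leaves out.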
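Retryable Error Types (Example)

    use std::time::Duration;

    /// Failure categories a notification service can report.
    pub enum NotificationError {
        /// The provider throttled the request; wait `retry_after` before retrying.
        RateLimitError { retry_after: Duration },
        NetworkError,
        TemporaryError,
        /// Retrying will not help; fail immediately.
        PermanentError,
    }

    /// Returns true for errors that are worth retrying.
    fn is_retryable_error(error: &NotificationError) -> bool {
        matches!(
            error,
            NotificationError::RateLimitError { .. }
                | NotificationError::NetworkError
                | NotificationError::TemporaryError
        )
    }

Integration Points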
- Existing Notification Services
  - Slack (slack.rs)
  - Discord (discord.rs)
  - Email (email.rs)
  - Webhook (webhook.rs)
  - Telegram (telegram.rs)
  - Script (script.rs)
- Error Handling
  - Update error.rs to include retry-specific error types
  - Add retry-related logging
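One way the services listed above could share retry logic is a common trait plus a single retry loop. The sketch below is a minimal, synchronous version under stated assumptions: the `Notifier` trait and `send_with_retry` function are hypothetical names, the error enum mirrors the example from this issue, and the `sleep` callback is injected so the caller decides how to wait. The real services may well be async; this stays synchronous only to remain dependency-free.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum NotificationError {
    RateLimitError { retry_after: Duration },
    NetworkError,
    TemporaryError,
    PermanentError,
}

/// Hypothetical common interface for the per-channel services.
pub trait Notifier {
    fn send(&mut self, message: &str) -> Result<(), NotificationError>;
}

/// Calls `notifier.send` until it succeeds, a permanent error occurs, or
/// `max_retries` retries have been used. Returns the number of retries
/// that were needed on success, or the last error on failure.
pub fn send_with_retry<N: Notifier>(
    notifier: &mut N,
    message: &str,
    max_retries: u32,
    base_delay: Duration,
    mut sleep: impl FnMut(Duration),
) -> Result<u32, NotificationError> {
    let mut retries = 0;
    loop {
        let err = match notifier.send(message) {
            Ok(()) => return Ok(retries),
            Err(e) => e,
        };
        let delay = match &err {
            // Honor the wait requested by the provider.
            NotificationError::RateLimitError { retry_after } => Some(*retry_after),
            // Exponential backoff: base, 2x base, 4x base, ...
            NotificationError::NetworkError | NotificationError::TemporaryError => {
                Some(base_delay.saturating_mul(2u32.saturating_pow(retries)))
            }
            // Not retryable.
            NotificationError::PermanentError => None,
        };
        match delay {
            Some(d) if retries < max_retries => {
                sleep(d);
                retries += 1;
            }
            _ => return Err(err),
        }
    }
}
```

Passing `std::thread::sleep` as the `sleep` argument gives blocking behavior, while tests can pass a closure that records the requested delays without waiting.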
Acceptance Criteria
- Implement
NotificationManagerwith retry logic - Add configuration for retry attempts and backoff strategy
- Implement retryable error types and detection
- Add retry support for all notification services
- Add comprehensive logging for retry attempts
- Add metrics for notification success/failure rates
- Add documentation for retry behavior
- Add tests covering retry scenarios
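For the metrics criterion above, a minimal sketch of success/failure counters could look like the following. `NotificationMetrics` and its methods are hypothetical names; a real implementation would more likely export these values through whatever metrics system the project already uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process counters for notification outcomes.
#[derive(Debug, Default)]
pub struct NotificationMetrics {
    succeeded: AtomicU64,
    failed: AtomicU64,
    retries: AtomicU64,
}

impl NotificationMetrics {
    /// Records the final outcome of one notification and the retries it used.
    pub fn record(&self, success: bool, retries_used: u64) {
        let counter = if success { &self.succeeded } else { &self.failed };
        counter.fetch_add(1, Ordering::Relaxed);
        self.retries.fetch_add(retries_used, Ordering::Relaxed);
    }

    /// Fraction of notifications that eventually succeeded, or `None` if none were recorded.
    pub fn success_rate(&self) -> Option<f64> {
        let ok = self.succeeded.load(Ordering::Relaxed);
        let total = ok + self.failed.load(Ordering::Relaxed);
        if total == 0 {
            None
        } else {
            Some(ok as f64 / total as f64)
        }
    }

    /// Total retries across all recorded notifications.
    pub fn total_retries(&self) -> u64 {
        self.retries.load(Ordering::Relaxed)
    }
}
```

Because the counters are atomics, a single instance can be shared across threads behind an `Arc` without extra locking.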
References
- Current RPC retry implementation in endpoint_manager.rs
- Current notification services in src/services/notification/
Additional Considerations
- Consider implementing different retry strategies for different notification types
- Consider adding circuit breaker pattern for failing notification services
- Consider implementing notification queuing for high-load scenarios
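To make the circuit breaker idea above concrete, here is a small sketch under stated assumptions: the `CircuitBreaker` type, its threshold and cooldown parameters, and its method names are all hypothetical. It opens after a run of consecutive failures and allows a new attempt once the cooldown has elapsed. The current time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical circuit breaker for one notification service.
#[derive(Debug)]
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Returns true if a send may be attempted at time `now`.
    pub fn allow(&self, now: Instant) -> bool {
        match self.opened_at {
            // Open: only allow a trial send after the cooldown has passed.
            Some(opened) => now.saturating_duration_since(opened) >= self.cooldown,
            // Closed: always allow.
            None => true,
        }
    }

    /// A successful send closes the breaker and resets the failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failed send counts toward the threshold and (re)opens the breaker once reached.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A manager would call `allow` before each send and skip or queue the notification while the breaker is open, which keeps a persistently failing service from consuming retry budget.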