
Feature: Implement Retry Mechanism for Notifications #243

@shahnami

Problem

Currently, when a notification fails to send (e.g., a Slack message, Discord message, or email), the system errors out immediately without any retry attempts. This can lead to missed notifications when the failure is only temporary, such as a network issue or rate limiting.

Current Behavior

The notification services in src/services/notification/ fail immediately on any error, without retrying. This makes them less resilient than our RPC endpoint management system, which has sophisticated retry and rotation mechanisms.

Proposed Solution

Implement a retry mechanism for notifications similar to the RPC endpoint management system, with the following components:

  1. Retry Policy Configuration

    • Configurable number of retry attempts
    • Exponential backoff strategy
    • Configurable retry conditions (e.g., network errors, rate limits)
  2. Notification Manager

    • Similar to EndpointManager but for notifications
    • Handles retry logic and backoff
    • Manages notification-specific error handling
  3. Retry Strategy

    • Define which errors are retryable
    • Handle rate limiting specifically
    • Log retry attempts and failures
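A minimal sketch of the retry policy configuration and exponential backoff described above. `RetryPolicy`, its field names, and the default values (3 retries, 250 ms initial delay, doubling up to a 10 s cap) are hypothetical, not existing types in the codebase; jitter is left out for brevity.

```rust
use std::time::Duration;

/// Hypothetical retry policy configuration; field names and defaults are
/// illustrative assumptions, not an existing type in the codebase.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    /// Retries after the first attempt (total attempts = max_retries + 1).
    pub max_retries: u32,
    /// Delay before the first retry.
    pub initial_backoff: Duration,
    /// Upper bound on any single delay.
    pub max_backoff: Duration,
    /// Factor applied to the delay after each failed attempt.
    pub multiplier: u32,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            max_retries: 3,
            initial_backoff: Duration::from_millis(250),
            max_backoff: Duration::from_secs(10),
            multiplier: 2,
        }
    }
}

impl RetryPolicy {
    /// Delay before retry number `retry` (0-based): initial * multiplier^retry,
    /// capped at `max_backoff`. Saturating math avoids overflow for large `retry`.
    pub fn backoff_for(&self, retry: u32) -> Duration {
        let factor = self.multiplier.saturating_pow(retry);
        self.initial_backoff
            .saturating_mul(factor)
            .min(self.max_backoff)
    }
}

fn main() {
    let policy = RetryPolicy::default();
    for retry in 0..=policy.max_retries {
        println!("retry {}: wait {:?}", retry + 1, policy.backoff_for(retry));
    }
}
```

With these defaults the waits are 250 ms, 500 ms, 1 s and 2 s. The `max_backoff` cap keeps a long outage from producing unbounded sleeps.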

Retryable Error Types (Example)

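use std::time::Duration;

/// Example error type; variants are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub enum NotificationError {
    /// The provider is rate limiting us; wait `retry_after` before retrying.
    RateLimitError { retry_after: Duration },
    /// Transient transport failure (connection reset, timeout, ...).
    NetworkError,
    /// Other transient failure on the provider side.
    TemporaryError,
    /// Non-recoverable failure (e.g. bad credentials); never retried.
    PermanentError,
}

/// Returns true if the operation that produced `error` is worth retrying.
fn is_retryable_error(error: &NotificationError) -> bool {
    matches!(
        error,
        NotificationError::RateLimitError { .. }
            | NotificationError::NetworkError
            | NotificationError::TemporaryError
    )
}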
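The error classification above could drive a retry loop along these lines. This is a synchronous sketch under stated assumptions: `send_with_retry` is a hypothetical helper, a rate-limit error waits for exactly its `retry_after` delay, other retryable errors back off exponentially from `initial_backoff`, and `PermanentError` is returned immediately. The `sleep` callback is injected so tests can record delays; the real services would presumably use an async sleep instead.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum NotificationError {
    RateLimitError { retry_after: Duration },
    NetworkError,
    TemporaryError,
    PermanentError,
}

fn is_retryable_error(error: &NotificationError) -> bool {
    matches!(
        error,
        NotificationError::RateLimitError { .. }
            | NotificationError::NetworkError
            | NotificationError::TemporaryError
    )
}

/// Hypothetical helper: calls `send` until it succeeds, fails with a
/// non-retryable error, or `max_retries` retries have been used.
/// `sleep` is injected so callers (and tests) control how waiting happens.
fn send_with_retry<F, S>(
    max_retries: u32,
    initial_backoff: Duration,
    mut send: F,
    mut sleep: S,
) -> Result<(), NotificationError>
where
    F: FnMut() -> Result<(), NotificationError>,
    S: FnMut(Duration),
{
    let mut retry: u32 = 0;
    loop {
        match send() {
            Ok(()) => return Ok(()),
            Err(err) if is_retryable_error(&err) && retry < max_retries => {
                // Honor the provider's rate-limit hint; otherwise back off exponentially.
                let delay = match &err {
                    NotificationError::RateLimitError { retry_after } => *retry_after,
                    _ => initial_backoff.saturating_mul(2u32.saturating_pow(retry)),
                };
                eprintln!("notification failed ({err:?}); retry {} in {delay:?}", retry + 1);
                sleep(delay);
                retry += 1;
            }
            // Permanent error, or retries exhausted: give up and surface the error.
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Simulated provider: rate-limited once, then a network error, then success.
    let mut responses = vec![
        Err(NotificationError::RateLimitError { retry_after: Duration::from_millis(200) }),
        Err(NotificationError::NetworkError),
        Ok(()),
    ]
    .into_iter();
    let result = send_with_retry(
        3,
        Duration::from_millis(100),
        || responses.next().unwrap_or(Ok(())),
        std::thread::sleep,
    );
    println!("{result:?}");
}
```

Every retry is logged to stderr here, which covers the "log retry attempts and failures" item in its simplest form.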

Integration Points

  1. Existing Notification Services

    • Slack (slack.rs)
    • Discord (discord.rs)
    • Email (email.rs)
    • Webhook (webhook.rs)
    • Telegram (telegram.rs)
    • Script (script.rs)
  2. Error Handling

    • Update error.rs to include retry-specific error types
    • Add retry-related logging
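Each HTTP-based service (Slack, Discord, Webhook, Telegram) would need to map provider responses onto these error types. A hypothetical mapping, assuming a plain status code and the raw `Retry-After` header value: 429 becomes `RateLimitError`, 408 and 5xx become `TemporaryError`, and other non-2xx codes are treated as permanent. `classify_http_response` and the 1 s fallback are illustrative; transport failures that never produce a response (timeouts, connection errors) would map to `NetworkError` and are not shown.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum NotificationError {
    RateLimitError { retry_after: Duration },
    NetworkError,
    TemporaryError,
    PermanentError,
}

/// Assumed fallback wait when a 429 response has no usable Retry-After value.
const DEFAULT_RETRY_AFTER: Duration = Duration::from_secs(1);

/// Hypothetical helper: maps an HTTP status code (and optional Retry-After
/// header value) from a webhook-style provider to the example error type.
fn classify_http_response(
    status: u16,
    retry_after: Option<&str>,
) -> Result<(), NotificationError> {
    match status {
        200..=299 => Ok(()),
        429 => {
            // Only the delay-seconds form of Retry-After is parsed here;
            // the HTTP-date form falls back to the default.
            let wait = retry_after
                .and_then(|v| v.trim().parse::<u64>().ok())
                .map(Duration::from_secs)
                .unwrap_or(DEFAULT_RETRY_AFTER);
            Err(NotificationError::RateLimitError { retry_after: wait })
        }
        // Request timeout and server-side errors are usually transient.
        408 | 500..=599 => Err(NotificationError::TemporaryError),
        // Anything else (400, 401, 403, 404, ...) will not succeed on retry.
        _ => Err(NotificationError::PermanentError),
    }
}

fn main() {
    for (status, header) in [(200, None), (429, Some("30")), (503, None), (401, None)] {
        println!("{status} -> {:?}", classify_http_response(status, header));
    }
}
```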

Acceptance Criteria

  • Implement NotificationManager with retry logic
  • Add configuration for retry attempts and backoff strategy
  • Implement retryable error types and detection
  • Add retry support for all notification services
  • Add comprehensive logging for retry attempts
  • Add metrics for notification success/failure rates
  • Add documentation for retry behavior
  • Add tests covering retry scenarios
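For the success/failure metrics criterion, a minimal in-process sketch using atomic counters. `NotificationMetrics` and its methods are hypothetical; a real implementation would more likely report through a metrics exporter, labeled per notification type.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process counters for one notification service.
/// Relaxed ordering is enough for independent counters; the loads in
/// `success_rate` are not an atomic snapshot of both values.
#[derive(Debug, Default)]
pub struct NotificationMetrics {
    successes: AtomicU64,
    failures: AtomicU64,
    retries: AtomicU64,
}

impl NotificationMetrics {
    /// A notification was delivered (possibly after retries).
    pub fn record_success(&self) {
        self.successes.fetch_add(1, Ordering::Relaxed);
    }

    /// A notification was given up on (permanent error or retries exhausted).
    pub fn record_failure(&self) {
        self.failures.fetch_add(1, Ordering::Relaxed);
    }

    /// A single retry attempt was made.
    pub fn record_retry(&self) {
        self.retries.fetch_add(1, Ordering::Relaxed);
    }

    pub fn retries(&self) -> u64 {
        self.retries.load(Ordering::Relaxed)
    }

    /// Share of finished notifications that were delivered; `None` before any finished.
    pub fn success_rate(&self) -> Option<f64> {
        let ok = self.successes.load(Ordering::Relaxed);
        let failed = self.failures.load(Ordering::Relaxed);
        let total = ok + failed;
        (total > 0).then(|| ok as f64 / total as f64)
    }
}

fn main() {
    let metrics = NotificationMetrics::default();
    metrics.record_retry();
    metrics.record_success();
    metrics.record_success();
    metrics.record_success();
    metrics.record_failure();
    println!("success rate: {:?}, retries: {}", metrics.success_rate(), metrics.retries());
}
```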

References

  • Current RPC retry implementation in endpoint_manager.rs
  • Current notification services in src/services/notification/

Additional Considerations

  • Consider implementing different retry strategies for different notification types
  • Consider adding circuit breaker pattern for failing notification services
  • Consider implementing notification queuing for high-load scenarios
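A circuit breaker for a failing notification service could look like the following sketch. The three states (closed, open, half-open) are the standard pattern; `CircuitBreaker`, the consecutive-failure threshold, and the cooldown are illustrative assumptions. Time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed { consecutive_failures: u32 },
    Open { opened_at: Instant },
    HalfOpen,
}

/// Hypothetical per-service circuit breaker; thresholds are illustrative.
#[derive(Debug)]
pub struct CircuitBreaker {
    state: State,
    failure_threshold: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed { consecutive_failures: 0 }, failure_threshold, cooldown }
    }

    /// Whether a send may be attempted at `now`. After the cooldown an open
    /// breaker moves to half-open and lets a single trial request through.
    pub fn allow_request(&mut self, now: Instant) -> bool {
        match self.state {
            State::Closed { .. } => true,
            State::Open { opened_at } if now.duration_since(opened_at) >= self.cooldown => {
                self.state = State::HalfOpen;
                true
            }
            State::Open { .. } => false,
            // A trial request is already in flight; wait for its outcome.
            State::HalfOpen => false,
        }
    }

    pub fn record_success(&mut self) {
        self.state = State::Closed { consecutive_failures: 0 };
    }

    pub fn record_failure(&mut self, now: Instant) {
        self.state = match self.state {
            State::Closed { consecutive_failures }
                if consecutive_failures + 1 < self.failure_threshold =>
            {
                State::Closed { consecutive_failures: consecutive_failures + 1 }
            }
            // Threshold reached, or the half-open trial failed: (re)open the circuit.
            _ => State::Open { opened_at: now },
        };
    }

    pub fn is_open(&self) -> bool {
        matches!(self.state, State::Open { .. })
    }
}

fn main() {
    let start = Instant::now();
    let mut breaker = CircuitBreaker::new(3, Duration::from_secs(30));
    for _ in 0..3 {
        if breaker.allow_request(start) {
            breaker.record_failure(start); // simulated failed send
        }
    }
    println!("open after 3 failures: {}", breaker.is_open());
    println!("allowed during cooldown: {}", breaker.allow_request(start + Duration::from_secs(10)));
    println!("allowed after cooldown: {}", breaker.allow_request(start + Duration::from_secs(31)));
}
```

One option is to keep one breaker per service and check it before entering the retry loop, so an open circuit fails fast instead of spending the retry budget on a service that is known to be down.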

Metadata

Labels

  • A-notifs: Slack, Email, or other notification methods
  • D-hard: Complex or advanced issues
  • P-low: Low-priority or non-urgent tasks
  • T-feature: Suggests a new feature or enhancement
  • cla: allowlist