feat: add enterprise resilience features and comprehensive documentation #2
Add timeout enforcement, circuit breaker, metrics collection, and kill switch
with individual focused documentation guides. Fix test stability issues and
add complete XML documentation.
New Packages:
- ExperimentFramework.Resilience (Polly v8 circuit breaker)
- ExperimentFramework.Metrics.Exporters (Prometheus, OpenTelemetry)
Features:
- Timeout enforcement with FallbackToDefault action
- Circuit breaker integration using Polly v8 resilience pipelines
- Metrics collection with Prometheus and OpenTelemetry exporters
- Kill switch for manual experiment/trial shutdown
- In-memory and noop implementations for all providers
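As a rough illustration of the timeout feature listed above, the sketch below races a trial against a timer and falls back to a default implementation when the timer wins. `TimeoutFallback`, `RunAsync`, and the parameter names are hypothetical and chosen for this example only; the PR's `TimeoutDecoratorFactory` and `TimeoutPolicy` may be structured differently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not the framework's TimeoutDecoratorFactory.
public static class TimeoutFallback
{
    // Runs the trial; if it does not finish within the timeout, requests
    // cancellation and returns the default implementation's result instead.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> trial,
        Func<Task<T>> fallbackToDefault,
        TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        Task<T> trialTask = trial(cts.Token);
        Task timer = Task.Delay(timeout, cts.Token);

        Task first = await Task.WhenAny(trialTask, timer);
        cts.Cancel(); // stops the timer, or asks a slow trial to stop

        if (first == trialTask)
        {
            // Trial finished in time: surface its result or its exception.
            return await trialTask;
        }

        // Timer won: the trial is abandoned and the default path runs.
        return await fallbackToDefault();
    }
}
```

Called with a trial that awaits `Task.Delay(5000, ct)` and a 100 ms timeout, this helper returns the fallback's value. A trial that ignores its cancellation token keeps running in the background, so cooperative cancellation inside trials still matters.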
Core Implementation (14 new files):
- src/ExperimentFramework/Decorators/TimeoutDecoratorFactory.cs
- src/ExperimentFramework/KillSwitch/IKillSwitchProvider.cs
- src/ExperimentFramework/KillSwitch/KillSwitchDecoratorFactory.cs
- src/ExperimentFramework/Metrics/IExperimentMetrics.cs
- src/ExperimentFramework/Metrics/MetricsDecoratorFactory.cs
- src/ExperimentFramework/Models/TimeoutPolicy.cs
- src/ExperimentFramework/ExperimentBuilderExtensions.cs
- src/ExperimentFramework.Resilience/CircuitBreakerDecoratorFactory.cs
- src/ExperimentFramework.Resilience/CircuitBreakerOptions.cs
- src/ExperimentFramework.Resilience/ResilienceBuilderExtensions.cs
- src/ExperimentFramework.Resilience/ExperimentFramework.Resilience.csproj
- src/ExperimentFramework.Metrics.Exporters/PrometheusExperimentMetrics.cs
- src/ExperimentFramework.Metrics.Exporters/OpenTelemetryExperimentMetrics.cs
- src/ExperimentFramework.Metrics.Exporters/ExperimentFramework.Metrics.Exporters.csproj
Documentation (4 new guides):
- docs/user-guide/timeout-enforcement.md - Prevent slow trials with fallback
- docs/user-guide/circuit-breaker.md - Polly integration and configuration
- docs/user-guide/metrics.md - Prometheus/OpenTelemetry with Grafana queries
- docs/user-guide/kill-switch.md - Manual shutdown with admin API examples
- docs/user-guide/toc.yml - Updated table of contents
README Updates:
- Split monolithic enterprise section into 4 focused sections
- Add concise examples with links to detailed guides
- Update feature list with enterprise capabilities
Test Implementation (2 new files):
- tests/ExperimentFramework.Tests/EnterpriseFeatureTests.cs (17 tests)
- tests/ExperimentFramework.Tests/EnterpriseFeatureDebugTests.cs
Test Fixes:
- Fix flaky telemetry tests with [Collection("TelemetryTests")] attribute
- Add defensive null handling for timing-dependent activity capture
- Fix RuntimeExperimentProxy.cs async Task.FromResult method resolution
- Fix CircuitBreakerDecoratorFactory to use singleton decorator pattern
- Update TelemetryTests.cs and VariantAndTelemetryTests.cs
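The `[Collection("TelemetryTests")]` fix relies on an xUnit rule: test classes that share a collection name do not run in parallel with each other. That matters for tests that register an `ActivityListener`, because listeners are registered process-wide. The sketch below shows the idea; the class names, the source name `Sketch.Telemetry`, and the `ActivityCapture` helper are hypothetical and are not the PR's test code.

```csharp
#nullable enable
using System.Diagnostics;
using Xunit;

internal static class ActivityCapture
{
    // Starts one activity on the named source and returns what a listener saw.
    public static Activity? CaptureOne(string sourceName)
    {
        Activity? captured = null;
        using var source = new ActivitySource(sourceName);
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            ActivityStopped = activity => captured = activity,
        };
        ActivitySource.AddActivityListener(listener); // process-wide registration
        using (source.StartActivity("trial")) { }     // disposing stops the activity
        return captured;
    }
}

// Both classes name the same collection, so xUnit runs them one after another.
[Collection("TelemetryTests")]
public class TelemetryTestsSketch
{
    [Fact]
    public void Listener_sees_the_activity() =>
        Assert.NotNull(ActivityCapture.CaptureOne("Sketch.Telemetry"));
}

[Collection("TelemetryTests")]
public class VariantAndTelemetryTestsSketch
{
    [Fact]
    public void Listener_sees_the_variant_activity() =>
        Assert.NotNull(ActivityCapture.CaptureOne("Sketch.Telemetry"));
}
```

Without the shared collection, the two classes could run concurrently and each listener could observe activities started by the other test, which is one plausible source of the flakiness described above.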
XML Documentation:
- Add missing docs to TimeoutDecoratorFactory (2 members)
- Add missing docs to IKillSwitchProvider implementations (15 members)
- Add missing docs to KillSwitchDecoratorFactory (4 members)
- Add missing docs to IExperimentMetrics (5 members)
- Add missing docs to MetricsDecoratorFactory (2 members)
- Resolve all 25 CS1591 warnings
Dependencies:
- Add Polly 8.5.0 to ExperimentFramework.Resilience
- Add System.Diagnostics.DiagnosticSource to Metrics.Exporters
- Remove unnecessary package reference (fix NU1510 warning)
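For readers unfamiliar with Polly v8 resilience pipelines, here is a minimal sketch of a circuit breaker wrapped around a trial call. `CircuitBreakerSketch`, `Build`, and `RunAsync` are hypothetical names, and the thresholds are illustrative values rather than the defaults of the PR's `CircuitBreakerOptions`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical wrapper, not the PR's CircuitBreakerDecoratorFactory.
public static class CircuitBreakerSketch
{
    // Build the pipeline once and reuse it: the circuit state lives in the
    // pipeline instance, so a fresh pipeline per call would never open.
    public static ResiliencePipeline Build() =>
        new ResiliencePipelineBuilder()
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions
            {
                FailureRatio = 0.5,                         // open at 50% failures
                MinimumThroughput = 10,                     // after at least 10 calls
                SamplingDuration = TimeSpan.FromSeconds(30),
                BreakDuration = TimeSpan.FromSeconds(15),   // stay open for 15 s
            })
            .Build();

    // Runs the trial through the breaker; while the circuit is open,
    // Polly throws BrokenCircuitException and the fallback is used.
    public static async Task<T> RunAsync<T>(
        ResiliencePipeline pipeline,
        Func<CancellationToken, ValueTask<T>> trial,
        Func<T> fallback,
        CancellationToken cancellationToken = default)
    {
        try
        {
            return await pipeline.ExecuteAsync(trial, cancellationToken);
        }
        catch (BrokenCircuitException)
        {
            return fallback();
        }
    }
}
```

Exceptions thrown by the trial while the circuit is closed still propagate to the caller in this sketch; only calls rejected by an open circuit are routed to the fallback. Sharing one pipeline instance is consistent with the singleton decorator fix listed under Test Fixes.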
Project Files:
- ExperimentFramework.slnx - Add new projects to solution
- tests/ExperimentFramework.Tests/ExperimentFramework.Tests.csproj
- tests/ExperimentFramework.Tests/packages.lock.json
Build Status:
- 0 warnings, 0 errors
- 174 tests passing (100% pass rate)
- All packages restore successfully
Codecov Report ❌
Coverage Diff:
| Metric | main | #2 |
|---|---|---|
| Coverage | ? | 69.87% |
| Files | ? | 65 |
| Lines | ? | 1736 |
| Branches | ? | 145 |
| Hits | ? | 1213 |
| Misses | ? | 523 |
| Partials | ? | 0 |
Pull request overview
This PR adds enterprise resilience features to the ExperimentFramework: timeout enforcement, circuit breaker integration with Polly v8, metrics collection (Prometheus/OpenTelemetry), and kill switch functionality. The changes span 28 files, with a focused documentation guide for each feature.
Key additions:
- Four new resilience decorators with independent configuration
- Two new packages: ExperimentFramework.Resilience and ExperimentFramework.Metrics.Exporters
- Comprehensive user documentation with real-world examples
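To make the kill switch concrete, the following is a minimal in-memory sketch of a provider that can disable a whole experiment or a single trial. The interface shape and the names `IKillSwitchSketch` and `InMemoryKillSwitchSketch` are assumptions for illustration; the PR's `IKillSwitchProvider` may expose different members.

```csharp
#nullable enable
using System.Collections.Concurrent;

// Hypothetical shape; the PR's IKillSwitchProvider may differ.
public interface IKillSwitchSketch
{
    bool IsDisabled(string experiment, string? trial = null);
    void Disable(string experiment, string? trial = null);
    void Enable(string experiment, string? trial = null);
}

public sealed class InMemoryKillSwitchSketch : IKillSwitchSketch
{
    // Used as a thread-safe set; the byte value carries no meaning.
    private readonly ConcurrentDictionary<string, byte> _disabled = new();

    private static string Key(string experiment, string? trial) =>
        trial is null ? experiment : $"{experiment}::{trial}";

    public void Disable(string experiment, string? trial = null) =>
        _disabled[Key(experiment, trial)] = 0;

    public void Enable(string experiment, string? trial = null) =>
        _disabled.TryRemove(Key(experiment, trial), out _);

    // A trial is off if it was disabled directly or its whole experiment was.
    public bool IsDisabled(string experiment, string? trial = null) =>
        _disabled.ContainsKey(experiment) ||
        (trial is not null && _disabled.ContainsKey(Key(experiment, trial)));
}
```

A decorator built on such a provider would check `IsDisabled` before invoking a trial and route to the default implementation when it returns true. An in-memory set only affects the current process; a distributed store would be needed to flip the switch across instances.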
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| tests/ExperimentFramework.Tests/packages.lock.json | Adds Polly 8.5.0 and new project dependencies |
| tests/ExperimentFramework.Tests/VariantAndTelemetryTests.cs | Adds Collection attribute and defensive null checks for parallel test stability |
| tests/ExperimentFramework.Tests/TelemetryTests.cs | Adds Collection attribute for test isolation |
| tests/ExperimentFramework.Tests/ExperimentFramework.Tests.csproj | References new Resilience and Metrics.Exporters projects |
| tests/ExperimentFramework.Tests/EnterpriseFeatureTests.cs | 17 new tests covering timeout, metrics, kill switch, and circuit breaker |
| tests/ExperimentFramework.Tests/EnterpriseFeatureDebugTests.cs | Debug tests for isolating enterprise feature issues |
| src/ExperimentFramework/RuntimeExperimentProxy.cs | Fixes Task.FromResult method resolution for async returns |
| src/ExperimentFramework/Models/TimeoutPolicy.cs | New model defining timeout behavior and actions |
| src/ExperimentFramework/Metrics/MetricsDecoratorFactory.cs | Decorator for collecting experiment performance metrics |
| src/ExperimentFramework/Metrics/IExperimentMetrics.cs | Interface and noop implementation for metrics providers |
| src/ExperimentFramework/KillSwitch/KillSwitchDecoratorFactory.cs | Decorator for emergency experiment/trial shutdown |
| src/ExperimentFramework/KillSwitch/IKillSwitchProvider.cs | Interface and in-memory/noop implementations for kill switch |
| src/ExperimentFramework/ExperimentBuilderExtensions.cs | Fluent API extensions for timeout, metrics, and kill switch |
| src/ExperimentFramework/Decorators/TimeoutDecoratorFactory.cs | Decorator enforcing trial execution timeouts |
| src/ExperimentFramework.Resilience/ResilienceBuilderExtensions.cs | Fluent API extensions for circuit breaker configuration |
| src/ExperimentFramework.Resilience/ExperimentFramework.Resilience.csproj | New package project with Polly 8.5.0 dependency |
| src/ExperimentFramework.Resilience/CircuitBreakerOptions.cs | Configuration options for circuit breaker behavior |
| src/ExperimentFramework.Resilience/CircuitBreakerDecoratorFactory.cs | Polly-based circuit breaker decorator implementation |
| src/ExperimentFramework.Metrics.Exporters/PrometheusExperimentMetrics.cs | In-memory Prometheus text format exporter |
| src/ExperimentFramework.Metrics.Exporters/OpenTelemetryExperimentMetrics.cs | OpenTelemetry metrics integration using System.Diagnostics.Metrics |
| src/ExperimentFramework.Metrics.Exporters/ExperimentFramework.Metrics.Exporters.csproj | New metrics exporters package project |
| docs/user-guide/toc.yml | Adds four new documentation sections to table of contents |
| docs/user-guide/timeout-enforcement.md | Comprehensive guide for timeout configuration with examples |
| docs/user-guide/metrics.md | Guide for Prometheus/OpenTelemetry integration with Grafana queries |
| docs/user-guide/kill-switch.md | Guide for manual experiment control including distributed Redis example |
| docs/user-guide/circuit-breaker.md | Guide for Polly circuit breaker configuration and best practices |
| README.md | Updates feature list and adds four new feature sections with examples |
| ExperimentFramework.slnx | Registers new Resilience and Metrics.Exporters projects in solution |
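The table describes the OpenTelemetry exporter as built on `System.Diagnostics.Metrics`. A minimal sketch of that approach follows, recording an invocation count and a duration per trial. The meter name, instrument names, tag keys, and the `TrialMetricsSketch` class are hypothetical; the PR's `OpenTelemetryExperimentMetrics` may use different names and instruments.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical recorder, not the PR's OpenTelemetryExperimentMetrics.
public sealed class TrialMetricsSketch : IDisposable
{
    private readonly Meter _meter = new("ExperimentFramework.Sketch");
    private readonly Counter<long> _invocations;
    private readonly Histogram<double> _durationMs;

    public TrialMetricsSketch()
    {
        _invocations = _meter.CreateCounter<long>("experiment_invocations");
        _durationMs = _meter.CreateHistogram<double>("experiment_duration", unit: "ms");
    }

    // Runs the body and records one count and one duration sample,
    // whether the body returns or throws.
    public T Measure<T>(string experiment, string trial, Func<T> body)
    {
        var tags = new TagList { { "experiment", experiment }, { "trial", trial } };
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return body();
        }
        finally
        {
            _invocations.Add(1, tags);
            _durationMs.Record(stopwatch.Elapsed.TotalMilliseconds, tags);
        }
    }

    public void Dispose() => _meter.Dispose();
}
```

Instruments created this way emit nothing until a listener subscribes to the meter, for example an OpenTelemetry SDK configured to collect the meter by name, so the recording cost stays low when no exporter is attached.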
Resolved review threads (outdated):
- src/ExperimentFramework.Resilience/CircuitBreakerDecoratorFactory.cs (2 threads)
- src/ExperimentFramework.Metrics.Exporters/PrometheusExperimentMetrics.cs (3 threads)
- Improved error handling in CircuitBreakerDecoratorFactory for better clarity.
- Updated experiment management endpoints to use a registry for allowed experiment types.
- Adjusted metrics to reflect changes in tagging for better tracking.
- Added smoke tests and unit tests for the new generator functionality.