Problem
Currently, when using the `blue-green shutdown` command, the system scales the active ASG to 0 but doesn't track that this was an intentional shutdown. From the system's perspective, there's no difference between:
- Intentional shutdown (an admin ran the `shutdown` command)
- Accidental scale-down (someone manually scaled to 0)
- System failure (the ASG scaled down due to issues)
- Cost optimization (temporary shutdown)

This makes it unclear, when looking at status, whether the environment is intentionally down or whether there's a problem.
Proposed Solution
Track shutdown state using SSM parameters or a similar mechanism to distinguish between intentional and unintentional downtime.
Option 1: SSM Parameter Tracking
```
# Set during shutdown
/compiler-explorer/beta/shutdown-state = "intentionally-shutdown"
/compiler-explorer/beta/shutdown-timestamp = "2025-01-03T10:30:00Z"
/compiler-explorer/beta/shutdown-reason = "manual" | "cost-savings" | "maintenance"
```
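Option 1 could be wired up roughly as follows. This is a minimal sketch, assuming a boto3-style SSM client (the real `boto3.client("ssm")` exposes `put_parameter`, `get_parameter` and `exceptions.ParameterNotFound`); the helper names `record_shutdown` and `read_shutdown_state`, and the `FakeSSM` stand-in used for local testing, are hypothetical and not part of the existing `ce` tooling. The parameter paths and reason values mirror the ones above.

```python
from datetime import datetime, timezone
from typing import Optional

# Reasons listed in the proposal above.
VALID_REASONS = {"manual", "cost-savings", "maintenance"}


def _param(env: str, name: str) -> str:
    # e.g. _param("beta", "shutdown-state") -> /compiler-explorer/beta/shutdown-state
    return f"/compiler-explorer/{env}/{name}"


def record_shutdown(ssm, env: str, reason: str = "manual",
                    now: Optional[datetime] = None) -> None:
    """Write the three tracking parameters (hypothetical helper). `now` must be UTC."""
    if reason not in VALID_REASONS:
        raise ValueError(f"unknown shutdown reason: {reason!r}")
    now = now or datetime.now(timezone.utc)
    values = {
        "shutdown-state": "intentionally-shutdown",
        "shutdown-timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "shutdown-reason": reason,
    }
    for name, value in values.items():
        # Overwrite=True so a repeated shutdown refreshes the timestamp/reason.
        ssm.put_parameter(Name=_param(env, name), Value=value,
                          Type="String", Overwrite=True)


def _get(ssm, name: str) -> Optional[str]:
    try:
        return ssm.get_parameter(Name=name)["Parameter"]["Value"]
    except ssm.exceptions.ParameterNotFound:
        return None


def read_shutdown_state(ssm, env: str) -> Optional[dict]:
    """Return recorded shutdown info, or None if no intentional shutdown is recorded."""
    if _get(ssm, _param(env, "shutdown-state")) != "intentionally-shutdown":
        return None
    return {
        "state": "intentionally-shutdown",
        "timestamp": _get(ssm, _param(env, "shutdown-timestamp")),
        "reason": _get(ssm, _param(env, "shutdown-reason")),
    }


class FakeSSM:
    """In-memory stand-in exposing the same call shapes as a boto3 SSM client."""

    class exceptions:
        class ParameterNotFound(Exception):
            pass

    def __init__(self):
        self.store = {}

    def put_parameter(self, Name, Value, Type="String", Overwrite=False):
        if Name in self.store and not Overwrite:
            raise ValueError(f"{Name} already exists")
        self.store[Name] = Value

    def get_parameter(self, Name):
        if Name not in self.store:
            raise self.exceptions.ParameterNotFound(Name)
        return {"Parameter": {"Name": Name, "Value": self.store[Name]}}


if __name__ == "__main__":
    ssm = FakeSSM()  # against real AWS this would be boto3.client("ssm")
    record_shutdown(ssm, "beta", "cost-savings")
    print(read_shutdown_state(ssm, "beta"))
```

Because a missing parameter reads as "nothing recorded", environments that were never shut down through the new path keep behaving as they do today, which fits the backward-compatibility note below.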
Option 2: Enhanced Status Display
```
$ ce --env beta blue-green status
Blue-Green Status for beta:
Environment State: INTENTIONALLY SHUTDOWN (since 2 hours ago)
Active Color: blue (0 instances - shutdown)
Inactive Color: green (0 instances)

# vs current unclear state:
Active Color: blue
ASG Status:
  blue (ACTIVE): Desired/Min/Max: 0/0/4   # Is this intentional or a problem?
```
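The status line in Option 2 comes down to combining the ASG's desired capacity with whatever shutdown timestamp was recorded. A minimal sketch, assuming the timestamp has already been read in the `2025-01-03T10:30:00Z` format shown in Option 1; `describe_state` and `humanize_age` are hypothetical names, and the non-shutdown labels are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def humanize_age(since: datetime, now: datetime) -> str:
    """Render an elapsed time coarsely, e.g. '2 hours ago'."""
    seconds = int((now - since).total_seconds())
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            n = seconds // size
            return f"{n} {unit}{'s' if n != 1 else ''} ago"
    return "just now"


def describe_state(desired_capacity: int, shutdown_timestamp: Optional[str],
                   now: Optional[datetime] = None) -> str:
    """Classify an environment for the status header (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    if desired_capacity > 0:
        # Capacity is up, so any leftover shutdown marker is treated as stale.
        return "RUNNING"
    if shutdown_timestamp is None:
        # Zero capacity with no recorded intent: the ambiguous case from the Problem section.
        return "DOWN (no intentional shutdown recorded; investigate)"
    since = datetime.strptime(shutdown_timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc)
    return f"INTENTIONALLY SHUTDOWN (since {humanize_age(since, now)})"
```

Reporting positive capacity as `RUNNING` regardless of the marker is one possible choice; an alternative would be to flag the stale marker so that `clear-shutdown-state` can be suggested.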
Benefits
- Clear Status - Distinguish between "shut down on purpose" vs "something's wrong"
- Operational Clarity - Team knows environment is intentionally down
- Automated Checks - Monitoring can ignore "expected down" environments
- Audit Trail - Track when/why shutdowns happened
- Smart Restart - Commands could behave differently for shutdown environments
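For the "Automated Checks" benefit, a monitoring job could consult the recorded state before alerting. A minimal sketch, assuming the monitor already knows each environment's desired capacity and whether the shutdown marker is set; `EnvSnapshot`, `environments_to_alert`, and the environment names other than `beta` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EnvSnapshot:
    name: str
    desired_capacity: int
    intentionally_shutdown: bool  # e.g. shutdown-state == "intentionally-shutdown"


def environments_to_alert(envs: Iterable[EnvSnapshot]) -> List[str]:
    """Return environments that are down without a recorded intentional shutdown."""
    return [
        env.name
        for env in envs
        if env.desired_capacity == 0 and not env.intentionally_shutdown
    ]


if __name__ == "__main__":
    snapshot = [
        EnvSnapshot("prod", 4, False),
        EnvSnapshot("beta", 0, True),      # expected down: no alert
        EnvSnapshot("staging", 0, False),  # unexpected: alert
    ]
    print(environments_to_alert(snapshot))
```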
Potential Enhanced Commands
```
# Enhanced shutdown with reason
ce --env beta blue-green shutdown --reason "cost-savings"

# Check if environment is intentionally down
ce --env beta blue-green is-shutdown

# Restart from shutdown state (vs deploy from running state)
ce --env beta blue-green restart

# Clear shutdown state without scaling up
ce --env beta blue-green clear-shutdown-state
```
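The proposed subcommands imply a few state rules: `is-shutdown` should be scriptable, `restart` only makes sense from the shutdown state, and `clear-shutdown-state` drops the marker without touching capacity. A minimal sketch of those rules, assuming an in-memory dict in place of the real SSM/ASG lookups; the handler names, the exit-code convention (0 means "yes, shut down"), and the restart capacity of 1 are all hypothetical.

```python
import argparse
from typing import List, Optional


def is_shutdown(state: dict) -> int:
    """Shell-style exit code: 0 if intentionally shut down, 1 otherwise."""
    return 0 if state["shutdown"] else 1


def clear_shutdown_state(state: dict) -> int:
    """Drop the marker only; desired capacity is left untouched."""
    state["shutdown"] = False
    return 0


def restart(state: dict, capacity: int = 1) -> int:
    """Valid only from the shutdown state; a running env should use deploy."""
    if not state["shutdown"]:
        print("environment is not shut down; use deploy instead")
        return 1
    state["desired_capacity"] = capacity  # hypothetical default capacity
    state["shutdown"] = False
    return 0


def main(argv: Optional[List[str]], state: dict) -> int:
    parser = argparse.ArgumentParser(prog="blue-green")
    parser.add_argument("command",
                        choices=["is-shutdown", "restart", "clear-shutdown-state"])
    args = parser.parse_args(argv)
    handlers = {
        "is-shutdown": is_shutdown,
        "restart": restart,
        "clear-shutdown-state": clear_shutdown_state,
    }
    return handlers[args.command](state)


if __name__ == "__main__":
    env = {"shutdown": True, "desired_capacity": 0}
    print(main(["is-shutdown"], env))
```

A real `restart` would more likely restore the capacity the environment had before shutdown, which would mean recording it alongside the other shutdown parameters.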
Questions to Consider
- Should operational monitoring distinguish between intended vs unintended downtime?
- Should deploy/switch commands behave differently when environment is in "shutdown" state?
- Do we want audit trails of who shut down environments and when?
- Is the added complexity worth the operational clarity benefits?
Implementation Notes
- Could be implemented as an enhancement to the existing `shutdown` command
- Would need to update the `status` command to display shutdown state
- Consider integration with monitoring/alerting systems
- Should be backward compatible with existing blue-green functionality
Related to the recent blue-green deployment implementation in #1649.