Track 'intentionally shutdown' state for blue-green deployments #1653

@partouf

Problem

Currently, the blue-green shutdown command scales the active ASG to 0 but does not record that the shutdown was intentional. From the system's perspective, the following cases are indistinguishable:

  1. Intentional shutdown (admin ran shutdown command)
  2. Accidental scale-down (someone manually scaled to 0)
  3. System failure (ASG scaled down due to issues)
  4. Cost optimization (temporary shutdown)

As a result, the status output cannot show whether the environment is down on purpose or because something went wrong.

Proposed Solution

Track shutdown state using SSM parameters or a similar mechanism, so that intentional and unintentional downtime can be told apart.

Option 1: SSM Parameter Tracking

# Set during shutdown
/compiler-explorer/beta/shutdown-state = "intentionally-shutdown"
/compiler-explorer/beta/shutdown-timestamp = "2025-01-03T10:30:00Z"
/compiler-explorer/beta/shutdown-reason = "manual" | "cost-savings" | "maintenance"
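
A minimal sketch of how the shutdown command might write these parameters. It assumes the caller passes in an object that behaves like `boto3.client("ssm")`. The function name `record_shutdown` and the `ALLOWED_REASONS` set are hypothetical; the parameter paths and values follow the layout above.

```python
from datetime import datetime, timezone

# Hypothetical: the reasons listed in the proposal above.
ALLOWED_REASONS = {"manual", "cost-savings", "maintenance"}


def record_shutdown(ssm_client, env, reason="manual", now=None):
    """Write the three proposed shutdown parameters for `env`.

    `ssm_client` is assumed to expose put_parameter() like boto3's SSM client.
    Returns a dict of the parameter names and values that were written.
    """
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"unknown shutdown reason: {reason!r}")
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    prefix = f"/compiler-explorer/{env}"
    params = {
        f"{prefix}/shutdown-state": "intentionally-shutdown",
        f"{prefix}/shutdown-timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"{prefix}/shutdown-reason": reason,
    }
    for name, value in params.items():
        # Overwrite=True so a repeated shutdown refreshes the stored state.
        ssm_client.put_parameter(Name=name, Value=value, Type="String", Overwrite=True)
    return params
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.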

Option 2: Enhanced Status Display

$ ce --env beta blue-green status

Blue-Green Status for beta:
Environment State: INTENTIONALLY SHUTDOWN (since 2 hours ago)
Active Color: blue (0 instances - shutdown)
Inactive Color: green (0 instances)

# vs current unclear state:
Active Color: blue 
ASG Status:
  blue (ACTIVE): Desired/Min/Max: 0/0/4  # Is this intentional or a problem?
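
One way the status command could derive the "Environment State" line, sketched as a pure function. The name `describe_environment` and the wording for the unrecorded case are assumptions; the timestamp format matches the proposed `shutdown-timestamp` parameter.

```python
from datetime import datetime, timezone


def describe_environment(desired_capacity, shutdown_timestamp=None, now=None):
    """Return an 'Environment State' line for the status output.

    `desired_capacity` is the active ASG's desired instance count.
    `shutdown_timestamp` is the ISO-8601 string from the proposed
    shutdown-timestamp parameter, or None if no shutdown was recorded.
    """
    if desired_capacity > 0:
        return "Environment State: RUNNING"
    if shutdown_timestamp is None:
        # Scaled to zero with no recorded shutdown: flag it for a human.
        return "Environment State: DOWN (no shutdown recorded)"
    now = now or datetime.now(timezone.utc)
    since = datetime.strptime(shutdown_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    since = since.replace(tzinfo=timezone.utc)
    hours = max(0, int((now - since).total_seconds() // 3600))
    unit = "hour" if hours == 1 else "hours"
    return f"Environment State: INTENTIONALLY SHUTDOWN (since {hours} {unit} ago)"
```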

Benefits

  1. Clear Status - Distinguish between "shut down on purpose" and "something's wrong"
  2. Operational Clarity - The team knows the environment is intentionally down
  3. Automated Checks - Monitoring can ignore "expected down" environments
  4. Audit Trail - Track when/why shutdowns happened
  5. Smart Restart - Commands could behave differently for shutdown environments
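
The "Automated Checks" benefit could reduce to a small predicate in a monitoring job. This is a sketch only: `should_alert` is a hypothetical name, and the state value is the one proposed for the `shutdown-state` parameter.

```python
def should_alert(desired_capacity, shutdown_state):
    """Return True when an environment is down without a recorded shutdown.

    `shutdown_state` is the value of the proposed shutdown-state parameter,
    or None when the parameter does not exist.
    """
    if desired_capacity > 0:
        return False  # environment is running, nothing to report
    return shutdown_state != "intentionally-shutdown"
```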

Potential Enhanced Commands

# Enhanced shutdown with reason
ce --env beta blue-green shutdown --reason "cost-savings"

# Check if environment is intentionally down
ce --env beta blue-green is-shutdown

# Restart from shutdown state (vs deploy from running state)  
ce --env beta blue-green restart

# Clear shutdown state without scaling up
ce --env beta blue-green clear-shutdown-state
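
A sketch of the logic behind `is-shutdown` and `clear-shutdown-state`, assuming the SSM parameter layout from Option 1 and a client that behaves like `boto3.client("ssm")`. The function names and the `SHUTDOWN_KEYS` constant are hypothetical. Neither function touches the ASG.

```python
# Hypothetical: the parameter suffixes proposed in Option 1.
SHUTDOWN_KEYS = ("shutdown-state", "shutdown-timestamp", "shutdown-reason")


def is_shutdown(ssm_client, env):
    """True if shutdown-state exists and marks an intentional shutdown."""
    name = f"/compiler-explorer/{env}/shutdown-state"
    # get_parameters reports missing names in InvalidParameters instead of raising.
    response = ssm_client.get_parameters(Names=[name])
    values = {p["Name"]: p["Value"] for p in response.get("Parameters", [])}
    return values.get(name) == "intentionally-shutdown"


def clear_shutdown_state(ssm_client, env):
    """Delete the shutdown parameters for `env`; return the names deleted."""
    names = [f"/compiler-explorer/{env}/{key}" for key in SHUTDOWN_KEYS]
    response = ssm_client.delete_parameters(Names=names)
    return response.get("DeletedParameters", [])
```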

Questions to Consider

  • Should operational monitoring distinguish between intended and unintended downtime?
  • Should deploy/switch commands behave differently when the environment is in the "shutdown" state?
  • Do we want audit trails of who shut down environments and when?
  • Is the added complexity worth the operational clarity benefits?

Implementation Notes

  • Could be implemented as an enhancement to the existing shutdown command
  • Would need to update the status command to display the shutdown state
  • Consider integration with monitoring/alerting systems
  • Should be backward compatible with existing blue-green functionality

Related to the recent blue-green deployment implementation in #1649
