@TheOutdoorProgrammer TheOutdoorProgrammer commented Nov 28, 2025

Summary

Adds validation to prevent worker pool instances from registering when the EC2 metadata service returns null or invalid values for critical instance metadata.

Motivation

In rare cases, the EC2 metadata service can return null, empty, or malformed responses to instance metadata queries. When this happens, worker pool instances attempt to register with Spacelift using invalid metadata (null instance IDs, AMI IDs, or ASG names), which can cause registration failures, tracking issues, or other unexpected behavior in the worker pool management system.

This defensive validation prevents instances from proceeding with the registration process when they receive bad data from the metadata service, failing fast with clear error messages instead of propagating invalid state.

Changes

  • Added validate_metadata() function to both SaaS and self-hosted user data templates
  • Validates instance_id, ami_id, and asg_id after they are retrieved from the EC2 metadata service
  • Checks for empty strings, "null", and "None" values
  • Logs descriptive errors to /var/log/spacelift/error.log when validation fails
  • Prevents launcher from starting when metadata validation fails
  • Bumped module version to 5.4.2
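The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the templates: the function signature (a field name plus a value), the error message wording, and the overridable ERROR_LOG variable are assumptions made for the example. Only the rejected values (empty string, "null", "None") and the log path come from this PR description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of validate_metadata; the real template code may differ.

# Log path taken from the PR description; made overridable here (an assumption)
# so the sketch can be run outside an EC2 instance.
ERROR_LOG="${ERROR_LOG:-/var/log/spacelift/error.log}"

# validate_metadata NAME VALUE
# Returns 0 when VALUE looks usable, 1 when it is empty, "null", or "None".
validate_metadata() {
  local name="$1"
  local value="$2"

  case "$value" in
    ""|"null"|"None")
      # Make sure the log directory exists before appending to the log.
      mkdir -p "$(dirname "$ERROR_LOG")"
      echo "Invalid EC2 metadata: ${name} is '${value}'" >> "$ERROR_LOG"
      return 1
      ;;
  esac

  return 0
}
```

A caller would run it once per field, for example `validate_metadata instance_id "$instance_id" || return 1`, so the first bad field stops the startup sequence and is named in the log.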

Fixes

  • Prevents worker pool instances from registering with invalid or null metadata when the EC2 metadata service returns malformed responses
  • Provides clear error logging for troubleshooting metadata service issues
  • Avoids downstream issues caused by instances running with null or invalid identity metadata

Behavior

When an instance receives invalid metadata from the EC2 metadata service:

  • SaaS template: The spacelift() function returns early with status 1, preventing launcher startup
  • Self-hosted template: The launcher script exits immediately with exit code 1
  • Both cases log a descriptive error message indicating which metadata field failed validation

This fail-fast behavior ensures that only instances with valid metadata can register with the worker pool. A metadata service problem then shows up as an explicit error in the log rather than as an instance that registers and later misbehaves without an obvious cause.
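The difference between the two templates comes down to `return 1` inside a function versus `exit 1` in a standalone script. The sketch below illustrates that under stated assumptions: the metadata values are passed in as arguments instead of being fetched from the metadata service, `run_launcher_script` is a hypothetical stand-in for run-launcher.sh (written as a subshell function so that `exit` ends only that "script"), and the `echo "starting launcher"` line is a placeholder for the real launcher startup.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two fail-fast paths; not the actual template code.

ERROR_LOG="${ERROR_LOG:-/var/log/spacelift/error.log}"

# Compact version of the validation check (assumed signature: NAME VALUE).
validate_metadata() {
  case "$2" in
    ""|"null"|"None")
      mkdir -p "$(dirname "$ERROR_LOG")"
      echo "Invalid EC2 metadata: $1 is '$2'" >> "$ERROR_LOG"
      return 1
      ;;
  esac
}

# SaaS-style: validation runs inside a function, so failure is a `return 1`
# and the launcher line is never reached.
spacelift() {
  local instance_id="$1" ami_id="$2" asg_id="$3"
  validate_metadata instance_id "$instance_id" || return 1
  validate_metadata ami_id "$ami_id" || return 1
  validate_metadata asg_id "$asg_id" || return 1
  echo "starting launcher"   # placeholder for the real launcher startup
}

# Self-hosted-style: validation runs at the top level of a script, so failure
# is an `exit 1`. The parentheses run the body in a subshell, which stands in
# for the separate run-launcher.sh process.
run_launcher_script() (
  validate_metadata instance_id "$1" || exit 1
  validate_metadata ami_id "$2" || exit 1
  validate_metadata asg_id "$3" || exit 1
  echo "starting launcher"   # placeholder for the real launcher startup
)
```

In both paths the caller sees status 1 and the log names the field that failed, which matches the behavior listed above.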


Note

Add EC2 metadata validation to SaaS and self-hosted user data to fail fast on invalid values; bump module version to 5.4.2.

  • User data templates:
    • user_data/saas.tftpl: Add validate_metadata and validate instance_id, ami_id, asg_id; log errors and return 1 to prevent launcher start when invalid.
    • user_data/selfhosted.tftpl: Add validate_metadata in run-launcher.sh; validate instance_id, ami_id, asg_id; log errors and exit 1 on failure.
  • Version:
    • .spacelift/config.yml: Bump module_version to 5.4.2.

Written by Cursor Bugbot for commit a717566. This will update automatically on new commits.

@TheOutdoorProgrammer TheOutdoorProgrammer requested a review from a team as a code owner November 28, 2025 15:16
@TheOutdoorProgrammer TheOutdoorProgrammer merged commit 92813e3 into main Nov 28, 2025
16 checks passed
@TheOutdoorProgrammer TheOutdoorProgrammer deleted the shutdown-on-null branch November 28, 2025 15:31