A process management and state persistence system written in Go. It manages long-running processes, handles state persistence, and implements checkpointing and process recovery.
- Process monitoring and management
- Automatic process restart on failure
- Graceful shutdown with configurable timeouts
- Process status reporting
- System state checkpoint creation and restoration
- SQLite database management with replication
- S3-compatible object storage integration
- HTTP endpoints for system control
- Initial server configuration
- System status reporting
- Supervisor
  - Process lifecycle management
  - Process monitoring
  - Signal handling
  - Configurable restart delays
- Control Interface
  - HTTP server
  - Configuration management
  - Status reporting
  - Checkpoint operations
- Database Manager
  - SQLite database operations
  - Replication management
  - State persistence
- Object Storage Integration
  - S3-compatible storage operations
  - Configurable endpoints
  - State backup and restore
./state-manager
The server can start in an unconfigured state and be configured later through the API. Configuration is a JSON document with the following structure:
{
  "storage": {
    "bucket": "your-bucket",
    "endpoint": "your-endpoint",
    "access_key": "your-access-key",
    "secret_key": "your-secret-key",
    "region": "your-region",
    "key_prefix": "your-prefix",
    "env_dir": "your-env-dir"
  },
  "stacks": ["component1", "component2"]
}
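As an illustration of how a client or tool might load this file, here is a minimal sketch in Go. Only the JSON keys come from the structure above; the type names (`Config`, `StorageConfig`) and the `ParseConfig` helper are hypothetical and may not match the project's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageConfig mirrors the "storage" object shown above.
// The Go names are assumptions; only the JSON keys are documented.
type StorageConfig struct {
	Bucket    string `json:"bucket"`
	Endpoint  string `json:"endpoint"`
	AccessKey string `json:"access_key"`
	SecretKey string `json:"secret_key"`
	Region    string `json:"region"`
	KeyPrefix string `json:"key_prefix"`
	EnvDir    string `json:"env_dir"`
}

// Config mirrors the top-level configuration document.
type Config struct {
	Storage StorageConfig `json:"storage"`
	Stacks  []string      `json:"stacks"`
}

// ParseConfig decodes a configuration document (hypothetical helper).
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
	  "storage": {"bucket": "your-bucket", "region": "your-region"},
	  "stacks": ["component1", "component2"]
	}`)
	cfg, err := ParseConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Storage.Bucket, cfg.Stacks) // prints "your-bucket [component1 component2]"
}
```

Keys missing from the input are left as empty strings, so a real loader would probably validate required fields before use.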
- The server can start in an unconfigured state
- Initial configuration can be applied through the API
- Once configured, the system will persist the configuration
- Configuration changes require server restart
- TimeoutStop: Graceful shutdown timeout (default: 90s)
- RestartDelay: Process restart delay (default: 1s)
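To make the two settings concrete, the sketch below shows one way a restart loop could apply `RestartDelay` and fall back to the documented defaults. The `SupervisorOptions` struct, `withDefaults`, and `superviseLoop` are hypothetical names for illustration; the project's actual supervisor may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SupervisorOptions groups the two documented settings.
// The struct and the functions below are hypothetical.
type SupervisorOptions struct {
	TimeoutStop  time.Duration // graceful shutdown timeout
	RestartDelay time.Duration // pause between restarts
}

// withDefaults fills unset (zero or negative) fields with the
// documented defaults of 90s and 1s.
func (o SupervisorOptions) withDefaults() SupervisorOptions {
	if o.TimeoutStop <= 0 {
		o.TimeoutStop = 90 * time.Second
	}
	if o.RestartDelay <= 0 {
		o.RestartDelay = time.Second
	}
	return o
}

// superviseLoop calls run until stop is closed, waiting RestartDelay
// after every exit. It returns how many times run was started.
func superviseLoop(run func() error, opts SupervisorOptions, stop <-chan struct{}) int {
	opts = opts.withDefaults()
	starts := 0
	for {
		select {
		case <-stop:
			return starts
		default:
		}
		starts++
		err := run() // blocks until the supervised process exits
		fmt.Printf("run %d exited (err=%v)\n", starts, err)
		select {
		case <-stop:
			return starts
		case <-time.After(opts.RestartDelay):
		}
	}
}

func main() {
	stop := make(chan struct{})
	attempts := 0
	run := func() error {
		attempts++
		if attempts == 3 {
			close(stop) // stand-in for a shutdown request
			return nil
		}
		return errors.New("simulated crash")
	}
	opts := SupervisorOptions{RestartDelay: 10 * time.Millisecond}
	fmt.Println("starts:", superviseLoop(run, opts, stop))
}
```

Treating a zero value as "unset" keeps the struct easy to construct, at the cost of not being able to request a restart delay of exactly zero.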
- GET /: System status
- GET /config: Current configuration
- POST /config: Initial configuration setup (only works on unconfigured server)
- POST /checkpoint: Create system checkpoint
- POST /restore: Restore from checkpoint
- POST /release-lease: Release system lease
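The rule that `POST /config` only works on an unconfigured server can be sketched as a small handler. This is an illustration, not the project's implementation: the `configStore` type, the helper names, and the status codes (201, 409, 404, 400) are all assumptions, since the actual responses are not documented here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

var errAlreadyConfigured = errors.New("server is already configured")

// configStore holds the configuration; cfg is nil while unconfigured.
type configStore struct {
	mu  sync.Mutex
	cfg json.RawMessage
}

// setInitial stores cfg only if no configuration exists yet.
func (s *configStore) setInitial(cfg json.RawMessage) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cfg != nil {
		return errAlreadyConfigured
	}
	s.cfg = cfg
	return nil
}

// handleConfig serves GET /config and POST /config.
func (s *configStore) handleConfig(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		s.mu.Lock()
		cfg := s.cfg
		s.mu.Unlock()
		if cfg == nil {
			http.Error(w, "not configured", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(cfg)
	case http.MethodPost:
		var body json.RawMessage
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}
		if err := s.setInitial(body); err != nil {
			http.Error(w, err.Error(), http.StatusConflict)
			return
		}
		w.WriteHeader(http.StatusCreated)
	default:
		w.Header().Set("Allow", "GET, POST")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// postConfig sends an in-memory POST to h and returns the status code.
func postConfig(h http.HandlerFunc, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/config", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code
}

func main() {
	store := &configStore{}
	fmt.Println(postConfig(store.handleConfig, `{"stacks":["component1"]}`)) // prints 201
	fmt.Println(postConfig(store.handleConfig, `{"stacks":["component2"]}`)) // prints 409
}
```

Guarding the check and the write with the same mutex means two concurrent initial requests cannot both succeed.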
- Process Recovery
  - Automatic restart on process exit
  - Configurable restart delays
  - Process status monitoring
- State Persistence
  - Checkpoint creation
  - Database replication
  - Object storage backup
- Shutdown Process
  - Configurable shutdown timeouts
  - Signal handling
  - Process termination
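The shutdown sequence listed above (signal, wait up to a timeout, then terminate) can be sketched as follows. This is a minimal illustration under assumptions: `stopGracefully` and `stopCmd` are hypothetical helpers, and the choice of SIGTERM followed by a forced kill on a Unix-like system is assumed rather than taken from the project.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// stopGracefully requests termination, then waits up to timeout for
// done to be closed. If the deadline passes it calls kill. The bool
// result reports whether the kill path was taken.
func stopGracefully(terminate, kill func() error, done <-chan struct{}, timeout time.Duration) (bool, error) {
	if err := terminate(); err != nil {
		return false, fmt.Errorf("terminate: %w", err)
	}
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return false, nil
	case <-timer.C:
		return true, kill()
	}
}

// stopCmd wires stopGracefully to an already started *exec.Cmd.
// The caller must not call cmd.Wait elsewhere.
func stopCmd(cmd *exec.Cmd, timeout time.Duration) (bool, error) {
	done := make(chan struct{})
	go func() {
		cmd.Wait() // reaps the process; exit status is ignored in this sketch
		close(done)
	}()
	term := func() error { return cmd.Process.Signal(syscall.SIGTERM) }
	forced, err := stopGracefully(term, cmd.Process.Kill, done, timeout)
	if forced && err == nil {
		<-done // wait for the killed process to be reaped
	}
	return forced, err
}

func main() {
	// Simulated process that ignores the termination request.
	done := make(chan struct{})
	ignore := func() error { return nil }
	kill := func() error { close(done); return nil }
	forced, err := stopGracefully(ignore, kill, done, 50*time.Millisecond)
	fmt.Println(forced, err) // prints "true <nil>"
}
```

Passing the terminate and kill steps as functions keeps the timeout logic testable without spawning a real process; `stopCmd` shows how the same logic could be attached to one.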
- HTTP interface for system status
- Process health monitoring
- Database replication status
- Authentication
  - Token-based API authentication
  - Credential management
- Data Protection
  - Configuration file security
  - API communication
- Configuration
  - Environment variable usage
  - Configuration backup
  - Change documentation
- Monitoring
  - Health checks
  - Log monitoring
  - Checkpoint tracking
- Maintenance
  - Checkpoint cleanup
  - Storage monitoring
  - Credential updates
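For the token-based API authentication item above, one possible shape is an HTTP middleware that checks a bearer token. The header scheme (`Authorization: Bearer <token>`), the function names, and the 401 response are assumptions made for this sketch; the document does not specify how the project transmits or validates tokens.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"strings"
)

// tokenMatches reports whether authHeader carries the expected bearer
// token. An empty expected token never matches, so a missing secret
// cannot accidentally open the API.
func tokenMatches(authHeader, want string) bool {
	const prefix = "Bearer "
	if want == "" || !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	got := strings.TrimPrefix(authHeader, prefix)
	// Constant-time comparison avoids leaking the token through timing.
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// requireToken wraps next and rejects requests without a valid token.
func requireToken(want string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !tokenMatches(r.Header.Get("Authorization"), want) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(tokenMatches("Bearer s3cret", "s3cret")) // prints "true"
	fmt.Println(tokenMatches("Bearer wrong", "s3cret"))  // prints "false"
}
```

In line with the environment variable item in the list, the expected token would more likely be read from the environment at startup than hard-coded.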
The system implements error handling for:
- Process failures
- Configuration errors
- Storage operations
- Network operations
- Current Limitations
  - Database manager is not checkpointable
  - S3-compatible storage only
  - Single process supervision
- Known Issues
  - Check documentation for current known issues
  - Monitor GitHub issues for updates
- Go 1.22 or later
- Docker
- FUSE support (for JuiceFS)
- AWS CLI (for S3 operations)
go build -o state-manager
docker build -t state-manager .
go test ./...
go test -v ./tests/...
For issues and support:
- Check the documentation
- Review GitHub issues
- Contact the development team
Contributions are welcome:
- Fork the repository
- Create a feature branch
- Submit a pull request
- Follow the contribution guidelines