Deploy a Vault instance following HashiCorp's best practices. Complete these steps in order:

- Server Certificates: Prepare certificates first. You can provide your own or follow the guide: Public Key Infrastructure (PKI): Requirements.
- Vault Instance Setup: Start your Vault instance. See Getting Started for instructions.
- Configure Vault: Once the cluster is up, configure it. Switch to the management directory to manage PKI, roles, and other Vault resources (a sketch of what this can look like follows this list).
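As an illustration of the last step, the management configuration might use the Terraform Vault provider to mount a PKI secrets engine and define roles. The snippet below is only a sketch under assumed names: the `pki` mount path, the `server` role, and the placeholder address and domain are not taken from this repository.

```hcl
# Illustrative sketch only; the actual resources live in this repository's
# management directory and may differ.
provider "vault" {
  address = "https://vault.example.com:8200" # placeholder cluster address
}

# Mount a PKI secrets engine
resource "vault_mount" "pki" {
  path                  = "pki" # assumed mount path
  type                  = "pki"
  max_lease_ttl_seconds = 86400
}

# A role constraining which certificates the engine can issue
resource "vault_pki_secret_backend_role" "server" {
  backend          = vault_mount.pki.path
  name             = "server"             # assumed role name
  allowed_domains  = ["example.internal"] # placeholder domain
  allow_subdomains = true
}
```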
The cluster can be deployed in one of two modes, set through the `mode` input: `dev` and `ha` (default: `dev`). Here are the differences between these modes:
|  | Dev | HA |
|---|---|---|
| Number of nodes | 1 | 5 |
| Disk type | HDD | SSD |
| Vault storage type | file | raft |
| Instance type(s) | t3.micro | mixed (lower-price) |
| Capacity type | on-demand | spot |
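For example, the HA topology can be requested through the `mode` input when calling the module. The module source and all values below are placeholders rather than a prescribed configuration:

```hcl
module "vault" {
  source = "github.com/example/terraform-aws-vault-cluster" # placeholder source

  # Required inputs (placeholder values)
  domain_name           = "vault.example.com"
  env                   = "production"
  leader_tls_servername = "vault.example.internal"

  # Deploy the 5-node Raft cluster instead of the single-node default
  mode = "ha"
}
```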
In designing a production environment for HashiCorp Vault, I opted for a balance between performance and reliability. Key architectural decisions include:
- Raft Protocol for Cluster Reliability: Utilizing the Raft protocol, recognized for its robustness in distributed systems, to ensure cluster reliability in a production environment.
- Five-Node Cluster Configuration: Following best practices for fault tolerance and availability, this setup significantly reduces the risk of service disruption and is the recommended cluster size when using the Raft protocol.
- Ephemeral Node Strategy with Spot Instances: This approach provides operational flexibility and cost efficiency. The cluster also draws from multiple instance pools: when a Spot Instance in an AWS Auto Scaling group is interrupted, it is automatically replaced with an instance from a different Spot pool, ensuring continuous operation while optimizing costs.
- Data Storage on a RAID0 Array: Prioritizing performance, RAID0 arrays offer faster data access. The Raft protocol and a robust backup/restore strategy mitigate the lack of redundancy in RAID0.
- Vault Auto-Unseal Feature: Configured to accommodate the ephemeral nature of the nodes, minimizing downtime and manual intervention (see the configuration sketch below).
This architecture balances performance, cost-efficiency, and resilience, embracing the dynamic nature of cloud resources for operational flexibility.
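To make these choices concrete, a Vault server configuration matching this architecture would look roughly like the sketch below. The node ID, instance tag, certificate paths, and KMS key alias are assumptions for illustration; the actual file is rendered by the module's instance bootstrap.

```hcl
# Raft integrated storage on the local RAID0 array (vault_data_path)
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1" # assumed node ID

  retry_join {
    auto_join             = "provider=aws tag_key=Name tag_value=vault" # assumed instance tag
    auto_join_scheme      = "https"
    leader_tls_servername = "vault.example.internal" # matches the leader_tls_servername input
  }
}

# Auto-unseal with AWS KMS so replaced Spot instances rejoin without manual unsealing
seal "awskms" {
  region     = "eu-west-3"
  kms_key_id = "alias/vault-unseal" # assumed key alias
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/tls.crt" # assumed certificate location
  tls_key_file  = "/opt/vault/tls/tls.key"
}
```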
- Keep the Root CA offline.
- Use hardened AMIs, such as those built with this project from @konstruktoid; an Ubuntu AMI from Canonical is used by default (see the AMI lookup sketch after this list).
- Disable SSM once the cluster is operational and an Identity provider is configured.
- Implement MFA for authentication.
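As a reference point for the AMI-related inputs, the default owner ID belongs to Canonical, so a lookup with the module's defaults resolves to a stock Ubuntu image roughly as sketched below (the name pattern is an assumption, not copied from this module). Swapping in a hardened image means pointing `ami_owner` and `ami_filter` at your own account and naming scheme.

```hcl
# Illustrative only: how an AMI lookup with the default ami_owner resolves
# to a Canonical Ubuntu image. The module's own data source may differ.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical, the module's default ami_owner

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-*-amd64-server-*"] # assumed name pattern
  }
}
```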
| Name | Version |
|---|---|
| terraform | ~> 1.4 |
| aws | ~> 5.0 |
| cloudinit | ~> 2.3 |
| Name | Version |
|---|---|
| aws | ~> 5.0 |
| cloudinit | ~> 2.3 |
| Name | Source | Version |
|---|---|---|
| vault_asg | terraform-aws-modules/autoscaling/aws | ~> 8.0 |
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| ami_filter | Map of lists used to create the AMI filter for the Vault instances' AMI. | `map(list(string))` | `{` | no |
| ami_owner | Owner ID of the AMI | `string` | `"099720109477"` | no |
| domain_name | The domain name for which the certificate should be issued | `string` | n/a | yes |
| env | The environment of the Vault cluster | `string` | n/a | yes |
| leader_tls_servername | One of the shared DNS SANs used to create the certificates used for mTLS | `string` | n/a | yes |
| mode | Vault cluster mode (default `dev`, meaning a single node) | `string` | `"dev"` | no |
| name | Name of the resources created for this Vault cluster | `string` | `"vault"` | no |
| prometheus_node_exporter_enabled | If set to true, install and start a Prometheus node exporter | `bool` | `false` | no |
| region | AWS region | `string` | `"eu-west-3"` | no |
| ssm_enabled | If true, allow connecting to the instances using AWS Systems Manager | `bool` | `false` | no |
| tags | A map of tags to add to all resources | `map(string)` | `{}` | no |
| vault_data_path | Directory where Vault's data will be stored on an EC2 instance | `string` | `"/opt/vault/data"` | no |
| Name | Description |
|---|---|
| autoscaling_group_id | n/a |
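For completeness, the single output can be consumed from the calling configuration, for instance to re-export the Auto Scaling group ID (names below are placeholders):

```hcl
output "vault_autoscaling_group_id" {
  description = "ID of the Auto Scaling group backing the Vault cluster"
  value       = module.vault.autoscaling_group_id
}
```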