VPC subnet types and AZ Choices in CloudPosse Reference Architecture #87
-
While deploying the reference architecture, we ran into some questions regarding the VPC subnet configuration and availability zones:
Thanks in advance for any insights you can provide!
Replies: 6 comments
-
Yes, you’re correct: the standard VPC component provisions two subnet types: public and private. These are created across three AZs by default. It does not include a distinct third “persistence” subnet type specifically for databases. The typical pattern is to use private subnets for databases.
There isn’t a technical limitation. Rather, it’s a design simplification:
If you introduce a third subnet tier, you'll need to modify how you pass subnet information into other components. Several components, such as RDS, expect to find and use the private subnets from the VPC component by remote-state reference. However, if this is a requirement, it is doable. Let's discuss whether this is something we need to explore further.
The reference architecture uses three AZs primarily to enhance resilience and high availability. The AWS Well-Architected Framework also recommends spreading workloads across multiple AZs (REL10-BP01, "Deploy the workload to multiple locations"). Benefits:
However, as you noted, there are trade-offs:
If your budget and use case support it, two AZs may suffice, especially for non-production environments or simpler architectures. For production, however, three AZs are considered best practice in AWS for availability SLAs.
-
Yes, that's correct. While the Cloud Posse VPC module supports any number of subnet types, including a dedicated […]
We provide a subnet allocation strategy at the organizational level, intended to scale across multiple accounts, regions, and availability zones. Adding a third subnet type per AZ (for example, to isolate databases) would increase the number of subnets by 50% relative to the default public/private layout, and would triple it compared to a single-tier private subnet model. This reduces usable IP space and increases management overhead across all environments.
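As a rough illustration of the subnet arithmetic above, here is a minimal Python sketch. It assumes equal-sized, power-of-two subnets carved from a single VPC CIDR; the function name `plan_subnets`, the tier names, and the `10.0.0.0/16` range are hypothetical, and this is not the allocation strategy the reference architecture actually uses.

```python
import ipaddress


def plan_subnets(vpc_cidr, az_count, tiers):
    """Carve one equal-sized subnet per (tier, AZ) pair out of a VPC CIDR.

    Hypothetical helper for illustration only; it is not how the
    Cloud Posse modules allocate subnets.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = az_count * len(tiers)
    # Smallest number of extra prefix bits that yields at least `needed` blocks.
    extra_bits = (needed - 1).bit_length()
    blocks = vpc.subnets(prefixlen_diff=extra_bits)
    return {(tier, az): next(blocks) for tier in tiers for az in range(az_count)}


# Default layout: two tiers across three AZs.
two_tier = plan_subnets("10.0.0.0/16", 3, ["public", "private"])

# Same VPC with a third (assumed) "persistence" tier.
three_tier = plan_subnets("10.0.0.0/16", 3, ["public", "private", "persistence"])

for name, plan in (("two-tier", two_tier), ("three-tier", three_tier)):
    sizes = {net.prefixlen for net in plan.values()}
    print(name, "subnets:", len(plan), "prefix lengths:", sorted(sizes))
```

Under these assumptions the subnet count goes from six to nine, and because the blocks are power-of-two sized, each subnet also shrinks from a /19 to a /20. A real plan could size tiers unevenly, so treat the numbers as indicative only.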
-
The module itself supports a third subnet type. However, our reference architecture omits it by default based on a modern cloud-native design rationale. The use of a dedicated "persistence" subnet with restricted NACLs (IMO) is a holdover from legacy network architectures, where segmentation could only be achieved through stateless, IP-based access controls. In AWS, Security Groups provide a superior alternative:
In the context of PCI, properly implemented Security Groups can serve as a compensating control for subnet-level NACL isolation. This approach has been widely accepted in AWS-native PCI environments, assuming you pair it with:
That said, we understand that some assessors still expect subnet-level isolation. If required, we can help you implement a persistence subnet with restricted NACLs, but we consider it an exception-based configuration, not a baseline design.
-
You're right that diverging from the reference architecture could introduce downstream considerations. However, there's no need to change modules to accomplish this. Our reference architecture is designed to be layered and composable, so you can diverge where needed without throwing away the kitchen sink, unlike more monolithic approaches.
At the top layer, we have Terraform components, which are opinionated implementations of Terraform root modules. These leverage our reusable child modules and provide sensible configurations that work for the most common use cases. This is also the brilliance of vendoring components into your organization: you get the source code checked into your repo, so you can modify it freely. If you do diverge from our baseline, we recommend removing the […]
In this case, our child modules were explicitly designed for this level of flexibility. For example, we decoupled VPC creation from subnet management. You can use the […]
This approach lets you achieve high convergence on the open-source baseline while allowing targeted divergence to satisfy internal requirements, without losing upgrade paths or architectural alignment. Let us know if you'd like guidance wiring this together; we're happy to help.
-
We default to three Availability Zones in the reference architecture to promote high availability and fault tolerance. While AWS requires a minimum of two AZs for services like Amazon EKS, using three AZs provides additional benefits:
While AWS doesn’t mandate three AZs, the Reliability Pillar of the AWS Well-Architected Framework encourages distributing workloads across multiple AZs to withstand failures. Using three AZs gives a clear majority in quorum-based systems, simplifying the operational model in the event of a zone outage. If you're not using EKS or running any quorum-based systems (e.g. etcd), this is less of a consideration.
That said, we fully acknowledge the cost trade-offs, especially with NAT Gateways, which incur charges per AZ. For non-production environments (e.g., dev or staging), we often recommend using two AZs to reduce cost while maintaining redundancy. Some teams attempt to reduce costs by using three AZs with only two NAT Gateways, but this leaves the third AZ dependent on a NAT Gateway in another AZ for outbound traffic, which undermines the purpose of a multi-AZ architecture. For teams seeking to reduce NAT Gateway costs while maintaining AZ independence, solutions like FCK NAT can help, but they introduce their own operational trade-offs.
Ultimately, it’s a classic trade-off triangle: good, fast, or cheap; pick two. If cost is the primary driver, we can help you make intentional adjustments while highlighting the potential impact on resiliency and fault tolerance.
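The quorum point above can be checked with a few lines of Python. This is a minimal sketch that assumes a simple strict-majority rule and a fixed number of voting members per AZ; the function names are hypothetical, and real systems such as etcd have their own membership and failure handling.

```python
def has_quorum(surviving, total):
    """A strict majority of voting members is still reachable."""
    return surviving > total // 2


def survives_any_single_az_loss(members_per_az):
    """True if quorum holds no matter which single AZ goes down.

    `members_per_az` lists the number of voting members placed in each AZ.
    """
    total = sum(members_per_az)
    return all(has_quorum(total - lost, total) for lost in members_per_az)


# Three members, one per AZ: losing any one AZ leaves 2 of 3.
three_azs = survives_any_single_az_loss([1, 1, 1])

# Three members across two AZs: losing the AZ that holds 2 leaves 1 of 3.
two_azs = survives_any_single_az_loss([2, 1])

print("three AZs survive any single AZ outage:", three_azs)
print("two AZs survive any single AZ outage:", two_azs)
```

With two AZs, some placement always puts a majority-breaking share of members in one zone, which is why a third AZ simplifies the failure model for quorum-based workloads.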
-
TL;DR
Let them know if you want help wiring it together; it's designed to be flexible and vendorable.