Setting Up a Robust AWS VPC From Scratch: A Terraform Guide
Dec 15th, 2023
Today, we’re diving into the world of networking in AWS, particularly focusing on setting up a Virtual Private Cloud (VPC) using Terraform. This guide is aimed at those who are familiar with AWS and are looking to expand their infrastructure using infrastructure as code (IaC).
What is a VPC?
A VPC is a foundational component in AWS. Essentially, it’s a private, isolated space within the AWS cloud infrastructure where you can launch AWS resources, such as virtual servers (EC2 instances), in a network that you have defined. This concept of a VPC revolutionized how businesses and individuals could leverage cloud computing.
The beauty of a VPC lies in its flexibility and control. Users can customize their virtual networking environment to suit their specific needs, including selecting their own IP address range, creating subnets, and configuring route tables and network gateways. This level of control is crucial for various applications, from hosting a simple website to running complex, multi-tier, web-scale applications.
And this is exactly what we are going to do in this post: we’ll create and configure a VPC from scratch that you can then tweak to your specific requirements. We’ll walk through the process of setting up subnets, route tables, an internet gateway, a NAT gateway and flow logs. This setup provides internet connectivity, correct traffic routing, and network monitoring.
Key components of our VPC setup
- Creating the VPC: We start by defining the VPC with a specific CIDR block.
- Subnets and AZs: AWS recommends distributing resources across Availability Zones (AZs) for resilience. We’ll discuss how to create public and private subnets in different AZs.
- Route Tables: Setting up route tables for both public and private subnets is crucial. We’ll show how to direct traffic correctly through these tables.
- Internet Gateway & NAT Gateway: These components are essential for enabling internet access in our VPC. The Internet Gateway connects the public subnet to the internet, while the NAT Gateway allows private subnet instances to access external services.
- Flow Logs: For monitoring and troubleshooting, we enable flow logs to capture information about the IP traffic in our VPC.
Ready? Let’s go!
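The snippets in this guide assume that the AWS provider is already declared. As a minimal sketch, it could be pinned as shown below. The version constraint is an assumption (the post does not state one); it is chosen because the domain argument used later on aws_eip replaced the older vpc argument in the 5.x provider releases, as far as I know.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: any 5.x release; the post does not pin a version
      version = ">= 5.0"
    }
  }
}
```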
Don’t use default networking resources
When you create a new AWS account, a default VPC is automatically set up in each AWS region. This VPC comes with several network resources: default subnets in each Availability Zone, a default route table, a network access control list (ACL), and a default security group. These resources are configured to enable immediate use, allowing you to quickly deploy and run instances.
Key components existing for a default VPC (documentation)
However, it is best practice to not use these default resources for production environments. The main reason is that these defaults are configured with a broad set of permissions and settings aimed at ease of use and initial accessibility. For example, default security groups are generally more permissive.
In Terraform, we are going to “adopt” these existing default resources, reset them, and rename them to align with our specific configurations and naming conventions. This process involves identifying these default resources in Terraform scripts and then applying configurations to meet our requirements.
This approach not only helps in keeping the Terraform state in sync with the AWS environment but also ensures that our infrastructure is set up following our defined best practices.
# Adopt and rename the default VPC
resource "aws_default_vpc" "default" {
  tags = { Name = "default-vpc" }
}

# Retrieve available AZ for the current region
data "aws_availability_zones" "all" {}

# For each AZ in the region, adopt and rename the default subnet associated
resource "aws_default_subnet" "default" {
  count             = length(data.aws_availability_zones.all.names)
  availability_zone = data.aws_availability_zones.all.names[count.index]

  # ensure that instances launched in this subnet won't have
  # a public IP address associated by default
  map_public_ip_on_launch = false

  tags = { Name = "default-vpc-${data.aws_availability_zones.all.names[count.index]}" }
}

# Adopt and rename the default route table associated with the default VPC
resource "aws_default_route_table" "default" {
  default_route_table_id = aws_default_vpc.default.default_route_table_id
  tags                   = { Name = "default-vpc-route-table" }
}

# Adopt and rename the default network access control list (ACL)
# associated with the default VPC
resource "aws_default_network_acl" "default" {
  default_network_acl_id = aws_default_vpc.default.default_network_acl_id
  tags                   = { Name = "default-vpc-network-acl" }

  # Ignore "subnet_ids" changes to avoid the known issue below.
  # https://github.com/hashicorp/terraform/issues/9824
  # https://github.com/terraform-providers/terraform-provider-aws/issues/346
  lifecycle {
    ignore_changes = [subnet_ids]
  }
}

# Adopt and rename the default security group associated with the default VPC
# When Terraform first adopts the default security group, it immediately
# removes all ingress and egress rules
resource "aws_default_security_group" "default" {
  vpc_id = aws_default_vpc.default.id
  tags   = { Name = "default-vpc-default-security-group" }
}
Our main VPC
We are now ready to create our own VPC.
We need to define the IP range used by our VPC in the form of a CIDR block. For simplicity and efficient organization, I recommend using 10.0.0.0/16, the largest block AWS allows for a VPC. It sits within the 10.0.0.0/8 range, which is widely used for private networks, and it leaves room for an intuitive and structured subnet layout.
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
  tags       = { Name = "main-vpc" }
}
It is considered good practice to name all resources created in your infrastructure using tags, as it greatly enhances manageability, clarity, and tracking of resources within a complex cloud environment.
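On top of per-resource Name tags, the AWS provider can apply a common set of tags to every taggable resource it manages through its default_tags block. The sketch below is illustrative only: the region is an assumption based on the euw1-* AZ IDs used for the subnets in this guide, and the tag keys and values are hypothetical.

```hcl
provider "aws" {
  # Assumption: eu-west-1, matching the euw1-* AZ IDs used for the subnets
  region = "eu-west-1"

  # Tags applied to every taggable resource managed through this provider
  default_tags {
    tags = {
      # Hypothetical tag keys and values, for illustration only
      ManagedBy = "terraform"
      Project   = "network"
    }
  }
}
```

Tags set directly on a resource, such as Name, are merged with these defaults.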
Public and private subnets
With our VPC now in place, the next step is creating subnets. Subnets are subdivisions of a VPC that help organize and secure our resources by distributing them across different AZs within an AWS Region.
In the context of a multi-account infrastructure (see why it is best practice), we need to share resources across accounts in a common VPC. However, the mapping of AZ names (like eu-west-1a, eu-west-1b, etc.) is unique to each AWS account. This means that eu-west-1a in one account may not correspond to the same physical location as eu-west-1a in another account. Therefore, we will be using the AZ IDs, unique and consistent identifiers for each AZ.
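To check how AZ names map to AZ IDs in a given account, one option is to read both lists from the aws_availability_zones data source. This is a small sketch, assuming the names and zone_ids lists are returned in matching order; the data source label and the output name are hypothetical.

```hcl
# Look up the AZs available to the current account and region
data "aws_availability_zones" "available" {
  state = "available"
}

# Map each AZ name to its AZ ID for the current account
output "az_name_to_id" {
  value = zipmap(
    data.aws_availability_zones.available.names,
    data.aws_availability_zones.available.zone_ids
  )
}
```

Running terraform apply in two different accounts and comparing the outputs shows whether the same AZ name points to different AZ IDs.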
resource "aws_subnet" "main_public1" {
  vpc_id                  = aws_vpc.this.id
  cidr_block              = "10.0.101.0/24"
  availability_zone_id    = "euw1-az1"
  map_public_ip_on_launch = true
  tags                    = { Name = "main-public1" }
}

resource "aws_subnet" "main_public2" {
  vpc_id                  = aws_vpc.this.id
  cidr_block              = "10.0.102.0/24"
  availability_zone_id    = "euw1-az2"
  map_public_ip_on_launch = true
  tags                    = { Name = "main-public2" }
}

resource "aws_subnet" "main_private1" {
  vpc_id                  = aws_vpc.this.id
  cidr_block              = "10.0.1.0/24"
  availability_zone_id    = "euw1-az1"
  map_public_ip_on_launch = false
  tags                    = { Name = "main-private1" }
}

resource "aws_subnet" "main_private2" {
  vpc_id                  = aws_vpc.this.id
  cidr_block              = "10.0.2.0/24"
  availability_zone_id    = "euw1-az2"
  map_public_ip_on_launch = false
  tags                    = { Name = "main-private2" }
}
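The subnet CIDR blocks above are written by hand. As an alternative sketch, Terraform's built-in cidrsubnet function can derive them from the VPC range, which keeps the layout consistent if the VPC block ever changes. The local names here are hypothetical.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # Adding 8 bits to a /16 prefix yields /24 blocks; the third argument
  # selects which /24 (it becomes the third octet here)
  public1_cidr  = cidrsubnet(local.vpc_cidr, 8, 101) # 10.0.101.0/24
  private1_cidr = cidrsubnet(local.vpc_cidr, 8, 1)   # 10.0.1.0/24
}
```

A subnet could then use cidr_block = local.public1_cidr instead of the literal string.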
Public and private route tables
Route tables are essential components within a VPC, containing rules (routes) that guide network traffic from your subnet or gateway.
Here are some key concepts about route tables:
- Implicit router: A VPC has an implicit router, and route tables tell it where to direct network traffic.
- Subnet association: Each subnet in the VPC must be associated with a route table, which is known as a subnet route table. This association dictates the routing for that particular subnet. A single route table can be associated with multiple subnets.
- Main route table: By default, a VPC comes with a main route table. Subnets without an explicit route table association use this main route table. For enhanced security, it’s advised not to use this main route table.
Let’s create two distinct route tables:
- A public route table: This table is linked to public subnets and will include a route to an Internet Gateway. This setup enables resources within these subnets to access the internet directly.
- A private route table: This table is linked to private subnets and will include a route to a NAT Gateway. This configuration allows resources in private subnets to reach external services, like software updates, without being directly accessible from the internet.
# Route tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id
  tags   = { Name = "public-route-table" }
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.this.id
  tags   = { Name = "private-route-table" }
}

# Subnet associations to route tables
resource "aws_route_table_association" "public1" {
  route_table_id = aws_route_table.public.id
  subnet_id      = aws_subnet.main_public1.id
}

resource "aws_route_table_association" "public2" {
  route_table_id = aws_route_table.public.id
  subnet_id      = aws_subnet.main_public2.id
}

resource "aws_route_table_association" "private1" {
  route_table_id = aws_route_table.private.id
  subnet_id      = aws_subnet.main_private1.id
}

resource "aws_route_table_association" "private2" {
  route_table_id = aws_route_table.private.id
  subnet_id      = aws_subnet.main_private2.id
}
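Since the advice above is to avoid the main route table, one possible complement is to adopt the main route table of our new VPC and keep it empty, in the same spirit as what we did for the default VPC. This is a sketch rather than part of the original setup; the resource label and the Name tag are hypothetical.

```hcl
# Adopt the main route table of our VPC and remove every managed route,
# so a subnet without an explicit association gets no internet route
resource "aws_default_route_table" "main" {
  default_route_table_id = aws_vpc.this.default_route_table_id

  # An empty list removes all managed routes; the implicit local route remains
  route = []

  tags = { Name = "main-vpc-default-route-table" }
}
```

With this in place, a subnet that is created without an explicit association can only reach addresses inside the VPC.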
Internet Gateway for public subnets
An Internet Gateway (IGW) in AWS is a critical piece that allows communication between instances in your VPC and the internet. After creating the IGW, to enable internet access for the public subnets, we add a specific route to it in their route table.
resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags   = { Name = "main-vpc-igw" }
}

resource "aws_route" "public_internet_gateway" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.this.id
}
NAT Gateway for private subnets
A NAT Gateway is a managed service that lets instances in a private subnet initiate outbound traffic to the internet or other AWS services, while preventing the internet from initiating connections with those instances. This service is essential for maintaining the security and privacy of your private subnet resources.
Our approach consists of:
- Allocating an elastic IP (EIP): This EIP is associated with the NAT Gateway, representing its public-facing IP address. Traffic from your VPC that goes through the NAT Gateway appears to external services as originating from this EIP.
- Setting up the NAT Gateway: The NAT Gateway is created within a public subnet, using the allocated EIP.
- Routing traffic to the NAT Gateway: We add a route in the private route table that sends traffic destined for 0.0.0.0/0 (meaning all internet-bound traffic) to the NAT Gateway.
resource "aws_eip" "nat" {
  domain = "vpc"
  tags   = { Name = "main-vpc-nat" }
}
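resource "aws_nat_gateway" "this" {
  allocation_id = aws_eip.nat.id
  # The NAT Gateway lives in one of the public subnets declared earlier
  subnet_id = aws_subnet.main_public1.id
  tags      = { Name = "main-vpc-nat-gw" }

  # Make sure the Internet Gateway exists before the NAT Gateway is created
  depends_on = [aws_internet_gateway.this]
}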
resource "aws_route" "private_nat_gateway" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this.id
}
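Because outbound traffic from the private subnets appears to come from the EIP, it can be handy to expose that address, for example to hand it to a third party that allow-lists source IPs. A minimal sketch, with a hypothetical output name:

```hcl
output "nat_public_ip" {
  description = "Public IP that outbound traffic from the private subnets appears to come from"
  value       = aws_eip.nat.public_ip
}
```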
Flow logs
Flow logs capture information about the IP traffic going to and from network interfaces in a VPC, enabling network traffic monitoring and analysis.
It is best practice to set up flow logs: they provide valuable insights into network traffic patterns and potential security issues within a VPC, enhancing network visibility and security management.
Before setting up flow logs, we need an IAM role that has the permissions to publish these logs.
IAM role VPCFlowLogsPublisher
To create this role, we declare:
- IAM Policy Document: A policy allowing the VPC Flow Logs service to assume an IAM role.
- IAM Role: A role named VPCFlowLogsPublisher configured with the above policy.
- Flow Logs Publish Policy: A policy document granting permissions for operations like creating and managing log groups and streams in CloudWatch Logs.
- Policy Attachment: Attaching the flow logs policy to the IAM role, enabling it to publish logs to CloudWatch.
See more in the AWS documentation.
# Allow the vpc-flow-logs service to assume an IAM role
data "aws_iam_policy_document" "flow_logs_publisher_assume_role_policy" {
  statement {
    principals {
      type        = "Service"
      identifiers = ["vpc-flow-logs.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

# Create an IAM role
resource "aws_iam_role" "flow_logs_publisher" {
  name               = "VPCFlowLogsPublisher"
  assume_role_policy = data.aws_iam_policy_document.flow_logs_publisher_assume_role_policy.json
}

# Grant necessary permissions for publishing VPC Flow Logs to CloudWatch
data "aws_iam_policy_document" "flow_logs_publish_policy" {
  statement {
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams"
    ]
    resources = ["*"]
  }
}

# Attach the policy to the role
resource "aws_iam_role_policy" "flow_logs_publish_policy" {
  name   = "VPCFlowLogsPublishPolicy"
  role   = aws_iam_role.flow_logs_publisher.id
  policy = data.aws_iam_policy_document.flow_logs_publish_policy.json
}
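The trust policy above lets the flow logs service assume the role without further restriction. A possible tightening, sketched here as an optional variant and not as part of the original setup, is to add a condition so that the role can only be assumed on behalf of our own account. The data source labels are hypothetical.

```hcl
# Account ID of the credentials Terraform is running with
data "aws_caller_identity" "current" {}

# Variant of the trust policy, limited to requests made for this account
data "aws_iam_policy_document" "flow_logs_assume_role_scoped" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["vpc-flow-logs.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [data.aws_caller_identity.current.account_id]
    }
  }
}
```

To use it, the assume_role_policy of the VPCFlowLogsPublisher role would reference this document's json attribute instead of the unrestricted one.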
Activate flow logs for our VPC
Next, we create a CloudWatch log group for our flow logs and activate the flow logs with the IAM role that we just created.
resource "aws_cloudwatch_log_group" "main_vpc_flow_logs" {
  name              = "main-vpc-flow-logs"
  retention_in_days = 365
}
resource "aws_flow_log" "main_vpc_flow_logs" {
  log_destination = aws_cloudwatch_log_group.main_vpc_flow_logs.arn
  # Reference the VPCFlowLogsPublisher role declared above
  iam_role_arn = aws_iam_role.flow_logs_publisher.arn
  vpc_id       = aws_vpc.this.id
  traffic_type = "ALL"
}
Activate flow logs for the default VPC
It is also a good idea to activate flow logs for the default VPC. We are not planning to use it, so there shouldn’t be anything in it. Therefore, if for any reason there is network traffic in it, we want to be able to inspect it!
resource "aws_cloudwatch_log_group" "default_vpc_flow_logs" {
  name              = "default-vpc-flow-logs"
  retention_in_days = 365
}

resource "aws_flow_log" "default_vpc_flow_logs" {
  log_destination = aws_cloudwatch_log_group.default_vpc_flow_logs.arn
  iam_role_arn    = aws_iam_role.flow_logs_publisher.arn
  vpc_id          = aws_default_vpc.default.id
  traffic_type    = "ALL"
}
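The two log group and flow log pairs are nearly identical. As an alternative sketch that would replace both pairs (it reuses the same log group names, so it cannot coexist with them), the resources can be declared once with for_each. The local and resource labels are hypothetical.

```hcl
locals {
  # Static keys, so for_each can be resolved at plan time
  flow_logged_vpcs = {
    main    = aws_vpc.this.id
    default = aws_default_vpc.default.id
  }
}

resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  for_each = local.flow_logged_vpcs

  name              = "${each.key}-vpc-flow-logs"
  retention_in_days = 365
}

resource "aws_flow_log" "vpc" {
  for_each = local.flow_logged_vpcs

  log_destination = aws_cloudwatch_log_group.vpc_flow_logs[each.key].arn
  iam_role_arn    = aws_iam_role.flow_logs_publisher.arn
  vpc_id          = each.value
  traffic_type    = "ALL"
}
```

Adding flow logs for another VPC then only requires a new entry in the map.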