Amazon EKS reference architecture - BonData Documentation

A Cloud-Prem deployment provisions an Amazon Elastic Kubernetes Service (EKS) cluster inside the customer’s AWS account. This page describes the cluster, networking, storage, and IAM configuration that BonData provisions during onboarding. It is the same shape the customer’s security and platform teams will see when they review the Terraform module before applying it. The multi-tenant Cloud SaaS uses the same EKS configuration internally, but only the Cloud-Prem customer sees it in their own account.

Cluster

The cluster is provisioned with Terraform using the official terraform-aws-modules/eks module. AWS manages the control plane; BonData manages the cluster configuration, node groups, and the workloads inside the cluster. The control plane endpoint is private. The cluster is upgraded to a supported Kubernetes minor version on a regular cadence and never runs a version past AWS’s end-of-standard-support date for that release.

IAM Roles for Service Accounts (IRSA) is enabled cluster-wide. Every BonData service that needs to call an AWS API runs as a Kubernetes service account bound to a scoped IAM role. There are no long-lived AWS access keys mounted in pods, stored in the cluster, or used in CI.
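The binding between a service account and its IAM role lives in the role's trust policy. The following is a minimal sketch of how such a policy is shaped, not the policy BonData's Terraform module emits. The account ID, OIDC issuer URL, namespace, and service account name are hypothetical placeholders.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer_url: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy that lets exactly one service account assume a role."""
    # The IAM OIDC provider is identified by the issuer URL without its scheme.
    provider = oidc_issuer_url.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Restrict the role to one namespace/service-account pair.
                    f"{provider}:sub":
                        f"system:serviceaccount:{namespace}:{service_account}",
                    f"{provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


# Hypothetical values, for illustration only.
policy = irsa_trust_policy(
    account_id="111122223333",
    oidc_issuer_url="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    namespace="bondata",
    service_account="api",
)
print(json.dumps(policy, indent=2))
```

Because the `sub` condition names a single service account, a pod running under any other service account cannot assume the role even though it runs in the same cluster.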

Add-ons

The following EKS managed add-ons are installed:

AWS VPC CNI: pod networking. Configured with prefix delegation for IP-address density.
AWS EBS CSI driver: persistent volumes backed by Amazon EBS.
AWS EFS CSI driver: shared filesystems backed by Amazon EFS, used where pods need shared-read semantics.
CoreDNS: in-cluster DNS resolution.
kube-proxy: Service routing on each node.

The EBS and EFS CSI drivers run with IRSA-bound IAM roles scoped to the actions each driver requires.
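To see why prefix delegation matters for IP-address density, the sketch below estimates per-node pod capacity with and without it, following the arithmetic AWS publishes for the VPC CNI. The ENI and address limits passed in are example inputs rather than the values of any instance type BonData uses, and the ceiling of 110 or 250 pods reflects AWS's general recommendation.

```python
def max_pods(enis: int, ipv4_per_eni: int, vcpus: int,
             prefix_delegation: bool) -> int:
    """Estimate the pod capacity of one node under the AWS VPC CNI."""
    # One address slot per ENI is used by the ENI's own primary IP.
    slots = enis * (ipv4_per_eni - 1)
    if prefix_delegation:
        # Each slot holds a /28 prefix (16 addresses) instead of one address.
        slots *= 16
    # +2 accounts for pods that use host networking (aws-node, kube-proxy).
    raw = slots + 2
    # AWS-recommended ceiling: 110 pods below 30 vCPUs, 250 otherwise.
    ceiling = 110 if vcpus < 30 else 250
    return min(raw, ceiling)


# Example inputs: 3 ENIs, 10 IPv4 addresses per ENI, 2 vCPUs.
print(max_pods(3, 10, 2, prefix_delegation=False))  # 29
print(max_pods(3, 10, 2, prefix_delegation=True))   # 110
```

With these inputs the node moves from being limited by addresses (29 pods) to being limited by the recommended ceiling (110 pods).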

Node groups

Workloads run on EKS managed node groups, with a separate node group per workload class. For example, general application workloads, observability workloads, and stateful workloads are each isolated on their own node group via Kubernetes taints and labels. This keeps a noisy or failing workload class from affecting the rest of the cluster. All worker nodes use:

AWS Graviton (ARM64) instances: AL2023_ARM_64_STANDARD AMI family.
On-demand capacity for predictable performance.
Encrypted EBS root volumes (gp3), with KMS encryption and delete_on_termination enabled.
IMDSv2 required: token-bound instance metadata access; IMDSv1 is disabled.

Each node group is sized independently and can scale within its declared min/max bounds.
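The isolation between workload classes comes from pairing a node label (which pods select) with a node taint (which pods must tolerate). The sketch below models that check in simplified form. The `workload-class` label and taint key are hypothetical names chosen for illustration, and the real Kubernetes scheduler considers many more factors.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(pod: dict, node: dict) -> bool:
    """Check nodeSelector labels and hard taints for one pod/node pair."""
    labels_ok = all(node["labels"].get(k) == v
                    for k, v in pod.get("nodeSelector", {}).items())
    taints_ok = all(
        any(tolerates(t, taint) for t in pod.get("tolerations", []))
        for taint in node.get("taints", [])
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )
    return labels_ok and taints_ok


# Hypothetical node groups and pods.
observability_node = {
    "labels": {"workload-class": "observability"},
    "taints": [{"key": "workload-class", "value": "observability",
                "effect": "NoSchedule"}],
}
general_node = {"labels": {"workload-class": "general"}, "taints": []}

app_pod = {"nodeSelector": {"workload-class": "general"}}
metrics_pod = {
    "nodeSelector": {"workload-class": "observability"},
    "tolerations": [{"key": "workload-class", "operator": "Equal",
                     "value": "observability", "effect": "NoSchedule"}],
}

print(can_schedule(app_pod, observability_node))      # False
print(can_schedule(metrics_pod, observability_node))  # True
print(can_schedule(metrics_pod, general_node))        # False
```

The taint keeps other workloads off the observability nodes, and the node selector keeps observability pods from drifting onto general nodes. Both halves are needed for isolation in both directions.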

Networking

The cluster lives in a dedicated VPC with three availability zones.

Private subnets host all workloads and the EKS control plane endpoints. Three subnets, one per AZ.
Public subnets host the NAT gateways (one per AZ) and the Application Load Balancers that terminate inbound TLS. Three subnets, one per AZ.
A separate set of larger subnets is reserved for the VPC CNI’s secondary ENIs (prefix delegation), so the cluster can host a high pod density without exhausting the primary subnet ranges.
The public and private subnets are tagged for ELB discovery so the AWS Load Balancer Controller can attach internal and internet-facing load balancers to the right tier.

Outbound traffic exits via the NAT gateways. No public IP is assigned to any worker node. The cluster does not accept inbound traffic from the public internet directly; public ingress arrives at Cloudflare first and is forwarded to the ALBs inside the VPC.
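The three-tier subnet layout can be illustrated with a small address plan. The sketch below is one possible way to carve a VPC range into per-AZ private, public, and pod subnets. The VPC CIDR, the AZ suffixes, and the subnet sizes are assumptions, not BonData's actual allocation. The two `kubernetes.io/role/...` tags are the standard keys the AWS Load Balancer Controller uses for subnet discovery.

```python
import ipaddress

AZS = ["a", "b", "c"]  # hypothetical AZ suffixes


def plan_subnets(vpc_cidr: str) -> list[dict]:
    """Split a VPC range into pod, private, and public subnets per AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # Four equal quarters: three large pod ranges plus one shared quarter.
    quarters = list(vpc.subnets(prefixlen_diff=2))
    # The last quarter is cut into eight smaller blocks for the other tiers.
    small = list(quarters[3].subnets(prefixlen_diff=3))
    plan = []
    for i, az in enumerate(AZS):
        plan.append({"tier": "pods", "az": az, "cidr": quarters[i], "tags": {}})
        plan.append({"tier": "private", "az": az, "cidr": small[i],
                     "tags": {"kubernetes.io/role/internal-elb": "1"}})
        plan.append({"tier": "public", "az": az, "cidr": small[3 + i],
                     "tags": {"kubernetes.io/role/elb": "1"}})
    return plan


for subnet in plan_subnets("10.0.0.0/16"):  # hypothetical VPC range
    print(subnet["tier"], subnet["az"], subnet["cidr"])
```

Giving the pod tier the large blocks reflects the point above: pod addresses are consumed far faster than node or load balancer addresses, so they get their own, larger ranges.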

Load balancing and DNS

The AWS Load Balancer Controller runs in the cluster (with an IRSA-bound IAM role) and provisions Application Load Balancers from Kubernetes Ingress resources. TLS certificates are issued by AWS Certificate Manager and attached to the ALB listeners. external-dns keeps Route 53 records aligned with in-cluster ingress definitions.

Storage

Stateful pods get persistent volumes from the AWS EBS CSI driver: encrypted gp3 volumes provisioned in the same AZ as the pod. Where pods need shared-read filesystem semantics, the AWS EFS CSI driver provides cross-AZ shared filesystems. EBS-backed StatefulSets are pinned to the AZ where their persistent volumes live, so a rolling node replacement brings the pod back where its volume can attach.
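As an illustration of the AZ constraint on EBS volumes, the sketch below shows a StorageClass of the kind typically used for encrypted gp3 volumes, plus a helper that lists which nodes can host a pod whose volume already exists. The StorageClass name, node names, and zone names are hypothetical, and the `WaitForFirstConsumer` binding mode is an assumption about how same-AZ placement is achieved rather than a documented BonData setting.

```python
import json

# Hypothetical StorageClass for encrypted gp3 volumes.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "gp3-encrypted"},  # hypothetical name
    "provisioner": "ebs.csi.aws.com",       # AWS EBS CSI driver
    "parameters": {"type": "gp3", "encrypted": "true"},
    # Assumption: bind only once a pod is scheduled, so the volume is
    # created in the AZ of the node that will run the pod.
    "volumeBindingMode": "WaitForFirstConsumer",
}

ZONE_LABEL = "topology.kubernetes.io/zone"


def eligible_nodes(volume_zone: str, nodes: list[dict]) -> list[str]:
    """Return the names of nodes in the same AZ as an existing EBS volume."""
    return [n["name"] for n in nodes
            if n["labels"].get(ZONE_LABEL) == volume_zone]


# Hypothetical nodes spread over two zones.
nodes = [
    {"name": "node-1", "labels": {ZONE_LABEL: "us-east-1a"}},
    {"name": "node-2", "labels": {ZONE_LABEL: "us-east-1b"}},
    {"name": "node-3", "labels": {ZONE_LABEL: "us-east-1a"}},
]

print(json.dumps(storage_class, indent=2))
print(eligible_nodes("us-east-1a", nodes))  # ['node-1', 'node-3']
```

The helper shows why node groups for stateful workloads need capacity in every AZ that holds a volume: a replacement node in a different AZ cannot attach the existing volume.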

Identity, secrets, and observability inside the cluster

Identity to AWS is via IRSA for every service that needs it. Each service has its own IAM role, scoped to the actions it requires.
Secrets are pulled from AWS Secrets Manager by the External Secrets Operator and exposed as Kubernetes secrets to the pods that need them. See Secrets management.
Logs are forwarded out of the cluster by Fluent Bit. Metrics are scraped by an in-cluster Prometheus stack and forwarded by the New Relic Kubernetes exporter. Errors are captured by Sentry. See Audit logging.

High availability

The cluster spans three availability zones. Amazon EKS manages the Kubernetes control plane with multi-AZ redundancy under AWS’s EKS service-level agreement. A NAT gateway is provisioned in each AZ so an AZ failure does not sever cluster egress. The API tier runs with multiple replicas distributed across AZs behind a cross-zone-balanced Application Load Balancer; unhealthy targets are removed automatically via ALB target health checks. Amazon RDS (the operational database) runs Multi-AZ with synchronous standby replication and automated failover. Amazon MQ (the message broker) runs in a clustered multi-AZ configuration with mirrored durable queues so in-flight messages survive a broker failover.
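The effect of spreading API replicas across AZs can be checked with simple arithmetic. The sketch below distributes replicas as evenly as possible and reports how many survive the loss of one zone. The replica count and zone names are hypothetical. The `topology_spread` fragment shows one common Kubernetes mechanism for achieving this distribution, and the source does not state that BonData uses it.

```python
def spread_replicas(replicas: int, zones: list[str]) -> dict[str, int]:
    """Distribute replicas across zones as evenly as possible (skew <= 1)."""
    base, extra = divmod(replicas, len(zones))
    return {z: base + (1 if i < extra else 0) for i, z in enumerate(zones)}


def surviving_replicas(placement: dict[str, int], failed_zone: str) -> int:
    """Count the replicas left running when one zone is lost."""
    return sum(n for z, n in placement.items() if z != failed_zone)


# Assumption: a pod-spec fragment that asks the scheduler for an even spread.
topology_spread = [{
    "maxSkew": 1,
    "topologyKey": "topology.kubernetes.io/zone",
    "whenUnsatisfiable": "DoNotSchedule",
    "labelSelector": {"matchLabels": {"app": "api"}},  # hypothetical label
}]

zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # hypothetical zones
placement = spread_replicas(7, zones)
print(placement)  # {'us-east-1a': 3, 'us-east-1b': 2, 'us-east-1c': 2}

# Worst case: the zone holding the most replicas fails.
print(min(surviving_replicas(placement, z) for z in zones))  # 4
```

Sizing the replica count so that the worst-case survivor count still carries peak load is what lets the ALB health checks remove the failed targets without degrading the API tier.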
