Scaling Java Microservices on AWS EKS with Terraform and Horizontal Pod Autoscaling
A production guide to scaling Spring Boot microservices on Amazon EKS using Terraform for infrastructure and Horizontal Pod Autoscaling for elastic capacity — with metrics, cost tips and CI/CD integration.
Introduction
Spring Boot microservices need to scale elastically in production: CPU spikes, traffic bursts and background jobs all create uneven load. Amazon EKS, Terraform and the Horizontal Pod Autoscaler (HPA) together are a common, well-supported way to do this on AWS.
This tutorial takes a Spring Boot service from container image to auto-scaled production deployment.
Kubernetes scaling fundamentals
Kubernetes has three levels of scaling:
1. HPA: adds/removes Pods based on metrics (CPU, memory, custom).
2. Cluster Autoscaler / Karpenter: adds/removes Nodes when Pods can't be scheduled.
3. VPA: recommends or applies new CPU/memory *requests* per Pod.
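Use HPA for traffic, Cluster Autoscaler or Karpenter for node capacity, and VPA for right-sizing. Don't let VPA apply requests automatically to a workload that HPA already scales on CPU or memory, because the two controllers will fight over the same signal.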
Step 1 — Containerize the Spring Boot app
```dockerfile
# Multi-stage, small image
FROM eclipse-temurin:21-jdk AS build
WORKDIR /src
COPY . .
RUN ./mvnw -q -DskipTests package

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /src/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java","-XX:+UseContainerSupport","-XX:MaxRAMPercentage=75","-jar","/app/app.jar"]
```
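MaxRAMPercentage is the flag that matters here. UseContainerSupport is already on by default in JDK 21 (it is listed only for clarity), so the JVM does see the cgroup limits. Without MaxRAMPercentage, however, the default heap ceiling is just 25% of the container's memory limit. A value of 75 gives most of the limit to the heap while leaving headroom for metaspace, thread stacks and other native memory. Set it much higher and the container risks being OOM-killed.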
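As a quick sanity check on these flags, the sketch below computes the heap ceiling that MaxRAMPercentage=75 implies for the 1Gi memory limit used in Step 3. It also prints what the running JVM reports. HeapCheck is a hypothetical class for illustration, not part of Spring Boot.

```java
// Hypothetical helper: estimates the heap ceiling implied by -XX:MaxRAMPercentage
// and prints what the running JVM actually reports.
final class HeapCheck {

    /** Expected max heap for a container memory limit and a MaxRAMPercentage value. */
    static long expectedMaxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long oneGi = 1024L * 1024 * 1024; // limits.memory: "1Gi" from Step 3
        long expected = expectedMaxHeapBytes(oneGi, 75.0);
        System.out.printf("expected heap ceiling: %d MiB%n", expected / (1024 * 1024));

        // Inside a container (UseContainerSupport is on by default in JDK 21)
        // these values reflect the cgroup limits rather than the host machine.
        Runtime rt = Runtime.getRuntime();
        System.out.printf("Runtime.maxMemory(): %d MiB%n", rt.maxMemory() / (1024 * 1024));
        System.out.printf("availableProcessors(): %d%n", rt.availableProcessors());
    }
}
```

With a 1Gi limit the expected ceiling is 768 MiB, which leaves about 256 MiB for non-heap memory. Depending on the garbage collector, Runtime.maxMemory() may report slightly less than the computed figure.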
Step 2 — Terraform: VPC + EKS
```hcl
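module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "ml-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true # one shared NAT: cheaper, but not AZ-redundant
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # v21 renamed cluster_name / cluster_version

  cluster_name    = "ml-prod"
  cluster_version = "1.30" # confirm this version is still in EKS standard support
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets
  # Grants the identity running terraform apply admin access for kubectl.
  enable_cluster_creator_admin_permissions = true

  eks_managed_node_groups = {
    default = {
      desired_size   = 2
      min_size       = 2
      max_size       = 10
      instance_types = ["m6i.large"]
    }
  }
}
```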
terraform apply provisions VPC, EKS control plane, node group and IAM in ~15 minutes.
Step 3 — Deploy the service
```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: order-svc }
spec:
  replicas: 2 # initial count; drop this field once the HPA owns scaling, or every re-apply resets it
  selector: { matchLabels: { app: order-svc } }
  template:
    metadata: { labels: { app: order-svc } }
    spec:
      containers:
        - name: app
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-svc:1.4.0
          ports: [{ containerPort: 8080 }]
          resources:
            requests: { cpu: "250m", memory: "512Mi" }
            limits: { cpu: "1000m", memory: "1Gi" }
          readinessProbe:
            httpGet: { path: /actuator/health/readiness, port: 8080 }
          livenessProbe:
            httpGet: { path: /actuator/health/liveness, port: 8080 }
---
apiVersion: v1
kind: Service
metadata: { name: order-svc }
spec:
  selector: { app: order-svc }
  ports: [{ port: 80, targetPort: 8080 }]
```
resources.requests is what HPA uses as the 100% baseline — set it realistically.
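To make the baseline concrete, here is a small sketch of the utilization arithmetic. It assumes the HPA divides total CPU usage by total CPU requests across the target's Pods. When all Pods have equal requests, this is the same as averaging per-Pod utilization. The class and record names are illustrative.

```java
import java.util.List;

// Sketch of the arithmetic behind averageUtilization: total usage across the
// target's Pods divided by their total requests, as a percentage.
// Limits are not involved at all.
final class CpuUtilization {

    /** One Pod's CPU usage and request in millicores (hypothetical type for illustration). */
    record PodCpu(long usageMillis, long requestMillis) {}

    static long utilizationPercent(List<PodCpu> pods) {
        long usage = 0;
        long requests = 0;
        for (PodCpu pod : pods) {
            usage += pod.usageMillis();
            requests += pod.requestMillis();
        }
        if (requests == 0) {
            throw new IllegalArgumentException("Pods must declare CPU requests for Utilization targets");
        }
        return usage * 100 / requests; // integer percentage
    }

    public static void main(String[] args) {
        // Two order-svc replicas with requests.cpu = 250m, as in Step 3.
        List<PodCpu> pods = List.of(new PodCpu(200, 250), new PodCpu(150, 250));
        System.out.println(utilizationPercent(pods) + "%"); // (200 + 150) * 100 / 500 = 70%
    }
}
```

A Pod using 200m against a 250m request counts as 80% utilized, even though it is far below its 1000m limit.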
Step 4 — Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: order-svc }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-svc
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 80 } }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies: [{ type: Percent, value: 50, periodSeconds: 60 }]
    scaleUp:
      stabilizationWindowSeconds: 0
      policies: [{ type: Percent, value: 100, periodSeconds: 30 }]
```
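HPA needs the metrics-server add-on (helm install metrics-server …) before it can read CPU/memory. Be careful with the memory metric on JVM services: the heap rarely shrinks once it has grown, so memory utilization can stay above target and keep the HPA from scaling back down.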
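The scaling decision follows the rule documented for the HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The HPA skips the change when the ratio is within a tolerance (0.1 by default), computes a proposal per metric, and acts on the largest one. The sketch below implements only that rule. HpaMath is a hypothetical name, and it deliberately leaves out behavior policies, stabilization windows and unready Pods.

```java
// Sketch of the documented HPA rule:
//   desired = ceil(current * currentMetric / targetMetric)
// skipped when the ratio is within the tolerance, evaluated per metric,
// the largest proposal wins, and the result is clamped to [min, max].
// Behavior policies, stabilization windows and unready Pods are not modeled.
final class HpaMath {

    static final double TOLERANCE = 0.1; // kube-controller-manager default

    static int proposal(int currentReplicas, double currentValue, double targetValue) {
        double ratio = currentValue / targetValue;
        if (Math.abs(1.0 - ratio) <= TOLERANCE) {
            return currentReplicas; // close enough to target: no change
        }
        return (int) Math.ceil(currentReplicas * ratio);
    }

    /** metrics[i] = {currentValue, targetValue}, e.g. {140, 70} for CPU utilization. */
    static int desiredReplicas(int currentReplicas, int minReplicas, int maxReplicas, double[][] metrics) {
        int desired = 0;
        for (double[] metric : metrics) {
            desired = Math.max(desired, proposal(currentReplicas, metric[0], metric[1]));
        }
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 4 replicas, CPU at 140% (target 70), memory at 60% (target 80): CPU wins.
        System.out.println(desiredReplicas(4, 2, 20, new double[][] {{140, 70}, {60, 80}})); // 8
    }
}
```

Take the targets above with four replicas running at 140% CPU and 60% memory. The memory metric alone would propose 3, but the HPA scales to 8 because it acts on the larger proposal.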
Step 5 — Custom metrics with Prometheus
CPU scaling is fine for compute-bound services. For I/O-bound ones, scale on requests per second or queue depth:
```yaml
metrics:
  - type: Pods
    pods:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "100" }
```
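Add micrometer-registry-prometheus to the Spring Boot app so it can expose /actuator/prometheus (include prometheus in management.endpoints.web.exposure.include). Then install the Prometheus Adapter and give it a rule that turns Micrometer's http_server_requests_seconds_count counter into a per-Pod rate named http_requests_per_second. HPA will then scale on real traffic.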
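The adapter rule is where a raw counter becomes a rate. The sketch below shows the arithmetic in plain Java under simplifying assumptions: two counter samples per Pod, a drop treated as a counter reset, and none of the extrapolation Prometheus rate() performs. It then computes how many replicas keep the per-Pod average at or below the 100 req/s AverageValue target, ignoring the HPA tolerance. RpsScaling and the sample values are hypothetical.

```java
// Sketch: turn two samples of a monotonically increasing request counter into a
// per-second rate (simplified: no extrapolation, a drop is treated as a reset),
// then size the Deployment for an AverageValue target per Pod.
final class RpsScaling {

    static double ratePerSecond(double earlierCount, double laterCount, double seconds) {
        double delta = laterCount >= earlierCount ? laterCount - earlierCount : laterCount;
        return delta / seconds;
    }

    /** Replicas needed so the average per-Pod rate is at or below the target (HPA tolerance ignored). */
    static int replicasFor(double[] perPodRates, double targetPerPod) {
        double total = 0;
        for (double rate : perPodRates) {
            total += rate;
        }
        return (int) Math.ceil(total / targetPerPod);
    }

    public static void main(String[] args) {
        // Hypothetical counter values for two Pods, scraped 60 s apart.
        double podA = ratePerSecond(12_000, 21_000, 60); // 150 req/s
        double podB = ratePerSecond(8_000, 15_800, 60);  // 130 req/s
        System.out.println(replicasFor(new double[] {podA, podB}, 100)); // 280 / 100 -> 3
    }
}
```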
Step 6 — Cluster Autoscaler / Karpenter
HPA adds Pods, but if there's no Node capacity they sit Pending. Karpenter watches for unschedulable Pods and launches the cheapest matching EC2 instance within seconds of spotting them (the new node still needs time to boot and join the cluster):
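```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      # Required in karpenter.sh/v1. Assumes an EC2NodeClass named "default"
      # (AMI, subnets, security groups) is created alongside this NodePool.
      nodeClassRef: { group: karpenter.k8s.aws, kind: EC2NodeClass, name: default }
      requirements:
        - { key: karpenter.k8s.aws/instance-category, operator: In, values: ["m","c"] }
        - { key: karpenter.k8s.aws/instance-cpu, operator: In, values: ["2","4","8"] }
        - { key: kubernetes.io/arch, operator: In, values: ["amd64"] }
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```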
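Before relying on node autoscaling, it helps to estimate roughly how many nodes the HPA's maxReplicas could demand. The sketch below is a back-of-the-envelope calculation, not Karpenter's algorithm, which bin-packs across many instance types and accounts for DaemonSets and per-node Pod limits. The allocatable figures are placeholders; read the real ones from the Allocatable section of kubectl describe node.

```java
// Back-of-the-envelope node count for N Pending Pods: how many Pods fit per node
// (limited by whichever of CPU or memory runs out first), then ceiling division.
// Ignores DaemonSets, per-node Pod limits and mixed instance types.
final class NodeEstimate {

    static int nodesNeeded(int pods, long cpuRequestMillis, long memRequestMiB,
                           long allocatableCpuMillis, long allocatableMemMiB) {
        long perNode = Math.min(allocatableCpuMillis / cpuRequestMillis,
                                allocatableMemMiB / memRequestMiB);
        if (perNode == 0) {
            throw new IllegalArgumentException("one Pod does not fit on this node size");
        }
        return (int) ((pods + perNode - 1) / perNode); // ceiling division
    }

    public static void main(String[] args) {
        // 20 replicas (HPA maxReplicas) at 250m / 512Mi each, on nodes with an
        // ASSUMED 1900m CPU and 6500 MiB memory allocatable.
        System.out.println(nodesNeeded(20, 250, 512, 1900, 6500)); // 7 per node -> 3 nodes
    }
}
```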
Cost optimization tips
- Use Spot node pools for stateless workloads: they are often 60–90% cheaper than On-Demand, in exchange for a two-minute interruption notice.
- Right-size requests with VPA recommendations after a week of production data.
- Set scaleDown.stabilizationWindowSeconds: 300 so HPA doesn't thrash and trigger Karpenter to churn nodes.
- Enable Pod Priority so that, when the cluster runs short of capacity, lower-priority background jobs are preempted to make room for customer-facing Pods.
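The stabilizationWindowSeconds tip above relies on how the scale-down window works. The HPA records each recommendation and, when scaling down, uses the highest one from the window. Below is a minimal sketch of that behavior. It assumes scaleUp.stabilizationWindowSeconds: 0 as in Step 4 and ignores the rate-limiting policies; ScaleDownStabilizer is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of scaleDown.stabilizationWindowSeconds: every raw recommendation is
// remembered, and a scale-down only goes as low as the HIGHEST recommendation
// seen inside the window. Assumes a zero scale-up window (scale-ups apply
// immediately); behavior policies are not modeled.
final class ScaleDownStabilizer {

    private record Recommendation(long atSeconds, int replicas) {}

    private final long windowSeconds;
    private final Deque<Recommendation> history = new ArrayDeque<>();

    ScaleDownStabilizer(long windowSeconds) {
        this.windowSeconds = windowSeconds;
    }

    /** Records a raw recommendation and returns the replica count to act on. */
    int stabilize(long nowSeconds, int recommended, int currentReplicas) {
        history.addLast(new Recommendation(nowSeconds, recommended));
        // Forget recommendations that have aged out of the window.
        while (!history.isEmpty() && history.peekFirst().atSeconds() <= nowSeconds - windowSeconds) {
            history.removeFirst();
        }
        if (recommended >= currentReplicas) {
            return recommended; // scale up (or stay) right away
        }
        int highest = recommended;
        for (Recommendation r : history) {
            highest = Math.max(highest, r.replicas());
        }
        return Math.min(highest, currentReplicas); // a scale-down never adds Pods
    }

    public static void main(String[] args) {
        ScaleDownStabilizer stabilizer = new ScaleDownStabilizer(300);
        System.out.println(stabilizer.stabilize(0, 8, 8));   // 8
        System.out.println(stabilizer.stabilize(60, 3, 8));  // 8: the t=0 recommendation is still in the window
        System.out.println(stabilizer.stabilize(301, 3, 8)); // 3: it has aged out
    }
}
```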
CI/CD integration
A GitHub Actions pipeline that builds, pushes to ECR and rolls out via kubectl:
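```yaml
# Assumes earlier steps configured AWS credentials and region
# (e.g. aws-actions/configure-aws-credentials) and set ECR to the registry host,
# e.g. 123456789.dkr.ecr.us-east-1.amazonaws.com
- run: ./mvnw -B package
- run: docker build -t $ECR/order-svc:$GITHUB_SHA .
- run: aws ecr get-login-password | docker login --username AWS --password-stdin $ECR
- run: docker push $ECR/order-svc:$GITHUB_SHA
- run: aws eks update-kubeconfig --name ml-prod
- run: kubectl set image deploy/order-svc app=$ECR/order-svc:$GITHUB_SHA
- run: kubectl rollout status deploy/order-svc --timeout=180s
```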
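For zero-downtime deploys, rely on the readiness probe plus the Deployment's rolling-update defaults (maxSurge and maxUnavailable of 25%). Add a PodDisruptionBudget so node drains and Karpenter consolidation never evict too many replicas at once.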
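The rolling-update defaults translate into concrete Pod counts. Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down. The sketch below applies that rule to the 25% defaults; RollingUpdateMath is an illustrative name.

```java
// Rolling-update arithmetic for percentage values: Kubernetes rounds maxSurge up
// and maxUnavailable down, which bounds the Pod count during a rollout.
final class RollingUpdateMath {

    static int maxSurge(int replicas, int percent) {
        return (int) Math.ceil(replicas * percent / 100.0);
    }

    static int maxUnavailable(int replicas, int percent) {
        return (int) Math.floor(replicas * percent / 100.0);
    }

    public static void main(String[] args) {
        for (int replicas : new int[] {2, 10}) {
            int surge = maxSurge(replicas, 25);             // Deployment default: 25%
            int unavailable = maxUnavailable(replicas, 25); // Deployment default: 25%
            System.out.printf("replicas=%d: at most %d Pods, at least %d available%n",
                    replicas, replicas + surge, replicas - unavailable);
        }
    }
}
```

For the two-replica order-svc Deployment, a rollout runs at most 3 Pods and never drops below 2 available. Each new Pod must therefore pass its readiness probe before an old one is removed.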
Observability
- Metrics — Prometheus + Grafana (HPA dashboard from grafana.com/dashboards/6781).
- Logs — Fluent Bit → CloudWatch Logs or Loki.
- Tracing — OpenTelemetry agent on each Pod → X-Ray or Tempo.
TL;DR
Key takeaways
- Layer your scaling: HPA for Pods, Karpenter or Cluster Autoscaler for Nodes, VPA recommendations for right-sizing requests.
- Set resources.requests realistically, because HPA measures CPU and memory utilization against them.
- Cap the JVM heap with MaxRAMPercentage so it fits inside the container memory limit.
- Scale I/O-bound services on custom metrics such as requests per second rather than on CPU alone.
Avoid these
Common mistakes
1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but Docker & Kubernetes patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging Docker & Kubernetes systems without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC: these are among the most common causes of production incidents. Treat security as part of the definition of done.
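The first mistake above advises reasoning about timeouts and retries. One common retry pattern is capped exponential backoff with full jitter. The sketch below is a plain-Java illustration with assumed values (100 ms base, 2 s cap, five attempts). In a Spring Boot service you would more likely use a library such as Spring Retry or Resilience4j, whose APIs are not shown here.

```java
import java.util.Random;

// Sketch of capped exponential backoff with "full jitter": the ceiling doubles
// per attempt up to a cap, and the actual wait is random in [0, ceiling], so
// many Pods retrying at once do not hit a struggling dependency in lockstep.
final class Backoff {

    /** Upper bound for the wait before retry number {@code attempt} (0-based). */
    static long ceilingMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis << Math.min(attempt, 30); // bounded shift avoids overflow for small bases
        return Math.min(capMillis, exponential);
    }

    /** Random wait in [0, ceiling]; Random.nextLong(bound) requires Java 17+. */
    static long delayMillis(int attempt, long baseMillis, long capMillis, Random random) {
        return random.nextLong(ceilingMillis(attempt, baseMillis, capMillis) + 1);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int maxAttempts = 5; // assumed retry budget; pair it with a per-call timeout
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // A real client would sleep for this long before retrying the call.
            System.out.printf("attempt %d: ceiling %d ms, wait %d ms%n",
                    attempt, ceilingMillis(attempt, 100, 2_000), delayMillis(attempt, 100, 2_000, random));
        }
    }
}
```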
Ship it safely
Production best practices
Apply these practices before promoting the services from this tutorial to a real production environment.
Scalability
Design Docker & Kubernetes services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.
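p95/p99 figures are easy to misread, so here is one way to compute them from recorded latencies: the nearest-rank method, which sorts the samples and takes the value at rank ceil(p/100 × n). Load-testing tools and metrics libraries may use interpolating or histogram-based definitions instead, so small differences are expected. LatencyPercentiles is a hypothetical name.

```java
import java.util.Arrays;

// Nearest-rank percentile over recorded latencies: sort a copy of the samples,
// then take the value at 1-based rank ceil(p * n / 100).
final class LatencyPercentiles {

    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1; // 1..100 ms
        }
        System.out.println("p95=" + percentile(samples, 95) + " ms, p99=" + percentile(samples, 99) + " ms");
    }
}
```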
Monitoring & Observability
Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.
Logging
Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.
Security
Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.
Testing
Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For Docker & Kubernetes systems, also run chaos and load tests before a major release.
Reliability & Rollouts
Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.
Questions
Frequently asked questions
Is this tutorial up to date?
Yes. This tutorial was last reviewed and updated on May 21, 2026. We revisit popular Docker & Kubernetes tutorials regularly to keep them aligned with current best practices.
What level is this tutorial aimed at?
It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.
Do I need to follow every step in order?
The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.
Where can I find the source code?
Code samples are inlined in the tutorial. When a companion repository is published it will be linked at the top of this page.
Go deeper
Further reading
More From the Channel
Follow the full tutorial series on YouTube
The MasterLabSystems channel publishes in-depth, project-based tutorials on Java, Spring Boot, microservices, Docker, Kubernetes, AWS and DevOps — the same topics covered on this site, with full code walkthroughs.
Related tutorials
Dockerizing a Spring Boot Application: The Right Way
Build small, fast and secure Docker images for Spring Boot using multi-stage builds, layered jars and JVM container tuning.
Kubernetes Basics for Java Developers
Everything a backend developer needs to know about Kubernetes — Pods, Deployments, Services, Ingress and ConfigMaps — explained with a Spring Boot example.
Kafka & ZooKeeper Docker Setup — Quick Deploy Guide
Spin up a local Kafka cluster with ZooKeeper in 60 seconds using docker-compose, ready for Spring Boot integration.
