Docker & Kubernetes · 17 min read · By Liyabona Saki · Last updated May 21, 2026

Scaling Java Microservices on AWS EKS with Terraform and Horizontal Pod Autoscaling

A production guide to scaling Spring Boot microservices on Amazon EKS using Terraform for infrastructure and Horizontal Pod Autoscaling for elastic capacity — with metrics, cost tips and CI/CD integration.

Introduction

Spring Boot microservices need to scale elastically in production: CPU spikes, traffic bursts and background jobs all create uneven load. Amazon EKS, provisioned with Terraform and paired with the Horizontal Pod Autoscaler (HPA), is a common way to do this on AWS.

This tutorial takes a Spring Boot service from container image to auto-scaled production deployment.

Kubernetes scaling fundamentals

Kubernetes has three levels of scaling:

1. HPA: adds/removes Pods based on metrics (CPU, memory, custom).
2. Cluster Autoscaler / Karpenter: adds/removes Nodes when Pods can't be scheduled.
3. VPA: recommends or applies new CPU/memory *requests* per Pod.

Use HPA for traffic, Cluster Autoscaler for capacity, VPA for right-sizing.

Step 1 — Containerize the Spring Boot app

```dockerfile
# Multi-stage, small image
FROM eclipse-temurin:21-jdk AS build
WORKDIR /src
COPY . .
RUN ./mvnw -q -DskipTests package

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /src/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java","-XX:+UseContainerSupport","-XX:MaxRAMPercentage=75","-jar","/app/app.jar"]
```

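MaxRAMPercentage is the setting that matters here. UseContainerSupport has been on by default since JDK 10, so the JVM already sees the cgroup memory limit, but the default maximum heap is only 25% of that limit. Raising it to 75% uses most of the container's memory while leaving headroom for metaspace, thread stacks and direct buffers; a heap sized too close to the limit gets the container OOM-killed.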

Step 2 — Terraform: VPC + EKS

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed pin; the article does not state a module version

  name               = "ml-vpc"
  cidr               = "10.0.0.0/16"
  azs                = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets    = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets     = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # assumed pin; cluster_name/cluster_version are the 20.x input names

  cluster_name    = "ml-prod"
  cluster_version = "1.30"
  subnet_ids      = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id

  eks_managed_node_groups = {
    default = {
      desired_size   = 2
      min_size       = 2
      max_size       = 10
      instance_types = ["m6i.large"]
    }
  }
}
```

terraform apply provisions VPC, EKS control plane, node group and IAM in ~15 minutes.

Step 3 — Deploy the service

```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: order-svc }
spec:
  replicas: 2
  selector: { matchLabels: { app: order-svc } }
  template:
    metadata: { labels: { app: order-svc } }
    spec:
      containers:
      - name: app
        # placeholder registry; AWS account IDs are 12 digits
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/order-svc:1.4.0
        ports: [{ containerPort: 8080 }]
        resources:
          requests: { cpu: "250m", memory: "512Mi" }
          limits:   { cpu: "1000m", memory: "1Gi" }
        readinessProbe:
          httpGet: { path: /actuator/health/readiness, port: 8080 }
        livenessProbe:
          httpGet: { path: /actuator/health/liveness, port: 8080 }
---
apiVersion: v1
kind: Service
metadata: { name: order-svc }
spec:
  selector: { app: order-svc }
  ports: [{ port: 80, targetPort: 8080 }]
```

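resources.requests is what HPA uses as the 100% baseline, so set it realistically. With the 250m CPU request above, a 70% utilization target (as in Step 4) means HPA aims for roughly 175m of average CPU use per Pod.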
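The probe paths in the Deployment depend on Spring Boot Actuator's liveness and readiness health groups. The article does not show the application's configuration, so the following is only a minimal sketch, assuming spring-boot-starter-actuator is on the classpath and the file lives at the usual src/main/resources/application.yml location:

```yaml
# application.yml (assumed contents, not taken from the original project)
management:
  endpoint:
    health:
      probes:
        enabled: true          # serves /actuator/health/liveness and /actuator/health/readiness
  endpoints:
    web:
      exposure:
        include: health        # expose only what the probes need
server:
  shutdown: graceful           # let in-flight requests finish when a Pod is terminated
spring:
  lifecycle:
    timeout-per-shutdown-phase: 20s   # illustrative value; tune to your slowest request
```

Spring Boot enables the probe groups automatically when it detects a Kubernetes environment; setting the flag explicitly keeps the endpoints available in local runs too. Graceful shutdown matters once autoscaling starts removing Pods under live traffic.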

Step 4 — Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: order-svc }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-svc
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu,    target: { type: Utilization, averageUtilization: 70 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 80 } }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies: [{ type: Percent, value: 50, periodSeconds: 60 }]
    scaleUp:
      stabilizationWindowSeconds: 0
      policies: [{ type: Percent, value: 100, periodSeconds: 30 }]
```

HPA needs the metrics-server add-on (helm install metrics-server …) before it can read CPU/memory.
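To make the utilization targets concrete, this is the replica calculation described in the Kubernetes HPA documentation, followed by a worked case. The input numbers (4 running Pods averaging 105% of their CPU request, against the 70% target) are illustrative assumptions, not measurements from this service:

$$
\text{desiredReplicas}
= \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
= \left\lceil 4 \times \frac{105}{70} \right\rceil
= 6
$$

When several metrics are configured, as with CPU and memory here, HPA evaluates each one and uses the largest resulting replica count, then applies the behavior policies and the minReplicas/maxReplicas bounds.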

Step 5 — Custom metrics with Prometheus

CPU scaling is fine for compute-bound services. For I/O-bound ones, scale on requests per second or queue depth:

```yaml
metrics:
  - type: Pods
    pods:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "100" }
```

Wire micrometer-registry-prometheus in Spring Boot, install the Prometheus Adapter with a rule that exposes http_requests_per_second through the custom metrics API, and HPA will scale on real traffic.
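The adapter only serves metrics that a rule maps from Prometheus series. As a hedged sketch, the values fragment below assumes the prometheus-community/prometheus-adapter Helm chart, the http_server_requests_seconds_count counter that Micrometer publishes for Spring MVC, and a Prometheus scrape configuration that attaches namespace and pod labels to each series; none of these details come from the original article:

```yaml
# values.yaml fragment for the Prometheus Adapter chart (assumed setup)
rules:
  custom:
    - seriesQuery: 'http_server_requests_seconds_count{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: { resource: namespace }   # map the label to the Namespace object
          pod: { resource: pod }               # map the label to the Pod object
      name:
        matches: "^http_server_requests_seconds_count$"
        as: "http_requests_per_second"         # the name the HPA metric refers to
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The 2m rate window is an illustrative choice: a shorter window reacts faster but makes the HPA signal noisier.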

Step 6 — Cluster Autoscaler / Karpenter

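HPA adds Pods, but if there's no Node capacity they sit Pending. Karpenter watches for unschedulable Pods and launches the cheapest matching EC2 instance directly, without going through a node group, so new capacity usually arrives faster than with the Cluster Autoscaler: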

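```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      # karpenter.sh/v1 requires a nodeClassRef; an EC2NodeClass named "default" is assumed to exist
      nodeClassRef: { group: karpenter.k8s.aws, kind: EC2NodeClass, name: default }
      requirements:
        - { key: karpenter.k8s.aws/instance-category, operator: In, values: ["m","c"] }
        - { key: karpenter.k8s.aws/instance-cpu,      operator: In, values: ["2","4","8"] }
        - { key: kubernetes.io/arch,                  operator: In, values: ["amd64"] }
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m   # illustrative wait before consolidating; tune for your workload
```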
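A NodePool in karpenter.sh/v1 points at a node class that tells Karpenter which AMI, subnets, security groups and IAM role to use. The sketch below shows roughly what that companion object could look like. The role name and the karpenter.sh/discovery tag values are hypothetical; the Terraform in Step 2 does not create them, so they would have to be added separately:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata: { name: default }
spec:
  role: KarpenterNodeRole-ml-prod        # hypothetical IAM role for the nodes
  amiSelectorTerms:
    - alias: al2023@latest               # Amazon Linux 2023; pin a version for production
  subnetSelectorTerms:
    - tags: { karpenter.sh/discovery: ml-prod }   # assumes subnets carry this tag
  securityGroupSelectorTerms:
    - tags: { karpenter.sh/discovery: ml-prod }   # assumes security groups carry this tag
```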

Cost optimization tips

Use Spot node pools for stateless workloads; Spot capacity can be up to 90% cheaper than On-Demand, in exchange for possible interruptions.
Right-size requests with VPA recommendations after a week of production data.
Set scaleDown.stabilizationWindowSeconds: 300 so HPA doesn't thrash and trigger Karpenter to churn nodes.
Use Pod priority classes so that background jobs are preempted first when customer-facing Pods cannot be scheduled.
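Two of the tips above map directly to small manifests. This is a sketch only: it assumes the Vertical Pod Autoscaler CRDs are installed in the cluster (they are not part of core Kubernetes), and the PriorityClass name and value are hypothetical:

```yaml
# VPA in recommendation-only mode: it computes request suggestions but never evicts Pods,
# so it does not compete with the HPA that scales the same Deployment.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata: { name: order-svc }
spec:
  targetRef: { apiVersion: apps/v1, kind: Deployment, name: order-svc }
  updatePolicy:
    updateMode: "Off"
---
# Low priority for background jobs; Pods opt in with priorityClassName: background-low
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata: { name: background-low }   # hypothetical name
value: 1000                          # illustrative; lower than the class used by customer-facing Pods
globalDefault: false
description: "Background jobs that may be preempted when capacity is tight"
```

After the VPA has observed the workload, kubectl describe vpa order-svc shows the recommended requests, which can then be copied into the Deployment by hand.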

CI/CD integration

A GitHub Actions pipeline that builds, pushes to ECR and rolls out via kubectl:

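```yaml
# Steps fragment; assumes AWS credentials and the ECR variable are set earlier in the job
- run: ./mvnw -B package
- run: docker build -t $ECR/order-svc:$GITHUB_SHA .
- run: aws ecr get-login-password | docker login --username AWS --password-stdin $ECR
- run: docker push $ECR/order-svc:$GITHUB_SHA
# kubectl needs a kubeconfig for the cluster created in Step 2
- run: aws eks update-kubeconfig --name ml-prod --region us-east-1
- run: kubectl set image deploy/order-svc app=$ECR/order-svc:$GITHUB_SHA
- run: kubectl rollout status deploy/order-svc --timeout=180s
```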

For zero-downtime, pair this with PodDisruptionBudgets and the rolling-update defaults.
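As a minimal sketch of the PodDisruptionBudget mentioned above, assuming the app: order-svc label from Step 3 and the HPA minimum of two replicas:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: order-svc }
spec:
  minAvailable: 1                                  # never voluntarily evict the last ready Pod
  selector: { matchLabels: { app: order-svc } }
```

The budget covers voluntary disruptions such as node drains and Karpenter consolidation. Rollouts are governed separately by the Deployment strategy, whose defaults are maxSurge 25% and maxUnavailable 25%; with two replicas those round to one extra Pod and zero unavailable Pods.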

Observability

Metrics — Prometheus + Grafana (HPA dashboard from grafana.com/dashboards/6781).
Logs — Fluent Bit → CloudWatch Logs or Loki.
Tracing — OpenTelemetry agent on each Pod → X-Ray or Tempo.

Key takeaways

HPA scales Pods on metrics, Cluster Autoscaler or Karpenter scales Nodes, and VPA right-sizes requests; each works at a different level.
Set resources.requests realistically, because HPA utilization targets are percentages of the request.
Scale I/O-bound services on custom metrics such as requests per second or queue depth rather than CPU.
Use a scale-down stabilization window so HPA does not thrash and cause Karpenter to churn nodes.

Common mistakes

1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but Docker & Kubernetes patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging Docker & Kubernetes systems without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.

Production best practices

Apply these before promoting the setup from this tutorial to a real production environment.

Scalability

Design Docker & Kubernetes services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For Docker & Kubernetes systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on May 21, 2026. We revisit popular Docker & Kubernetes tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

Code samples are inlined in the tutorial. When a companion repository is published it will be linked at the top of this page.

Related tutorials

Horizontal Pod Autoscaling on EKS

Dockerizing a Spring Boot Application: The Right Way

Kubernetes Basics for Java Developers

Kafka & ZooKeeper Docker Setup — Quick Deploy Guide