Docker & Kubernetes · 17 min read · By Liyabona Saki

Scaling Java Microservices on AWS EKS with Terraform and Horizontal Pod Autoscaling

A production guide to scaling Spring Boot microservices on Amazon EKS using Terraform for infrastructure and Horizontal Pod Autoscaling for elastic capacity — with metrics, cost tips and CI/CD integration.

Introduction

Spring Boot microservices need to scale elastically in production: CPU spikes, traffic bursts and background jobs all create uneven load. Amazon EKS provisioned with Terraform, combined with the Horizontal Pod Autoscaler (HPA), is a widely used way to do this on AWS.

This tutorial takes a Spring Boot service from container image to auto-scaled production deployment.

Kubernetes scaling fundamentals

Kubernetes has three levels of scaling:

1. HPA: adds/removes Pods based on metrics (CPU, memory, custom).
2. Cluster Autoscaler / Karpenter: adds/removes Nodes when Pods can't be scheduled.
3. VPA: recommends or applies new CPU/memory *requests* per Pod.

Use HPA for traffic, Cluster Autoscaler for capacity, VPA for right-sizing.

Step 1 — Containerize the Spring Boot app

```dockerfile
# Multi-stage, small image
FROM eclipse-temurin:21-jdk AS build
WORKDIR /src
COPY . .
RUN ./mvnw -q -DskipTests package

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /src/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java","-XX:+UseContainerSupport","-XX:MaxRAMPercentage=75","-jar","/app/app.jar"]
```

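MaxRAMPercentage is the setting that matters here. UseContainerSupport has been on by default since JDK 10, so a Java 21 JVM already sizes itself from the container's cgroup limit, and the explicit flag is harmless. Without MaxRAMPercentage, however, the maximum heap defaults to 25% of the container's memory, which wastes most of the limit. Setting it to 75 gives the heap more room while leaving headroom for metaspace, thread stacks and direct buffers; pushing it much higher raises the risk of the container being OOM-killed.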
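To make the heap sizing concrete, here is a minimal arithmetic sketch. It assumes the 1Gi memory limit used for the container in Step 3 and treats MaxRAMPercentage as a plain percentage of that limit; the real JVM may round the heap to its own alignment, so read the numbers as approximate.

```python
MIB = 1024 * 1024

def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: int) -> int:
    """Approximate max heap: a percentage of the container memory limit."""
    return int(container_limit_bytes * max_ram_percentage / 100)

limit = 1024 * MIB                       # the 1Gi limit from the Deployment
heap_75 = max_heap_bytes(limit, 75)      # with -XX:MaxRAMPercentage=75
heap_default = max_heap_bytes(limit, 25) # JVM default when the flag is absent

print("heap at 75%:", heap_75 // MIB, "MiB")
print("left for non-heap memory:", (limit - heap_75) // MIB, "MiB")
print("heap at default 25%:", heap_default // MIB, "MiB")
```

The remainder has to cover everything outside the heap, so a service with many threads or heavy direct-buffer use may need a lower percentage or a larger limit.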

Step 2 — Terraform: VPC + EKS

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  name    = "ml-vpc"
  cidr    = "10.0.0.0/16"
  azs             = ["us-east-1a","us-east-1b","us-east-1c"]
  private_subnets = ["10.0.1.0/24","10.0.2.0/24","10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24","10.0.102.0/24","10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = true
}

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = "ml-prod"
  cluster_version = "1.30"
  subnet_ids      = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id

  eks_managed_node_groups = {
    default = {
      desired_size   = 2
      min_size       = 2
      max_size       = 10
      instance_types = ["m6i.large"]
    }
  }
}
```

terraform apply provisions VPC, EKS control plane, node group and IAM in ~15 minutes.
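Subnet sizing matters more on EKS than it first appears: with the default Amazon VPC CNI, every Pod receives an IP address from the node's subnet. The sketch below is a sanity check, using only the CIDRs from the Terraform above, that the subnets sit inside the VPC, do not overlap, and leave enough addresses. It relies on the fact that AWS reserves five addresses in every subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one

vpc = ipaddress.ip_network("10.0.0.0/16")
private = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

def usable_ips(cidr: str) -> int:
    """Addresses left for nodes and Pods after AWS's reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET

nets = [ipaddress.ip_network(c) for c in private + public]
all_inside_vpc = all(n.subnet_of(vpc) for n in nets)
no_overlaps = all(
    not a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
)
private_capacity = sum(usable_ips(c) for c in private)

print("all subnets inside VPC:", all_inside_vpc)
print("no overlapping subnets:", no_overlaps)
print("usable private IPs for nodes + Pods:", private_capacity)
```

With at most 20 replicas of one service this is plenty, but a cluster running many services can exhaust /24 subnets, so it is worth checking before the CIDRs are hard to change.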

Step 3 — Deploy the service

```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: order-svc }
spec:
  replicas: 2
  selector: { matchLabels: { app: order-svc } }
  template:
    metadata: { labels: { app: order-svc } }
    spec:
      containers:
      - name: app
        image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-svc:1.4.0
        ports: [{ containerPort: 8080 }]
        resources:
          requests: { cpu: "250m", memory: "512Mi" }
          limits:   { cpu: "1000m", memory: "1Gi" }
        readinessProbe:
          httpGet: { path: /actuator/health/readiness, port: 8080 }
        livenessProbe:
          httpGet: { path: /actuator/health/liveness, port: 8080 }
---
apiVersion: v1
kind: Service
metadata: { name: order-svc }
spec:
  selector: { app: order-svc }
  ports: [{ port: 80, targetPort: 8080 }]
```

resources.requests is what HPA uses as the 100% baseline — set it realistically.
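The relationship between requests and scaling can be sketched in a few lines. This follows the formula in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), including the default 10% tolerance. The usage figures are made-up inputs for illustration; the 250m request and the 70% target match the manifests in this tutorial.

```python
import math

def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA sees it: usage relative to the *request*."""
    return 100.0 * usage_millicores / request_millicores

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """HPA formula with the documented default tolerance of 10%."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)

# Hypothetical load: 4 Pods averaging 350m CPU against a 250m request.
busy = cpu_utilization_percent(350, 250)
print("utilization:", busy, "%")
print("replicas wanted at a 70% target:", desired_replicas(4, busy, 70))
```

Because utilization is measured against the request rather than the limit, a Pod can report more than 100%. An unrealistically low request therefore makes the HPA scale out far earlier than intended.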

Step 4 — Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: order-svc }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-svc
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource: { name: cpu,    target: { type: Utilization, averageUtilization: 70 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 80 } }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies: [{ type: Percent, value: 50, periodSeconds: 60 }]
    scaleUp:
      stabilizationWindowSeconds: 0
      policies: [{ type: Percent, value: 100, periodSeconds: 30 }]
```

HPA needs the metrics-server add-on (helm install metrics-server …) before it can read CPU/memory.
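The scaleUp policy in the manifest above caps how fast replicas can grow: at most 100% more Pods per 30-second period. The following is a rough model of that cap, assuming load is high enough that the HPA wants maxReplicas immediately. It ignores the controller's sync interval, metric lag and Pod start-up time, so real scale-outs will be slower.

```python
import math

def scale_up_steps(start: int, wanted: int, max_replicas: int,
                   percent: int = 100) -> list[int]:
    """Replica counts after each policy period under a Percent scale-up policy."""
    target = min(wanted, max_replicas)
    steps = [start]
    replicas = start
    while replicas < target:
        # A Percent policy allows growth to (100 + percent)% of the
        # replica count at the start of the period, rounded up.
        allowed = math.ceil(replicas * (100 + percent) / 100)
        replicas = min(allowed, target)
        steps.append(replicas)
    return steps

steps = scale_up_steps(start=2, wanted=20, max_replicas=20)
print("replica counts per step:", steps)
print("scale-up steps needed:", len(steps) - 1)
```

Going from the minimum of 2 to the maximum of 20 takes four doubling steps, each separated by roughly one periodSeconds window. If that is too slow for a traffic burst, raise minReplicas or add a Pods-type policy alongside the Percent one.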

Step 5 — Custom metrics with Prometheus

CPU scaling is fine for compute-bound services. For I/O-bound ones, scale on requests per second or queue depth:

```yaml
metrics:
  - type: Pods
    pods:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "100" }
```

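Add micrometer-registry-prometheus to the Spring Boot service, install the Prometheus Adapter, and configure an adapter rule that exposes a per-Pod request rate under the name http_requests_per_second. HPA can then scale on real traffic rather than on CPU.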
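A per-second metric is normally derived from a monotonically increasing request counter. The sketch below shows the idea under simplifying assumptions: the counter samples are invented, counter resets on Pod restart are ignored, and the HPA tolerance is left out. For an AverageValue target, the HPA formula reduces to the summed per-Pod values divided by the target, rounded up.

```python
import math

def per_pod_rate(count_then: int, count_now: int, interval_seconds: int) -> float:
    """Requests per second from two samples of a request counter."""
    return (count_now - count_then) / interval_seconds

def replicas_for_average_value(per_pod_values: list[float], target_average: float) -> int:
    """Replicas needed so the average per-Pod value meets the target."""
    return math.ceil(sum(per_pod_values) / target_average)

# Hypothetical counter samples taken 60 seconds apart on three Pods.
samples = [(10_000, 19_000), (8_000, 17_600), (12_000, 19_200)]
rates = [per_pod_rate(then, now, 60) for then, now in samples]
wanted = replicas_for_average_value(rates, target_average=100)

print("per-Pod requests/second:", rates)
print("replicas wanted at 100 rps per Pod:", wanted)
```

Here three Pods serve 430 requests per second in total, so the HPA would ask for five replicas to bring the average back under 100.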

Step 6 — Cluster Autoscaler / Karpenter

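HPA adds Pods, but if there's no Node capacity they sit Pending. Cluster Autoscaler handles this by resizing existing node groups. Karpenter instead watches for unschedulable Pods and reacts within seconds by launching the cheapest EC2 instance type that satisfies their requirements: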

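```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      # karpenter.sh/v1 requires a nodeClassRef. The EC2NodeClass named
      # "default" is an assumption here and must be created separately.
      nodeClassRef: { group: karpenter.k8s.aws, kind: EC2NodeClass, name: default }
      requirements:
        - { key: karpenter.k8s.aws/instance-category, operator: In, values: ["m","c"] }
        - { key: karpenter.k8s.aws/instance-cpu,      operator: In, values: ["2","4","8"] }
        - { key: kubernetes.io/arch,                  operator: In, values: ["amd64"] }
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```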
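To illustrate what "cheapest matching instance" means, here is a toy model of the selection, not Karpenter's actual scheduler. The instance shapes are real, but the hourly prices are placeholders invented for the example, and the model ignores kubelet and DaemonSet overhead on each node.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class InstanceType:
    name: str
    category: str
    vcpu: int
    memory_gib: int
    arch: str
    hourly_price: float  # placeholder value, NOT real AWS pricing

CATALOG = [
    InstanceType("m6i.large",  "m", 2, 8,  "amd64", 0.10),
    InstanceType("c6i.large",  "c", 2, 4,  "amd64", 0.09),
    InstanceType("c6i.xlarge", "c", 4, 8,  "amd64", 0.18),
    InstanceType("m6i.xlarge", "m", 4, 16, "amd64", 0.20),
    InstanceType("r6i.large",  "r", 2, 16, "amd64", 0.13),
    InstanceType("m6g.large",  "m", 2, 8,  "arm64", 0.08),
]

def allowed(it: InstanceType) -> bool:
    """Mirror the three NodePool requirements from the manifest."""
    return it.category in {"m", "c"} and it.vcpu in {2, 4, 8} and it.arch == "amd64"

def cheapest_fit(cpu_millicores: int, memory_mib: int) -> Optional[InstanceType]:
    """Cheapest allowed type whose raw capacity covers the pending requests."""
    candidates = [
        it for it in CATALOG
        if allowed(it)
        and it.vcpu * 1000 >= cpu_millicores
        and it.memory_gib * 1024 >= memory_mib
    ]
    return min(candidates, key=lambda it: it.hourly_price, default=None)

# Six pending order-svc Pods at 250m CPU / 512Mi each.
choice = cheapest_fit(cpu_millicores=6 * 250, memory_mib=6 * 512)
print("selected:", choice.name if choice else "no single instance fits")
```

The arm64 and r-family entries are filtered out by the requirements even though one of them is the cheapest in this made-up catalog, which is the trade-off of a narrow NodePool: fewer surprises, but fewer options for the cost optimizer.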

Cost optimization tips

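  • Use Spot node pools for stateless workloads. AWS advertises discounts of up to 90% off On-Demand prices, in exchange for instances that can be reclaimed with a two-minute warning.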
  • Right-size requests with VPA recommendations after a week of production data.
  • Set scaleDown.stabilizationWindowSeconds: 300 so HPA doesn't thrash and trigger Karpenter to churn nodes.
  • Use PriorityClasses so that, when capacity is tight, low-priority background jobs are preempted to make room for customer-facing Pods.
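Right-sizing from production data can be approximated with a simple percentile-plus-headroom heuristic. The sketch below is not VPA's actual algorithm, which is more sophisticated; the usage samples, the 90th percentile and the 15% headroom are all assumptions chosen for illustration.

```python
import math

def percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def suggest_cpu_request(usage_millicores: list[int], pct: int = 90,
                        headroom_pct: int = 15) -> int:
    """Suggested CPU request in millicores: percentile of usage plus headroom."""
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)

# Hypothetical per-Pod CPU usage samples (millicores) from a week of traffic.
usage = [120, 135, 150, 160, 180, 200, 210, 230, 260, 400]
suggested = suggest_cpu_request(usage)
print("p90 usage:", percentile(usage, 90), "m")
print("suggested request:", suggested, "m")
```

The single 400m outlier does not drive the result, which is the point of using a percentile instead of the maximum. Compare the suggestion with the current 250m request before changing it, since a different request also shifts the HPA's utilization baseline.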

CI/CD integration

A GitHub Actions pipeline that builds, pushes to ECR and rolls out via kubectl:

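```yaml
# These steps assume AWS credentials are already configured in the job and
# that $ECR holds the registry host (<account-id>.dkr.ecr.<region>.amazonaws.com).
- run: ./mvnw -B package
- run: docker build -t $ECR/order-svc:$GITHUB_SHA .
- run: aws ecr get-login-password | docker login --username AWS --password-stdin $ECR
- run: docker push $ECR/order-svc:$GITHUB_SHA
# kubectl needs a kubeconfig for the cluster created in Step 2.
- run: aws eks update-kubeconfig --name ml-prod --region us-east-1
- run: kubectl set image deploy/order-svc app=$ECR/order-svc:$GITHUB_SHA
- run: kubectl rollout status deploy/order-svc --timeout=180s
```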

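For zero-downtime rollouts, rely on the rolling-update defaults together with the readiness probe from Step 3, and add a PodDisruptionBudget so that node drains (for example during Karpenter consolidation) never evict too many Pods at once.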
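The rolling-update defaults are maxSurge 25% and maxUnavailable 25%, where Kubernetes rounds the surge up and the unavailable count down. A short sketch of what that means for a given replica count:

```python
import math

def rolling_update_bounds(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> dict:
    """Pod-count bounds during a rolling update with percentage settings."""
    surge = math.ceil(replicas * max_surge_pct / 100)               # rounds up
    unavailable = math.floor(replicas * max_unavailable_pct / 100)  # rounds down
    return {
        "max_total": replicas + surge,
        "min_available": replicas - unavailable,
    }

for n in (2, 20):
    print(n, "replicas ->", rolling_update_bounds(n))
```

At the minimum of 2 replicas the defaults never take a Pod out of service: one new Pod is started and must pass its readiness probe before an old one is removed. At 20 replicas up to 5 Pods may be unavailable at once, so a service that is already near capacity may want a lower maxUnavailable.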

Observability

  • Metrics — Prometheus + Grafana (HPA dashboard from grafana.com/dashboards/6781).
  • Logs — Fluent Bit → CloudWatch Logs or Loki.
  • Tracing — OpenTelemetry agent on each Pod → X-Ray or Tempo.

Architecture

Horizontal Pod Autoscaling on EKS

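[Figure: architecture diagram. Traffic enters through an AWS ALB and reaches the Deployment's Pods (replicas). The Metrics Server supplies CPU and custom metrics to the HPA controller (min=2, max=20), which scales the Deployment. Pending Pods prompt the Cluster Autoscaler to add or remove nodes in the EC2 node group.]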
The HPA reads CPU / custom metrics and scales the Deployment up or down. Cluster Autoscaler adds nodes when pods cannot be scheduled.

TL;DR

Key takeaways

  • Understand the core concepts behind Scaling Java Microservices on AWS EKS with Terraform and Horizontal Pod Autoscaling in a production context.
  • Apply the patterns to real Docker & Kubernetes systems, not just toy examples.
  • Recognize the trade-offs, failure modes, and operational concerns before adopting them.
  • Get a clear path to the next step — related tutorials, tools, and reference architectures.

Avoid these

Common mistakes

  • 1. Copy-pasting code without understanding the trade-offs

    It's tempting to ship a snippet from a blog post into production, but Docker & Kubernetes patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.

  • 2. Skipping observability from day one

    Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging Docker & Kubernetes systems without them is painful and expensive.

  • 3. Optimizing too early

    Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.

  • 4. Ignoring security defaults

    Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.

Ship it safely

Production best practices

Apply these before promoting Scaling Java Microservices on AWS EKS with Terraform and Horizontal Pod Autoscaling to a real production environment.

Scalability

Design Docker & Kubernetes services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For Docker & Kubernetes systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Questions

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on May 21, 2026. We revisit popular Docker & Kubernetes tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

Code samples are inlined in the tutorial. When a companion repository is published it will be linked at the top of this page.
