DevOps & CI/CD · 20 min read · By Liyabona Saki

Blue-Green Deployments with Kubernetes — Zero Downtime Releases

A complete production guide to blue-green deployments on Kubernetes — Deployments, Services, Ingress, traffic switching, instant rollback, database compatibility and the full release process.

Introduction

Rolling updates are Kubernetes' default deployment strategy, and for many workloads they are good enough. But during the rollout, a percentage of users are on the new version and a percentage are on the old one — for minutes, sometimes longer. Rollback means another rolling update in reverse, with the same mixed state in the middle. For systems that demand instant cutovers and instant rollbacks, blue-green deployment is the right tool.

In a blue-green deployment, two identical environments run in parallel. One is live ("blue"); the other holds the new version ("green"). A load balancer points to one color at a time. Promotion is a single atomic operation: flip the selector. Rollback is the same flip in reverse — milliseconds, not minutes. This tutorial is a complete production walkthrough of blue-green on Kubernetes with Deployments, Services, Ingress and a database compatibility strategy.

When to use blue-green vs rolling

| Concern             | Rolling Update        | Blue-Green                   |
|---------------------|-----------------------|------------------------------|
| Cutover time        | Minutes               | Instant                      |
| Rollback time       | Minutes               | Instant                      |
| Resource cost       | ~1× during rollout    | 2× during release            |
| Mixed-version state | Yes, during rollout   | No (atomic switch)           |
| Schema migrations   | Hard (mixed versions) | Easier (one version at once) |
| Stateful workloads  | Built-in support      | Needs care                   |

Use blue-green when you cannot tolerate mixed versions (e.g. breaking API changes), when you need a sub-second cutover, or when you want a warm rollback target sitting idle. Stick with rolling for routine stateless backend updates.

Figure: Traditional Rolling / In-Place Deployment. Users reach a Load Balancer that sends some traffic to App v1 (draining) and some to App v2 (starting); both use a Shared Database.

Updating a single live environment leaves users on a half-deployed system during rollout. Rollback requires another deploy, and any data migration is hard to reverse.

Blue-green architecture on Kubernetes

The simplest implementation uses three pieces:

  • Two Deployments (app-blue and app-green) with identical pod spec but different image tags.
  • A single Service whose selector picks one color at a time.
  • An Ingress that points at the Service.

Figure: Blue-Green Deployment on Kubernetes. Users reach the Ingress (app.example.com), which routes to the Service (selector: color=blue). The Blue Deployment (v1.4, live) receives 100% of traffic; the Green Deployment (v1.5, standby) receives 0% and stays warm. Both use a Shared Database with a backward-compatible schema.

Two identical environments (blue and green) run side by side. The Service selector points to one color at a time; switching colors is instant and reversible.

The user-facing hostname never changes. Promotion is a one-line patch to the Service selector.

Real-world use cases

  • Breaking API releases where every client must move to the new version simultaneously.
  • Database migration windows where the application requires schema v2 and you cannot serve v1 once v2 is live.
  • Regulated workloads (banking, healthcare) where exact release moments must be auditable.
  • High-traffic e-commerce sites that cannot tolerate any user landing on a half-deployed page.
  • Internal platform services where the cost of two environments is small relative to the cost of a bad release.

Step 1 — Define the blue Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
      color: blue
  template:
    metadata:
      labels:
        app: api
        color: blue
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.4.0
        ports: [{ containerPort: 8080 }]
        readinessProbe:
          httpGet: { path: /healthz, port: 8080 }
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests: { cpu: 100m, memory: 256Mi }
          limits:   { cpu: 500m, memory: 512Mi }

Step 2 — Define the green Deployment

Identical to blue apart from the Deployment name, the color label, and the image tag.

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
      color: green
  template:
    metadata:
      labels:
        app: api
        color: green
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.5.0
        # ... same as blue
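
Because the two manifests differ only in those three places, one way to keep them from drifting is to generate one from the other. The following is a minimal sketch, assuming the blue manifest from Step 1 is saved to a file and uses exactly the strings shown there. `render_color` is a hypothetical helper, and a templating tool such as Helm or Kustomize is the sturdier choice for a real pipeline.

```bash
#!/usr/bin/env bash
# render_color: hypothetical helper that rewrites the blue manifest from
# Step 1 into a manifest for another color and image tag.
# Reads YAML on stdin, writes the rewritten YAML to stdout.
render_color() {   # render_color <color> <image-tag>
  local color="$1" tag="$2"
  sed -e "s/name: app-blue/name: app-${color}/" \
      -e "s/color: blue/color: ${color}/" \
      -e "s|\(image: registry.example.com/api\):.*|\1:${tag}|"
}
```

Usage, with `app-blue.yaml` as an assumed file name: `render_color green 1.5.0 < app-blue.yaml | kubectl apply -f -`.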

Step 3 — The Service selector is the switch

yaml
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    color: blue          # <-- the only line that changes during a release
  ports:
  - port: 80
    targetPort: 8080
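
Since the selector is the single source of truth for which color is live, a pipeline can read it back rather than track the color elsewhere. This is a small sketch under the assumption that the Service is named `api` as above; the function names `live_color` and `idle_color` are illustrative, not part of any tool.

```bash
#!/usr/bin/env bash
# live_color: print the color the Service currently selects.
# Defaults to the Service name "api" from the manifest above.
live_color() {   # live_color [service]
  kubectl get service "${1:-api}" -o jsonpath='{.spec.selector.color}'
}

# idle_color: print the color that is not live, i.e. the next release target.
idle_color() {   # idle_color [service]
  case "$(live_color "${1:-api}")" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unexpected selector color" >&2; return 1 ;;
  esac
}
```

A release script can then target `app-$(idle_color)` instead of hard-coding green, which matters once the colors start alternating between releases.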

Step 4 — Ingress (unchanged across releases)

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port: { number: 80 }

Step 5 — The release process

The repeatable script:

```bash
# 1. Deploy green with the new image
kubectl set image deploy/app-green api=registry.example.com/api:1.5.0
kubectl rollout status deploy/app-green --timeout=120s

# 2. Smoke test green via its internal Service or a temporary preview host.
#    Assumes a Service named app-green exposing port 8080 in the default namespace.
kubectl run smoke --rm -it --restart=Never --image=curlimages/curl -- \
  curl -fsSL http://app-green.default.svc.cluster.local:8080/healthz

# 3. Flip the Service selector: this is the cutover
kubectl patch svc api -p '{"spec":{"selector":{"app":"api","color":"green"}}}'

# 4. Watch error rate and latency for 5 minutes
#    (your Grafana dashboards do the real watching)
kubectl top pods -l app=api

# 5. Keep blue warm in case you need to roll back
```
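
The smoke test addresses green through the hostname `app-green.default.svc.cluster.local`, which only resolves if a Service with that name exists, and Steps 1 to 4 do not define one. A sketch of such a preview Service follows, assuming it should be named `app-<color>` and listen on 8080 to match the URL in the script; `preview_service` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# preview_service: print a ClusterIP Service manifest that always selects one
# color, so tests can reach that color regardless of which one is live.
# Assumption: named app-<color>, port 8080, to match the smoke-test URL.
preview_service() {   # preview_service <color>
  local color="$1"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: app-${color}
spec:
  selector:
    app: api
    color: ${color}
  ports:
  - port: 8080
    targetPort: 8080
EOF
}
```

Apply it once per color with `preview_service green | kubectl apply -f -`. The manifest carries no namespace, so it lands in the current one; the smoke-test URL assumes that is `default`.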

Figure: Traffic Switch, Blue → Green Cutover. Step 1: Deploy Green (kubectl apply). Step 2: Smoke Test (internal hostname). Step 3: Switch Service (selector: color=green). Step 4: Rollback Ready (blue kept warm).

Deploy green, run smoke tests against it, flip the Service selector to green, then keep blue warm as an instant rollback target until the release is validated.

Step 6 — Rollback

If anything looks wrong during the watch window after the cutover, rollback is a single command:

bash
kubectl patch svc api -p '{"spec":{"selector":{"app":"api","color":"blue"}}}'

That is the entire rollback. The blue pods are already running with the old image; no rebuild, no redeploy, no waiting for image pulls. Once you are confident green is good, you can either delete blue or leave it warm for the next release (then the next deploy reuses blue and the colors alternate).
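
The flip is only instant if the target color really is running, so it can be worth guarding the patch. The sketch below wraps the same `kubectl patch` in a readiness check; `switch_to` is a hypothetical helper, and the names `api` and `app-<color>` are the ones used in this guide.

```bash
#!/usr/bin/env bash
# switch_to: point the Service at a color, but only if that color's
# Deployment reports at least one ready replica.
switch_to() {   # switch_to <color>
  local color="$1" ready
  # readyReplicas is omitted from the status when it is zero, hence the default.
  ready="$(kubectl get deployment "app-${color}" -o jsonpath='{.status.readyReplicas}')"
  if [ "${ready:-0}" -lt 1 ]; then
    echo "app-${color} has no ready pods; refusing to switch" >&2
    return 1
  fi
  kubectl patch service api \
    -p "{\"spec\":{\"selector\":{\"app\":\"api\",\"color\":\"${color}\"}}}"
}
```

`switch_to blue` then rolls back and `switch_to green` promotes, with the same safety check in both directions.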

Step 7 — Automate with Helm or Kustomize

A real pipeline does not run kubectl patch by hand. Wrap it in a Helm template:

yaml
# values.yaml
liveColor: blue        # patched by CI on promotion
blue:
  image: { tag: "1.4.0" }
green:
  image: { tag: "1.5.0" }

yaml
# service.yaml
spec:
  selector:
    app: api
    color: {{ .Values.liveColor }}

Your CI job:

1. helm upgrade --set green.image.tag=$NEW_TAG to roll out green.
2. Run smoke tests against green.
3. helm upgrade --set liveColor=green to promote.
4. On failure, helm upgrade --set liveColor=blue for an instant rollback.
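
The CI steps above abbreviate the Helm commands. One detail worth spelling out: when `helm upgrade` is given `--set` without `--reuse-values`, values not passed on the command line fall back to the chart defaults, so the image tag set during the green rollout could be reverted by the promotion step. A sketch that avoids this, assuming a release named `api` and a chart at `./chart` (both placeholders):

```bash
#!/usr/bin/env bash
# Assumptions: release name and chart path are placeholders for your own.
RELEASE="${RELEASE:-api}"
CHART="${CHART:-./chart}"

# deploy_idle: roll out a new image to one color without touching liveColor.
deploy_idle() {   # deploy_idle <color> <tag>
  helm upgrade "$RELEASE" "$CHART" --reuse-values --wait \
    --set "$1.image.tag=$2"
}

# set_live: promote (or roll back) by changing only liveColor.
set_live() {      # set_live <color>
  helm upgrade "$RELEASE" "$CHART" --reuse-values --set "liveColor=$1"
}
```

A pipeline would call `deploy_idle green "$NEW_TAG"`, run its smoke tests, then `set_live green`, keeping `set_live blue` as the failure path.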

Database compatibility

Blue-green only works if both versions can talk to the same database. The rule: never make a breaking schema change in the same release as the code change that depends on it.

Use the expand-contract pattern:

1. Expand: add new columns/tables without removing the old. Both blue and green can run.
2. Deploy and promote the new code. It writes to both old and new columns.
3. Backfill data into the new columns.
4. Contract: once the old code is gone, remove the old columns in a follow-up release.

Each migration step is itself a blue-green release. Never combine a destructive migration with a code release.
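
As a concrete illustration of the expand, backfill, and contract phases, here is a sketch for a made-up change: renaming `users.fullname` to `users.display_name`. The table, the columns, and the PostgreSQL-style SQL are all assumptions for the example; each function just prints the statement for one phase so it can be piped into whatever migration tool you use.

```bash
#!/usr/bin/env bash
# Hypothetical example: renaming users.fullname to users.display_name.

expand_sql() {    # safe while the old code is still running
  cat <<'SQL'
ALTER TABLE users ADD COLUMN display_name text;
SQL
}

backfill_sql() {  # run after the new code, which writes both columns, is live
  cat <<'SQL'
UPDATE users SET display_name = fullname WHERE display_name IS NULL;
SQL
}

contract_sql() {  # only in a later release, once no running code reads fullname
  cat <<'SQL'
ALTER TABLE users DROP COLUMN fullname;
SQL
}
```

For example, `expand_sql | psql "$DATABASE_URL"` would apply the first phase, with `DATABASE_URL` standing in for your connection string. Only `contract_sql` is destructive, which is why it ships alone and last.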

Production best practices

  • Treat both colors as production. Same RBAC, same network policies, same observability. Green is not a staging environment.
  • Smoke test against the internal Service. Don't promote until automated tests pass against green by name.
  • Warm the JVM / connection pools. Hit green with synthetic traffic before the switch so the first real request doesn't pay cold-start cost.
  • Keep the previous color for ≥30 minutes. Most regressions surface in the first half hour.
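  • Use maxSurge: 100%, maxUnavailable: 0 on the green Deployment. All new pods then start in parallel rather than in batches, so green's own rollout takes about as long as one pod needs to become ready once the image is pulled.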
  • Wire up alerts on color switch. "Color changed to green at 14:02" should appear in your incident channel automatically.
  • Pre-pull the image on every node. A DaemonSet that runs the new image briefly avoids long ImagePulling during release.
  • Always make schema migrations backward compatible. This is the single biggest gotcha.
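
The maxSurge / maxUnavailable recommendation from the list above can be applied to an existing Deployment with a merge patch. A minimal sketch, assuming the `app-<color>` naming from this guide; `surge_all` is a hypothetical helper, and the same settings can equally live in the manifest under `spec.strategy`.

```bash
#!/usr/bin/env bash
# surge_all: set the rolling-update strategy on one color's Deployment so that
# all new pods may start at once and no old pod is removed before they are ready.
surge_all() {   # surge_all <color>
  kubectl patch deployment "app-$1" --type merge \
    -p '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":"100%","maxUnavailable":0}}}}'
}
```

Note that a 100% surge briefly doubles that color's pod count during its own rollout, so the cluster needs headroom for it.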

Security considerations

  • Apply the same NetworkPolicies to both colors. A green pod isolated by accident equals an outage.
  • Don't expose green to the public internet during validation. Use the internal Service hostname.
  • Rotate secrets atomically — both colors must read from the same SecretRef.
  • Audit who can patch the Service. kubectl patch svc is the most powerful command in this workflow; gate it via RBAC + a CI service account.
  • Sign images and verify signatures (Cosign + Kyverno or Gatekeeper).
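
For the RBAC point in the list above, one option is a Role that allows reading and patching only the `api` Service, bound to the CI service account. This is a sketch: the Role and binding name `api-switcher` and the helper `grant_switch_rights` are made up for illustration.

```bash
#!/usr/bin/env bash
# grant_switch_rights: allow one service account to get and patch only the
# "api" Service in a namespace. Role and binding names are placeholders.
grant_switch_rights() {   # grant_switch_rights <namespace> <service-account>
  local ns="$1" sa="$2"
  kubectl -n "$ns" create role api-switcher \
    --verb=get --verb=patch --resource=services --resource-name=api
  kubectl -n "$ns" create rolebinding api-switcher \
    --role=api-switcher --serviceaccount="${ns}:${sa}"
}
```

After `grant_switch_rights default ci-deployer` (an assumed account name), `kubectl auth can-i patch service/api --as=system:serviceaccount:default:ci-deployer` is a quick way to confirm the grant.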

Common mistakes

1. Breaking schema change shipped with the code change. Once the database is on schema v2 and blue still expects v1, rollback breaks.
2. Reusing the same Deployment name. You need two Deployments; one Deployment cannot be its own rollback target.
3. Forgetting readiness probes on green. Switching traffic to pods that aren't ready causes a brief outage.
4. Promoting before smoke tests pass. The whole point is to validate green before users see it.
5. Deleting blue immediately. No rollback target left: you've turned blue-green into "in-place with extra steps".
6. Color drift in config. Blue and green must use identical ConfigMaps and Secrets; templated charts prevent the drift.
7. Long-running connections. WebSocket clients on blue won't migrate. Either drain explicitly or accept reconnects during cutover.

Troubleshooting guide

  • 502s right after the switch. Green pods are not ready. Check readiness probes and kubectl get endpoints api.
  • Service selector flipped but traffic still on blue. Long-lived connections persist; clients must reconnect. Keep sessionAffinity: None (the default) on the Service and use short keep-alives.
  • Green works in smoke tests but fails under load. HPA hasn't scaled it; pre-scale green to match blue before promotion.
  • Database errors on green. Check the migration — green expects schema state that isn't fully applied.
  • Image pulls slow. Pre-pull on every node, or use a regional registry mirror.
  • Slow rollback. Make sure blue is still running. The whole point is to keep it warm.
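
Several of the checks above start with the same question: how many pods is the Service actually routing to? A small sketch that counts the ready addresses behind the Service, assuming the Service name `api`; `ready_backends` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# ready_backends: count the pod IPs currently listed as ready behind a Service.
# Prints 0 when the Service has no ready endpoints.
ready_backends() {   # ready_backends [service]
  kubectl get endpoints "${1:-api}" \
    -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}' | grep -c .
}
```

Comparing the result with `kubectl get pods -l app=api,color=green -o wide` shows whether the listed IPs belong to the color you expect.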

FAQ

1. How is this different from canary? Canary sends a small percentage of traffic to the new version and gradually increases it. Blue-green is binary — 100% old or 100% new.

2. Can I do blue-green with stateful workloads? With great care. Either share storage between colors (and make sure the new code is compatible) or use a state-handoff procedure. For databases, prefer the expand-contract pattern over swapping the database itself.

3. Does Argo Rollouts support this? Yes: spec.strategy.blueGreen with activeService and previewService. It automates much of Steps 5 to 7 above.

4. Do I need 2× the resources permanently? Only during the release window. After validation you can scale the inactive color to zero (or delete it) and reuse it on the next release.

5. How does this interact with HPA? Apply HPA to both color Deployments. Pre-scale the inactive color before promotion so it can absorb traffic immediately.

6. What about service mesh routing (Istio, Linkerd)? Cleaner. With Istio you split traffic with VirtualService route weights (100/0 or 0/100) and can even ramp gradually; Linkerd supports comparable weighted routing. Mesh-based blue-green is the production-grade evolution of selector switching.

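7. Can I blue-green an entire namespace? Yes, with two namespaces (api-blue, api-green) and the Ingress pointing at one or the other. Because an Ingress backend must be a Service in the Ingress's own namespace, this usually means one Ingress per namespace or a bridging Service such as an ExternalName Service. Useful when you want fully isolated test environments.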

8. How do I handle config changes that ship with a release? Bake them into the image or into a versioned ConfigMap referenced by the Deployment. Don't mutate live ConfigMaps as part of the switch.

9. What does CI/CD look like end to end? Build image → push → helm upgrade green → run smoke tests → helm upgrade --set liveColor=green → watch metrics → optionally scale blue to zero.

10. When should I stop using blue-green? When you've moved to a service mesh with weighted traffic splitting — that gives you blue-green, canary and A/B as one mechanism.
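
Questions 4 and 5 both come down to managing replica counts on the inactive color. A minimal sketch of pre-scaling one color to match the other before promotion, assuming the `app-<color>` Deployment names from this guide; `match_replicas` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# match_replicas: scale one color's Deployment to the replica count currently
# set on the other color's Deployment.
match_replicas() {   # match_replicas <from-color> <to-color>
  local n
  n="$(kubectl get deployment "app-$1" -o jsonpath='{.spec.replicas}')"
  kubectl scale deployment "app-$2" --replicas="$n"
}
```

`match_replicas blue green` before the switch gives green the capacity blue has right now; if an HPA manages blue, the value read is the HPA's current target. After validation, `kubectl scale deployment app-blue --replicas=0` frees the resources while keeping the Deployment object for the next release.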

Key takeaways

  • Blue-green is two identical environments and one selector — that is the whole pattern.
  • Cutover and rollback are atomic, sub-second operations because both colors are already running.
  • Schema changes must be backward compatible; use expand-contract migrations.
  • Automate the promotion in CI with Helm or Argo Rollouts; never kubectl patch by hand in production.
  • Keep blue warm after promotion as your instant rollback target.

Figure: Kubernetes Deployment Architecture. Users connect over HTTPS to the Ingress (NGINX / ALB), which routes to a ClusterIP Service that load-balances across three Pod replicas; data lives in managed PostgreSQL and a Redis cache.

Ingress routes external traffic to a Service that load-balances across replica Pods. Pods read config from ConfigMaps and persist via managed databases.

TL;DR

Key takeaways

  • Understand the core concepts behind blue-green deployments on Kubernetes in a production context.
  • Apply the patterns to real DevOps & CI/CD systems, not just toy examples.
  • Recognize the trade-offs, failure modes, and operational concerns before adopting them.
  • Get a clear path to the next step — related tutorials, tools, and reference architectures.

Avoid these

Common mistakes

1. Copy-pasting code without understanding the trade-offs

   It's tempting to ship a snippet from a blog post into production, but DevOps & CI/CD patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.

2. Skipping observability from day one

   Structured logs, metrics, and traces are not optional. Wire them in before you ship; debugging DevOps & CI/CD systems without them is painful and expensive.

3. Optimizing too early

   Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.

4. Ignoring security defaults

   Secrets in env files, open management ports, and missing RBAC are among the most common causes of production incidents. Treat security as part of the definition of done.

Ship it safely

Production best practices

Apply these before taking blue-green deployments on Kubernetes to a real production environment.

Scalability

Design DevOps & CI/CD services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For DevOps & CI/CD systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Questions

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on June 3, 2026. We revisit popular DevOps & CI/CD tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

The full source code is available on GitHub: https://github.com/masterlabsystems/blue-green-k8s-demo. Fork it, run it locally, and adapt it to your own project.
