Blue-Green Deployments with Kubernetes — Zero Downtime Releases
A complete production guide to blue-green deployments on Kubernetes — Deployments, Services, Ingress, traffic switching, instant rollback, database compatibility and the full release process.
Introduction
Rolling updates are Kubernetes' default deployment strategy, and for many workloads they are good enough. But during the rollout, a percentage of users are on the new version and a percentage are on the old one — for minutes, sometimes longer. Rollback means another rolling update in reverse, with the same mixed state in the middle. For systems that demand instant cutovers and instant rollbacks, blue-green deployment is the right tool.
In a blue-green deployment, two identical environments run in parallel. One is live ("blue"); the other holds the new version ("green"). A load balancer points to one color at a time. Promotion is a single atomic operation: flip the selector. Rollback is the same flip in reverse — milliseconds, not minutes. This tutorial is a complete production walkthrough of blue-green on Kubernetes with Deployments, Services, Ingress and a database compatibility strategy.
When to use blue-green vs rolling
| Concern             | Rolling Update        | Blue-Green                   |
|---------------------|-----------------------|------------------------------|
| Cutover time        | Minutes               | Instant                      |
| Rollback time       | Minutes               | Instant                      |
| Resource cost       | ~1× during rollout    | 2× during release            |
| Mixed-version state | Yes, during rollout   | No (atomic switch)           |
| Schema migrations   | Hard (mixed versions) | Easier (one version at once) |
| Stateful workloads  | Built-in support      | Needs care                   |
Use blue-green when you cannot tolerate mixed versions (e.g. breaking API changes), when you need a sub-second cutover, or when you want a warm rollback target sitting idle. Stick with rolling for routine stateless backend updates.
[Figure: Traditional Rolling / In-Place Deployment]
Blue-green architecture on Kubernetes
The simplest implementation uses three pieces:
- Two Deployments (`app-blue` and `app-green`) with identical pod spec but different image tags.
- A single Service whose selector picks one color at a time.
- An Ingress that points at the Service.
[Figure: Blue-Green Deployment on Kubernetes]
The user-facing hostname never changes. Promotion is a one-line patch to the Service selector.
Real-world use cases
- Breaking API releases where every client must move to the new version simultaneously.
- Database migration windows where the application requires schema v2 and you cannot serve v1 once v2 is live.
- Regulated workloads (banking, healthcare) where exact release moments must be auditable.
- High-traffic e-commerce sites that cannot tolerate any user landing on a half-deployed page.
- Internal platform services where the cost of two environments is small relative to the cost of a bad release.
Step 1 — Define the blue Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
      color: blue
  template:
    metadata:
      labels:
        app: api
        color: blue
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.4.0
          ports: [{ containerPort: 8080 }]
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 500m, memory: 512Mi }
```
Step 2 — Define the green Deployment
Identical apart from the color label and the image tag.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
      color: green
  template:
    metadata:
      labels:
        app: api
        color: green
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.5.0
          # ... same as blue
```
Step 3 — The Service selector is the switch
```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    color: blue   # <-- the only line that changes during a release
  ports:
    - port: 80
      targetPort: 8080
```
Step 4 — Ingress (unchanged across releases)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port: { number: 80 }
```
Step 5 — The release process
The repeatable script:
```bash
# 1. Deploy green with the new image
kubectl set image deploy/app-green api=registry.example.com/api:1.5.0
kubectl rollout status deploy/app-green --timeout=120s

# 2. Smoke test green via its internal Service or a temporary preview host
#    (assumes a preview Service named app-green that selects color=green on port 8080)
kubectl run smoke --rm -i --restart=Never --image=curlimages/curl -- \
  curl -fsSL http://app-green.default.svc.cluster.local:8080/healthz

# 3. Flip the Service selector: this is the cutover
kubectl patch svc api -p '{"spec":{"selector":{"app":"api","color":"green"}}}'

# 4. Watch error rate and latency for 5 minutes
kubectl top pods -l app=api   # (your Grafana dashboards do the real watching)

# 5. Keep blue warm in case you need to roll back
```
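The smoke test in step 2 resolves `app-green.default.svc.cluster.local`, which only works if a Service with that name exists. A minimal sketch of such a preview Service follows, assuming the `default` namespace and the labels from the green Deployment. The name `app-green` and port 8080 are chosen here only to match the curl URL in the script; they are not defined elsewhere in this guide.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-green        # assumed name, matches the hostname used in the smoke test
spec:
  type: ClusterIP        # cluster-internal only; not referenced by the Ingress
  selector:
    app: api
    color: green         # pinned to green, unlike the live "api" Service
  ports:
    - port: 8080
      targetPort: 8080
```

Because this Service never changes its selector, it keeps pointing at green both before and after promotion, so the same smoke test can be rerun at any time.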
[Figure: Traffic Switch, Blue → Green Cutover]
Step 6 — Rollback
If anything looks wrong during the post-cutover watch window, rollback is a single command:
```bash
kubectl patch svc api -p '{"spec":{"selector":{"app":"api","color":"blue"}}}'
```
That is the entire rollback. The blue pods are already running with the old image; no rebuild, no redeploy, no waiting for image pulls. Once you are confident green is good, you can either delete blue or leave it warm for the next release (then the next deploy reuses blue and the colors alternate).
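Because promotion and rollback are the same operation with the colors swapped, a small wrapper can read the live color and flip to the other one. The following is a sketch rather than a hardened tool: the function names `other_color`, `selector_patch` and `flip_service` are hypothetical, and it assumes the Service is called `api` in the current namespace.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of kubectl.

# Print the color that is NOT passed in.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Build the patch body for the Service selector.
selector_patch() {
  printf '{"spec":{"selector":{"app":"api","color":"%s"}}}' "$1"
}

# Read the live color from the Service, then point it at the other one.
flip_service() {
  local live target
  live="$(kubectl get svc api -o jsonpath='{.spec.selector.color}')"
  target="$(other_color "$live")" || return 1
  kubectl patch svc api -p "$(selector_patch "$target")"
}
```

Calling `flip_service` once promotes the idle color; calling it again rolls back. Nothing runs against the cluster until `flip_service` is invoked.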
Step 7 — Automate with Helm or Kustomize
A real pipeline does not run kubectl patch by hand. Wrap it in a Helm template:
```yaml
# values.yaml
liveColor: blue   # patched by CI on promotion
blue:
  image: { tag: "1.4.0" }
green:
  image: { tag: "1.5.0" }
```

```yaml
# service.yaml
spec:
  selector:
    app: api
    color: {{ .Values.liveColor }}
```
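Your CI job (substitute your own release name and chart path for `<release>` and `<chart>`):
1. `helm upgrade <release> <chart> --reuse-values --set green.image.tag=$NEW_TAG` to roll out green.
2. Run smoke tests against green.
3. `helm upgrade <release> <chart> --reuse-values --set liveColor=green` to promote.
4. On failure, `helm upgrade <release> <chart> --reuse-values --set liveColor=blue` for an instant rollback.

The `--reuse-values` flag matters: without it, an upgrade that passes `--set` starts again from the chart's default values, so the promotion step would also reset green's image tag to whatever values.yaml contains.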
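The CI steps above could be wrapped in two small shell functions. This is a sketch under stated assumptions: the release name `api`, the chart path `./chart`, the `DRY_RUN` switch and the function names are all placeholders invented for illustration. It passes `--reuse-values` so that changing one value does not reset the others to the chart defaults.

```bash
#!/usr/bin/env bash
# Sketch of the CI job. RELEASE, CHART and DRY_RUN are assumed names.
RELEASE="${RELEASE:-api}"
CHART="${CHART:-./chart}"

# Echo the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Roll out a new image tag to one color without touching liveColor.
deploy_color() {  # usage: deploy_color green 1.5.0
  run helm upgrade "$RELEASE" "$CHART" --reuse-values --wait \
    --set "$1.image.tag=$2"
}

# Point the Service at the given color (promotion or rollback).
set_live_color() {  # usage: set_live_color green
  run helm upgrade "$RELEASE" "$CHART" --reuse-values \
    --set "liveColor=$1"
}
```

A pipeline would call `deploy_color green "$NEW_TAG"`, run its smoke tests, then `set_live_color green`, and fall back to `set_live_color blue` if the watch window shows a regression. Setting `DRY_RUN=1` prints the helm commands without executing them, which is handy for reviewing a pipeline change.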
Database compatibility
Blue-green only works if both versions can talk to the same database. The rule: never make a breaking schema change in the same release as the code change that depends on it.
Use the expand-contract pattern:
1. Expand: add new columns/tables without removing the old. Both blue and green can run.
2. Deploy and promote the new code. It writes to both old and new columns.
3. Backfill data into the new columns.
4. Contract: once the old code is gone, remove the old columns in a follow-up release.
Each migration step is itself a blue-green release. Never combine a destructive migration with a code release.
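To make the phases concrete, here is a sketch of what each one might run when renaming a column. The table and column names (`customers`, `name`, `full_name`) and the helper `migration_sql` are hypothetical; a real project would usually keep these statements in a migration tool rather than a shell function.

```bash
#!/usr/bin/env bash
# Sketch only: customers, name and full_name are hypothetical identifiers.

# Print the SQL for one phase of an expand-contract migration.
migration_sql() {
  case "$1" in
    # Additive change: old code ignores the new column, new code can use it.
    expand)   echo "ALTER TABLE customers ADD COLUMN full_name TEXT;" ;;
    # Copy existing data across once the new code is writing both columns.
    backfill) echo "UPDATE customers SET full_name = name WHERE full_name IS NULL;" ;;
    # Destructive change: only safe after no running version reads "name".
    contract) echo "ALTER TABLE customers DROP COLUMN name;" ;;
    *)        echo "usage: migration_sql expand|backfill|contract" >&2; return 1 ;;
  esac
}
```

Each phase would ship on its own, for example `migration_sql expand | psql "$DATABASE_URL"` before green is deployed, and `contract` only in a later release once blue no longer exists as a rollback target.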
Production best practices
- Treat both colors as production. Same RBAC, same network policies, same observability. Green is not a staging environment.
- Smoke test against the internal Service. Don't promote until automated tests pass against green by name.
- Warm the JVM / connection pools. Hit green with synthetic traffic before the switch so the first real request doesn't pay cold-start cost.
- Keep the previous color for ≥30 minutes. Most regressions surface in the first half hour.
- Use `maxSurge: 100%, maxUnavailable: 0` on the green Deployment. This makes its own rollout instant once the image is pulled.
- Wire up alerts on color switch. "Color changed to green at 14:02" should appear in your incident channel automatically.
- Pre-pull the image on every node. A DaemonSet that runs the new image briefly avoids long ImagePulling during release.
- Always make schema migrations backward compatible. This is the single biggest gotcha.
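The surge settings from the list above live under the Deployment's update strategy. A sketch of the relevant fragment of the `app-green` spec, with the rest of the manifest omitted:

```yaml
# Fragment of the app-green Deployment spec
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%        # allow a full extra set of new pods to start at once
      maxUnavailable: 0     # never remove an old pod before its replacement is ready
```

This only affects how the idle color replaces its own pods; live traffic is still controlled by the Service selector.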
Security considerations
- Apply the same NetworkPolicies to both colors. A green pod isolated by accident equals an outage.
- Don't expose green to the public internet during validation. Use the internal Service hostname.
- Rotate secrets atomically — both colors must read from the same SecretRef.
- Audit who can patch the Service. `kubectl patch svc` is the most powerful command in this workflow; gate it via RBAC + a CI service account.
- Sign images and verify signatures (Cosign + Kyverno or Gatekeeper).
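One way to gate the selector switch is a namespaced Role that allows patching only the live Service, bound to the CI service account. This is a sketch: the names `api-promoter` and `ci-deployer` and the `default` namespace are assumptions, not something this guide defines.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-promoter          # hypothetical name
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["api"]    # only the live Service, nothing else
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-promoter
  namespace: default
subjects:
  - kind: ServiceAccount
    name: ci-deployer         # hypothetical CI service account
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: api-promoter
```

With this in place, `kubectl auth can-i patch service/api --as=system:serviceaccount:default:ci-deployer` should answer yes, while the same check for any other Service should answer no.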
Common mistakes
1. Breaking schema change shipped with the code change. Once green has moved the database to the v2 schema and blue still expects v1, rollback breaks.
2. Reusing the same Deployment name. You need two Deployments; one Deployment cannot be its own rollback target.
3. Forgetting readiness probes on green. Switching traffic to pods that aren't ready causes a brief outage.
4. Promoting before smoke tests pass. The whole point is to validate green before users see it.
5. Deleting blue immediately. No rollback target is left, and you've turned blue-green into "in-place with extra steps".
6. Color drift in config. Blue and green must use identical ConfigMaps and Secrets; templated charts prevent the two from drifting apart.
7. Long-running connections. WebSocket clients on blue won't migrate. Either drain explicitly or accept reconnects during cutover.
Troubleshooting guide
- 502s right after the switch. Green pods are not ready. Check readiness probes and `kubectl get endpoints api`.
- Service selector flipped but traffic still on blue. Long-lived connections persist; clients must reconnect. Use a Service `sessionAffinity: None` and short keep-alives.
- Green works in smoke tests but fails under load. HPA hasn't scaled it; pre-scale green to match blue before promotion.
- Database errors on green. Check the migration — green expects schema state that isn't fully applied.
- Image pulls slow. Pre-pull on every node, or use a regional registry mirror.
- Slow rollback. Make sure blue is still running. The whole point is to keep it warm.
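For the "fails under load" case above, pre-scaling can be a two-command helper that copies the live Deployment's replica count onto the idle one. This is a sketch that assumes the `app-<color>` naming used in this guide; the function name `prescale` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: prescale is a hypothetical helper, not a kubectl feature.

# Copy the desired replica count of the live Deployment onto the idle one.
prescale() {  # usage: prescale blue green
  local live="$1" idle="$2" replicas
  replicas="$(kubectl get deploy "app-$live" -o jsonpath='{.spec.replicas}')" || return 1
  kubectl scale deploy "app-$idle" --replicas="$replicas"
}
```

Run `prescale blue green` just before promotion. If an HPA manages the idle Deployment, it may scale the pods back down while there is no traffic, so raising that HPA's `minReplicas` for the release window is likely the more reliable option in that setup.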
FAQ
1. How is this different from canary? Canary sends a small percentage of traffic to the new version and gradually increases it. Blue-green is binary — 100% old or 100% new.
2. Can I do blue-green with stateful workloads? With great care. Either share storage between colors (and make sure the new code is compatible) or use a state-handoff procedure. For databases, prefer the expand-contract pattern over swapping the database itself.
3. Does Argo Rollouts support this?
Yes: spec.strategy.blueGreen with activeService and previewService. It automates much of steps 5–7 above.
4. Do I need 2× the resources permanently? Only during the release window. After validation you can scale the inactive color to zero (or delete it) and reuse it on the next release.
5. How does this interact with HPA? Apply HPA to both color Deployments. Pre-scale the inactive color before promotion so it can absorb traffic immediately.
6. What about service mesh routing (Istio, Linkerd)?
Cleaner. With Istio you split traffic with VirtualService route weights (100/0 or 0/100) and can even ramp gradually; Linkerd offers comparable weighted routing through its own route resources. Mesh-based blue-green is the production-grade evolution of selector switching.
7. Can I blue-green an entire namespace?
Yes, with two namespaces (api-blue, api-green) and the Ingress pointing at one or the other. Useful when you want fully isolated test environments.
8. How do I handle config changes that ship with a release? Bake them into the image or into a versioned ConfigMap referenced by the Deployment. Don't mutate live ConfigMaps as part of the switch.
9. What does CI/CD look like end to end?
Build image → push → helm upgrade green → run smoke tests → helm upgrade --set liveColor=green → watch metrics → optionally scale blue to zero.
10. When should I stop using blue-green? When you've moved to a service mesh with weighted traffic splitting — that gives you blue-green, canary and A/B as one mechanism.
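As a follow-up to FAQ 3, this is roughly what the Argo Rollouts variant looks like. It is a sketch, assuming the Argo Rollouts controller is installed and that two Services, `api` and a hypothetical `api-preview`, already exist with the selector `app: api`. The controller manages which ReplicaSet each Service points at, so the color label is no longer needed.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.5.0
          ports:
            - containerPort: 8080
  strategy:
    blueGreen:
      activeService: api            # receives live traffic
      previewService: api-preview   # hypothetical name; target for smoke tests
      autoPromotionEnabled: false   # wait for an explicit promotion
      scaleDownDelaySeconds: 1800   # keep the old ReplicaSet around for 30 minutes
```

With `autoPromotionEnabled: false`, the new version waits behind the preview Service until someone (or a pipeline) promotes it, for example with `kubectl argo rollouts promote api` if the kubectl plugin is installed.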
Key takeaways
- Blue-green is two identical environments and one selector — that is the whole pattern.
- Cutover and rollback are atomic, sub-second operations because both colors are already running.
- Schema changes must be backward compatible; use expand-contract migrations.
- Automate the promotion in CI with Helm or Argo Rollouts; never `kubectl patch` by hand in production.
- Keep blue warm after promotion as your instant rollback target.
[Figure: Kubernetes Deployment Architecture]
Avoid these
Common mistakes
1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but DevOps & CI/CD patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging DevOps & CI/CD systems without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.
Ship it safely
Production best practices
Apply these before taking a blue-green deployment setup to a real production environment.
Scalability
Design DevOps & CI/CD services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.
Monitoring & Observability
Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.
Logging
Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.
Security
Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.
Testing
Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For DevOps & CI/CD systems, also run chaos and load tests before a major release.
Reliability & Rollouts
Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.
Questions
Frequently asked questions
Is this tutorial up to date?
Yes. This tutorial was last reviewed and updated on June 3, 2026. We revisit popular DevOps & CI/CD tutorials regularly to keep them aligned with current best practices.
What level is this tutorial aimed at?
It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.
Do I need to follow every step in order?
The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.
Where can I find the source code?
The full source code is available on GitHub: https://github.com/masterlabsystems/blue-green-k8s-demo. Fork it, run it locally, and adapt it to your own project.
Related tutorials
CI/CD Pipeline with GitHub Actions and Docker
Build a complete CI/CD pipeline that tests, builds and pushes a Spring Boot Docker image on every push using GitHub Actions.
Automating Database Migrations with Flyway and Spring Boot in a CI/CD Pipeline
Ship safe, versioned, zero-downtime database migrations with Flyway and Spring Boot — including PostgreSQL examples, multi-environment handling and a complete GitHub Actions pipeline.
GitOps with ArgoCD — The Modern Kubernetes Deployment Strategy
A complete, production-grade guide to GitOps with ArgoCD on Kubernetes — workflow, architecture, multi-environment promotion, auto-sync, rollbacks and Spring Boot deployments.
