System Design Fundamentals
The building blocks every backend engineer needs to design systems that survive contact with real traffic — load balancers, caching, CDNs, database scaling, message queues, API gateways, and service discovery.
Quick Reference
- Load balancer: L4 (TCP) vs L7 (HTTP); algorithms: round robin, least connections, IP hash
- Cache: in-process (Caffeine), distributed (Redis), CDN edges
- CDN: static plus edge-cached dynamic content; cache keys matter
- DB scaling: read replicas, sharding, partitioning, CQRS
- Message queue: async, durable, decouples producers from consumers
- API gateway: single entrypoint for auth, routing, rate limiting
- Service discovery: DNS, Consul, Eureka, Kubernetes Services
Learning Path
Recommended order
1. Beginner
2. Intermediate
3. Advanced
Prerequisites
- HTTP basics
- SQL basics
- What a service is
Skills you will learn
- Designing scalable systems on a whiteboard
- Picking the right caching layer
- Reasoning about consistency vs availability
Estimated time
Multiple weeks of study + practice.
Architecture Overview
[Diagram: Microservices Architecture]
Load Balancers
Spread traffic across instances.
L4 (TCP) load balancers (NLB, HAProxy in TCP mode) route by IP and port without inspecting the payload; L7 (HTTP) load balancers (ALB, Nginx, Envoy) route by host, path, and headers. Health checks take unhealthy backends out of rotation until they recover.
Pros
- Horizontal scale
- Zero-downtime deploys
- Health-based routing
Cons
- Sticky sessions complicate state
- Adds a network hop
Best for: Every production system.
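The three algorithms named in the quick reference (round robin, least connections, IP hash) can be sketched in a few lines. This is an illustrative sketch only, not how any particular load balancer implements them; the class names and the `app-N` backend names are hypothetical.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request finishes so the count stays accurate.
        self.active[backend] -= 1


class IpHash:
    """Map a client IP to a stable backend (plain modulo, not consistent hashing)."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]
```

Round robin suits uniform, short requests; least connections copes better when request durations vary; IP hash gives stickiness, but with plain modulo most clients are remapped whenever the backend list changes.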
Caching
Remember what you just computed.
Caches are layered: browser → CDN → API gateway → in-process (Caffeine) → distributed (Redis) → database. A hit at any layer avoids the latency of, and the load on, every layer behind it.
Pros
- Massive latency wins
- Reduces DB load
Cons
- Cache invalidation is hard
- Stale data risk
Best for: Read-heavy workloads.
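A single cache layer often follows the cache-aside pattern: check the cache, fall through to the slower layer on a miss, and store the result with an expiry. The sketch below assumes a simple in-process dictionary and a hypothetical `loader` callback standing in for the next layer; it is not thread-safe and has no size limit, which a real cache such as Caffeine or Redis would handle.

```python
import time


class TtlCache:
    """Tiny in-process cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: skip the slower layer
        value = loader(key)  # miss or expired: ask the next layer
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so readers do not see the old value until expiry.
        self._entries.pop(key, None)
```

The TTL bounds how stale an entry can get; explicit invalidation on writes narrows that window further, at the cost of coupling writers to the cache.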
Content Delivery Network
Push static (and some dynamic) content to the edge.
Cloudflare, CloudFront, and Fastly cache assets near users. Modern CDNs also run edge functions, which can personalize responses without hitting origin.
Pros
- Lower latency globally
- Origin shielding
- DDoS absorption
Cons
- Cache invalidation latency
- Egress cost for non-cacheable traffic
Best for: Global products with static assets or cacheable APIs.
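Cache keys matter because two URLs that return the same content but differ in parameter order or tracking parameters would otherwise be cached twice. One way to canonicalize a key is sketched below, assuming the scheme and port do not affect the response and that the parameters in `IGNORED_PARAMS` (a hypothetical list) never change it. Real CDNs expose this as configuration rather than code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of parameters that never change the response.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def cache_key(url):
    """Build a canonical cache key: lowercase host, path, sorted query."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED_PARAMS
    ]
    query = urlencode(sorted(params))
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return f"{host}{path}?{query}" if query else f"{host}{path}"
```

With this sketch, `https://Example.com/products?size=m&color=red&utm_source=mail` and `https://example.com/products?color=red&size=m` map to the same key, so the edge stores one copy instead of two.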
Database Scaling
Read replicas, sharding, and partitioning.
Vertical scaling buys time; read replicas absorb read load; sharding spreads data across multiple database servers by key; partitioning splits a large table into smaller pieces within one database. Pick based on access pattern.
Pros
- Near-linear read scaling with replicas
- Sharding scales writes
Cons
- Replication lag
- Sharding adds cross-shard query complexity
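Best for: Systems whose read or write load outgrows a single primary. A figure such as ~10k QPS is only a rough rule of thumb; the real threshold depends on hardware and query mix.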
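To make the replica and shard roles concrete, here is a minimal routing sketch: a stable hash picks the shard, writes go to that shard's primary, and reads rotate across its replicas. The `Router` class and the `db0-primary` style node names are hypothetical, and plain modulo hashing is an assumption chosen for brevity.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class Router:
    """Send writes to a shard's primary and rotate reads across its replicas."""

    def __init__(self, shards):
        # shards: list of {"primary": str, "replicas": [str, ...]}
        self.shards = shards
        self._next_replica = [0] * len(shards)

    def for_write(self, key):
        return self.shards[shard_for(key, len(self.shards))]["primary"]

    def for_read(self, key):
        index = shard_for(key, len(self.shards))
        replicas = self.shards[index]["replicas"]
        if not replicas:
            return self.shards[index]["primary"]  # no replica: read the primary
        choice = replicas[self._next_replica[index] % len(replicas)]
        self._next_replica[index] += 1
        return choice
```

Two caveats follow from the cons listed above: a read sent to a replica may miss a write that has not replicated yet, and changing the shard count with modulo hashing moves most keys, which is why resharding is usually planned well in advance.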
Message Queues
Asynchronous, durable communication.
RabbitMQ for traditional queues, Kafka for high-throughput event streams, SQS for a managed queue on AWS. A queue decouples producers from consumers and absorbs traffic spikes.
Pros
- Decoupling
- Absorbs bursts so consumers work at their own pace
- Retry semantics
Cons
- Eventual consistency
- Ordering guarantees vary
Best for: Background jobs, integration, event-driven flows.
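The decoupling and spike-absorption ideas can be shown with Python's in-process `queue.Queue` standing in for a broker. This is only an analogy: unlike RabbitMQ, Kafka, or SQS, it is not durable and lives in one process. The `submit`, `worker`, and `run_demo` names are hypothetical.

```python
import queue
import threading


def submit(jobs, job, timeout=0.1):
    """Try to enqueue; return False so the caller can shed load or retry later."""
    try:
        jobs.put(job, timeout=timeout)
        return True
    except queue.Full:
        return False


def worker(jobs, handle):
    """Drain the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            handle(job)
        finally:
            jobs.task_done()


def run_demo():
    jobs = queue.Queue(maxsize=2)  # bounded: the limit is what creates backpressure
    # No consumer is running yet, so the third submit is rejected instead of piling up.
    accepted = [submit(jobs, n, timeout=0.01) for n in (1, 2, 3)]
    processed = []
    consumer = threading.Thread(target=worker, args=(jobs, processed.append))
    consumer.start()
    jobs.join()      # wait until the queued jobs have been handled
    jobs.put(None)   # tell the worker to stop
    consumer.join()
    return accepted, processed
```

The producer never calls the consumer directly, and the bounded size forces a decision (reject, retry, or slow down) when the consumer falls behind.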
API Gateway
Single entrypoint: auth, routing, rate limiting, observability.
Kong, Spring Cloud Gateway, AWS API Gateway. Centralizes cross-cutting concerns so services stay focused on business logic.
Pros
- Centralized auth and rate limiting
- Unified observability
Cons
- Single point of failure if not HA
- Can become a deployment bottleneck
Best for: Microservice systems with external consumers.
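Rate limiting at the gateway is commonly implemented with a token bucket, though the products named above may use other algorithms. The sketch below keeps one bucket per client in memory; the `TokenBucket` and `RateLimiter` names are hypothetical, and a multi-node gateway would need shared state (for example in Redis) rather than a local dictionary.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1):
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per client, keyed by API key or a similar identifier."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}

    def allow(self, client_id):
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A gateway would call `allow(client_id)` before routing and return HTTP 429 when it is False.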
Service Discovery
How services find each other.
Options include DNS, Consul, Eureka, and Kubernetes Services / Endpoints. You need one as soon as instances come and go and hardcoded IPs stop working.
Pros
- Dynamic scaling
- Health-aware routing
Cons
- Adds a control-plane dependency
Best for: Microservice and Kubernetes environments.
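The core of a registry-style discovery system is a table of instances that stay listed only while they keep sending heartbeats. The sketch below is a toy in-memory version with hypothetical names; it does not reflect the actual APIs of Consul, Eureka, or Kubernetes.

```python
import time


class ServiceRegistry:
    """In-memory registry: instances stay listed only while they heartbeat."""

    def __init__(self, ttl_seconds=10, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._last_seen = {}  # (service, address) -> time of last heartbeat

    def heartbeat(self, service, address):
        """Register an instance or renew its lease."""
        self._last_seen[(service, address)] = self.clock()

    def deregister(self, service, address):
        """Remove an instance immediately, for example on graceful shutdown."""
        self._last_seen.pop((service, address), None)

    def lookup(self, service):
        """Return addresses whose lease has not expired."""
        cutoff = self.clock() - self.ttl
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and seen >= cutoff
        )
```

Callers resolve a service name to live addresses at request time (or cache the result briefly), which is what makes dynamic scaling and health-aware routing possible.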
Pick the right tool
| Problem | First Reach | Second Reach |
|---|---|---|
| Latency too high (reads) | Add cache (Redis/Caffeine) | Add CDN / read replicas |
| DB writes saturated | Partition / batch | Shard or use CQRS |
| Bursty traffic | Add a queue | Add autoscaling |
| Many cross-service calls | API gateway + caching | Re-architect boundaries |
| Global users | CDN | Multi-region replicas |
Common Mistakes
- Caching everything: invalidation becomes harder than the original problem.
- Treating the message queue as a database.
- Skipping circuit breakers when calling external services.
- Sharding before exhausting read replicas and partitioning.
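For the circuit-breaker mistake, a minimal sketch of the pattern: after a run of consecutive failures the breaker opens and rejects calls immediately, then lets a trial call through once a cool-down has passed. The class name, thresholds, and single-threaded design are assumptions for illustration; libraries such as Resilience4j offer hardened versions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast")
            # Cool-down elapsed: let a trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast protects the caller's threads and connection pools while the dependency is down, and gives the dependency room to recover instead of being hammered by retries.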
Production Tips
- Define SLOs first; design backward from latency and error budgets.
- Add backpressure at every async boundary (max queue depth, max in-flight).
- Cache the answer, not the request: cache by canonical key.
- Use idempotency keys for any retried operation.
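The idempotency-key tip can be illustrated with a hypothetical `PaymentService`: the client sends the same key on every retry, and the server replays the stored response instead of applying the operation again. The in-memory dictionary is an assumption for brevity; a real service would need a durable store and an atomic check-and-write.

```python
class PaymentService:
    """Hypothetical handler that applies each idempotency key at most once."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.balance = 0

    def deposit(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # Retry of a request already handled: replay the stored response.
            return self._results[idempotency_key]
        self.balance += amount
        response = {"status": "applied", "balance": self.balance}
        self._results[idempotency_key] = response
        return response
```

Without the key, a client that times out and retries cannot tell whether the first attempt succeeded, and a blind retry would deposit twice.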
Frequently Asked Questions
When do I need a CDN?
Whenever users are geographically distributed and you serve any cacheable content — which is almost every web product.
Read replica vs sharding?
Reach for replicas first: they scale reads and are easy to add. Shard only when writes saturate the primary, because replicas do nothing for write load.
Kafka vs RabbitMQ?
Kafka for high-throughput event streams + replay. RabbitMQ for traditional task queues with rich routing.
