System Design Fundamentals

The building blocks every backend engineer needs to design systems that survive contact with real traffic — load balancers, caching, CDNs, database scaling, message queues, API gateways, and service discovery.

Quick Reference

  • Load balancer — L4 (TCP) vs L7 (HTTP); algorithms: RR, least-conn, IP-hash
  • Cache — in-process (Caffeine), distributed (Redis), CDN edges
  • CDN — static + edge-cached dynamic; cache keys matter
  • DB scaling — read replicas, sharding, partitioning, CQRS
  • Message queue — async, durable, decouples producers/consumers
  • API gateway — single entrypoint: auth, routing, rate limiting
  • Service discovery — DNS, Consul, Eureka, Kubernetes Services

Learning Path

Recommended order

  1. Beginner
  2. Intermediate
  3. Advanced

Prerequisites

  • HTTP basics
  • SQL basics
  • What a service is

Skills you will learn

  • Designing scalable systems on a whiteboard
  • Picking the right caching layer
  • Reasoning about consistency vs availability

Estimated time

Several weeks of study and practice.

Architecture Overview

Microservices Architecture

[Diagram: five tiers, left to right.
  Client: Web App, Mobile App
  API Gateway: API Gateway (Routing · Auth)
  Services: Users Service, Orders Service, Billing Service
  Data: Users DB (PostgreSQL), Orders DB (PostgreSQL), Event Bus (Kafka)
  External: Stripe (Payments), Email API (SES / SendGrid)
  Edge labels: REST, publish, subscribe]
An API gateway routes traffic to independent services. Each service owns its data and communicates via REST or async events.

Load Balancers

Spread traffic across instances.

Recommended

L4 (TCP) load balancers (NLB, HAProxy in TCP mode) route by IP and port without inspecting the payload; L7 (HTTP) load balancers (ALB, Nginx, Envoy) route by host, path, and headers. Health checks take unhealthy backends out of rotation.
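The three algorithms named in the Quick Reference (round robin, least connections, IP hash) can be sketched in a few lines. This is an illustrative toy, not how any particular load balancer implements them; the class names and the `pick`/`release` methods are assumptions made for this example.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through backends in a fixed order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1      # caller must release() when done
        return backend

    def release(self, backend):
        self.active[backend] -= 1


class IpHash:
    """Send a given client IP to the same backend while the pool is unchanged."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        # A stable hash (not Python's per-process hash()) keeps the mapping
        # identical across load balancer instances and restarts.
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]
```

Round robin suits uniform requests, least connections suits requests of uneven duration, and IP hash gives a simple form of session affinity at the price of remapping clients whenever the pool size changes.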

Pros

  • Horizontal scale
  • Zero-downtime deploys
  • Health-based routing

Cons

  • Sticky sessions complicate state
  • Adds a network hop

Best for: Every production system.

Caching

Remember what you just computed.

Layered: browser → CDN → API gateway → in-process (Caffeine) → distributed (Redis) → database. Each layer cuts latency and load on the next.
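The inner part of that chain (in-process, then distributed, then database) is often written as a cache-aside lookup. The sketch below assumes a tiny `TtlCache` class standing in for both Caffeine and Redis, and a caller-supplied `load_from_db` function; all of these names are hypothetical.

```python
import time


class TtlCache:
    """Tiny TTL cache standing in for Caffeine or Redis in this sketch."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]        # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)


def layered_get(key, local, shared, load_from_db):
    """Check the in-process cache, then the shared cache, then the database.

    None is treated as a miss, so None values are never served from cache.
    """
    value = local.get(key)
    if value is not None:
        return value
    value = shared.get(key)
    if value is None:
        value = load_from_db(key)      # slowest path: the source of truth
        shared.set(key, value)
    local.set(key, value)              # fill the faster layer on the way back
    return value
```

A common choice is a short TTL on the in-process layer and a longer one on the shared layer, since each process holds its own copy that cannot be invalidated centrally.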

Pros

  • Massive latency wins
  • Reduces DB load

Cons

  • Cache invalidation is hard
  • Stale data risk

Best for: Read-heavy workloads.

Content Delivery Network

Push static (and some dynamic) content to the edge.

Cloudflare, CloudFront, and Fastly cache assets near users. Modern CDNs also run edge functions, which can personalize responses without a round trip to the origin.
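As the Quick Reference notes, cache keys matter: two URLs that return the same bytes but differ in parameter order or tracking parameters will occupy separate cache entries unless the key is normalized. The following is a rough sketch of that normalization; the `cache_key` function and the `IGNORED_PARAMS` list are assumptions for illustration, and real CDNs expose this through their own configuration rather than code like this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical set of query parameters that do not change the response.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def cache_key(url):
    """Canonical key: lowercase host, sorted query, tracking params dropped."""
    parts = urlsplit(url)
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    )
    key = parts.netloc.lower() + (parts.path or "/")
    if query:
        key += "?" + urlencode(query)
    return key


# Both requests map to the same entry: "example.com/img?a=1&b=2"
print(cache_key("https://Example.com/img?b=2&a=1&utm_source=mail"))
print(cache_key("https://example.com/img?a=1&b=2"))
```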

Pros

  • Lower latency globally
  • Origin shielding
  • DDoS absorption

Cons

  • Purges take time to propagate to every edge
  • Egress cost for non-cacheable traffic

Best for: Global products with static assets or cacheable APIs.

Database Scaling

Read replicas, sharding, and partitioning.

Vertical scaling buys time; read replicas absorb read load; sharding splits data across separate database nodes by key; partitioning splits a large table into smaller pieces within one database. Pick based on access pattern.
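Read replicas and sharding combine naturally in the routing layer: hash the key to choose a shard, then send writes to that shard's primary and reads to one of its replicas. This is a minimal sketch under those assumptions; `shard_for`, `route`, and the shape of the `shards` list are invented for the example.

```python
import hashlib
import random


def shard_for(key, shard_count):
    """Map a key to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(key, shards, is_write):
    """Return the node that should serve this key.

    shards is a list of {"primary": name, "replicas": [names]} dicts.
    """
    shard = shards[shard_for(key, len(shards))]
    if is_write or not shard["replicas"]:
        return shard["primary"]        # writes always go to the primary
    # Reads may observe replication lag; read-your-writes flows should
    # go to the primary instead.
    return random.choice(shard["replicas"])
```

Plain modulo hashing remaps most keys when the shard count changes, which is one reason resharding is expensive; consistent hashing or a lookup directory reduces that movement.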

Pros

  • Near-linear read scale with replicas
  • Sharding scales writes

Cons

  • Replication lag
  • Sharding adds cross-shard query complexity

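Best for: Systems whose read or write load has outgrown a single well-tuned primary. The ~10k QPS mark is only a rough rule of thumb; the real limit depends on query mix and hardware.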

Message Queues

Asynchronous, durable communication.

RabbitMQ for traditional queues, Kafka for high-throughput event streams, SQS for a managed queue on AWS. A queue decouples producers from consumers and absorbs traffic spikes.
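The decoupling and spike absorption can be shown in miniature with Python's standard `queue.Queue` standing in for a real broker. The bounded `maxsize` is what provides backpressure: when the queue is full, the producer learns about it instead of piling up work. The `submit` and `worker` helpers are hypothetical names for this sketch, and an in-memory queue is not durable the way a broker is.

```python
import queue
import threading


def submit(q, job, timeout=0.05):
    """Try to enqueue; return False so the caller can shed load or retry later."""
    try:
        q.put(job, timeout=timeout)
        return True
    except queue.Full:
        return False


def worker(q, handle, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = q.get()
        try:
            if job is None:
                return
            results.append(handle(job))
        finally:
            q.task_done()


if __name__ == "__main__":
    q = queue.Queue(maxsize=100)       # bounded: a full queue pushes back
    results = []
    t = threading.Thread(target=worker, args=(q, lambda j: j * 2, results))
    t.start()
    for n in range(5):
        submit(q, n)
    q.put(None)                        # tell the worker to stop
    t.join()
```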

Pros

  • Decoupling
  • Backpressure absorption
  • Retry semantics

Cons

  • Eventual consistency
  • Ordering guarantees vary

Best for: Background jobs, integration, event-driven flows.

API Gateway

Single entrypoint: auth, routing, rate limiting, observability.

Common choices are Kong, Spring Cloud Gateway, and AWS API Gateway. A gateway centralizes cross-cutting concerns so services stay focused on business logic.
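Rate limiting is one of those cross-cutting concerns, and a token bucket is a common way to implement it. The sketch below is an illustration of the technique, not the implementation used by any of the gateways named above; the `TokenBucket` class and its injectable `clock` parameter are assumptions made so the behavior can be tested.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity         # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                   # a gateway would answer HTTP 429 here
```

In a gateway, one bucket is typically kept per API key or client IP; with several gateway instances the counters need a shared store to enforce a global limit.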

Pros

  • Centralized auth and rate limiting
  • Unified observability

Cons

  • Single point of failure if not HA
  • Can become a deployment bottleneck

Best for: Microservice systems with external consumers.

Service Discovery

How services find each other.

Options include DNS, Consul, Eureka, and Kubernetes Services / Endpoints. Some form of discovery becomes necessary once instances come and go and IPs can no longer be hardcoded.
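The core idea behind registry-style discovery is small: instances send heartbeats, and lookups return only the instances heard from recently. The toy registry below illustrates that idea only; it is not the API of Consul, Eureka, or Kubernetes, and the `Registry` class, its methods, and the addresses are all hypothetical.

```python
import time


class Registry:
    """In-memory service registry with heartbeat expiry."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._last_seen = {}           # (service, address) -> last heartbeat time

    def heartbeat(self, service, address):
        """Register an instance or refresh its lease."""
        self._last_seen[(service, address)] = self.clock()

    def lookup(self, service):
        """Return addresses whose last heartbeat is within the TTL."""
        now = self.clock()
        return sorted(
            addr
            for (name, addr), seen in self._last_seen.items()
            if name == service and now - seen <= self.ttl
        )
```

An instance that crashes simply stops sending heartbeats and drops out of `lookup` results after the TTL, which is how health-aware routing falls out of the design.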

Pros

  • Dynamic scaling
  • Health-aware routing

Cons

  • Adds a control-plane dependency

Best for: Microservice and Kubernetes environments.

Pick the right tool

Problem                   | First Reach                | Second Reach
Latency too high (reads)  | Add cache (Redis/Caffeine) | Add CDN / read replicas
DB writes saturated       | Partition / batch          | Shard or use CQRS
Bursty traffic            | Add a queue                | Add autoscaling
Many cross-service calls  | API gateway + caching      | Re-architect boundaries
Global users              | CDN                        | Multi-region replicas

Common Mistakes

  • Caching everything: invalidation becomes harder than the original problem.
  • Treating the message queue as a database.
  • Skipping circuit breakers when calling external services.
  • Sharding before exhausting read replicas + partitioning.
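For the circuit breaker point, the pattern is: count consecutive failures, fail fast once a threshold is reached, and let a trial call through after a cool-down. The sketch below is one simple way to express that, with hypothetical names (`CircuitBreaker`, `CircuitOpenError`) and no thread safety; production code would normally use an established resilience library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None          # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let this one call through as a trial.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0              # any success closes the circuit
        self.opened_at = None
        return result
```

Failing fast keeps threads and connections from piling up behind a dependency that is already down, which is what turns one slow external service into a wider outage.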

Production Tips

  • Define SLOs first; design backward from latency and error budgets.
  • Add backpressure at every async boundary (max queue depth, max in-flight).
  • Cache the answer, not the request — cache by canonical key.
  • Use idempotency keys for any retried operation.
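The idempotency-key tip can be illustrated with a small example: the client generates a key once per logical operation and sends the same key on every retry, so the server can return the stored result instead of repeating the side effect. `PaymentService` and its fields are hypothetical, and the in-memory dict stands in for a durable store with an atomic check-and-set.

```python
import uuid


class PaymentService:
    """Hypothetical service; a dict stands in for a durable result store."""

    def __init__(self):
        self._results = {}             # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        # A retry with a known key returns the original result, no new charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1         # the side effect happens exactly once
        result = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    service = PaymentService()
    key = str(uuid.uuid4())            # generated once by the client
    first = service.charge(key, 500)
    retry = service.charge(key, 500)   # e.g. after a timeout
    assert first == retry and service.charges_made == 1
```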

Frequently Asked Questions

When do I need a CDN?

Whenever users are geographically distributed and you serve any cacheable content — which is almost every web product.

Read replica vs sharding?

Reach for replicas first: they scale reads and are easy to add. Shard only when writes saturate the primary.

Kafka vs RabbitMQ?

Kafka for high-throughput event streams with replay. RabbitMQ for traditional task queues with rich routing.