Redis Distributed Caching Architecture for High-Traffic APIs
Build a production-grade distributed cache with Redis and Spring Boot — cache-aside, @Cacheable, TTL, eviction, cache stampedes, warming, hit-rate monitoring and scalability for high-traffic APIs.
Introduction
Every high-traffic API hits the same wall: the database. A relational database is the source of truth, but it is the most expensive thing in your stack to scale. Before you shard PostgreSQL or buy a bigger RDS instance, you almost always reach for a distributed cache — and Redis is the default choice.
A well-placed Redis cache can drop database load by 90% and cut p99 latency from 200ms to 5ms. A badly placed one causes stampedes, stale data and outages. This tutorial is a complete production walkthrough of Redis caching with Spring Boot: cache-aside, @Cacheable, eviction, TTLs, warming, monitoring hit rates, and the failure modes that bite real systems.
Why distributed caching
A local in-process cache (Caffeine, Guava) is fast and free, but it doesn't scale across instances. Ten replicas of your service mean ten copies of the cache, ten cold starts after every deploy, and ten different versions of any updated value. A distributed cache solves all three problems at the cost of a network hop.
Redis specifically wins because:
- Sub-millisecond latency. P99 GETs land around 0.5ms over a LAN.
- Rich data structures. Strings, hashes, sets, sorted sets, streams — far beyond a key-value store.
- Battle-tested. Used in production at Twitter, GitHub and Stack Overflow.
- Cluster mode. Horizontally scalable when one node isn't enough.
Figure: Architecture without a cache. Every read hits the database.
Real-world use cases
- Product catalog reads: every page view hits the cache; the DB sees only writes and misses.
- User session stores: horizontal scaling without sticky sessions.
- Rate limiters: `INCR` plus a TTL implements a fixed-window counter in two commands (a true sliding window needs more, for example a sorted set of timestamps).
- Pre-computed homepage feeds: refresh in the background, serve from cache.
- API response caching: cache by request fingerprint at the gateway.
- Distributed locks: `SET key value NX PX 30000` for one-leader-at-a-time jobs.
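To make the rate-limiter use case concrete, here is a minimal in-memory sketch of the counter that `INCR` plus a TTL gives you. It is an illustration only: the class name `FixedWindowLimiterSketch`, the `rate:` key prefix and the map standing in for Redis are assumptions, and unlike Redis the map never expires old windows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Fixed-window rate limiter; a ConcurrentHashMap stands in for Redis counters. */
final class FixedWindowLimiterSketch {
    private final int limit;
    private final long windowMillis;
    // One counter per client and window. In Redis the TTL would delete old windows;
    // this sketch keeps them forever, which is fine for a demo but not for production.
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    FixedWindowLimiterSketch(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true while the client is within its budget for the current window. */
    boolean tryAcquire(String clientId, long nowMillis) {
        long window = nowMillis / windowMillis;             // index of the current window
        String key = "rate:" + clientId + ":" + window;     // hypothetical key layout
        int count = counters
                .computeIfAbsent(key, k -> new AtomicInteger())
                .incrementAndGet();                         // the in-memory analogue of INCR
        return count <= limit;
    }
}
```

Because the window index is part of the key, a client can briefly send up to twice the limit across a window boundary. That burst is the usual trade-off of a fixed window compared with a sliding one.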
The architecture
Figure: Architecture with a Redis cache. Read-through with TTL.
The application checks Redis first. Hits return immediately; misses fall through to PostgreSQL, populate Redis with a TTL, then return. Subsequent reads are served from the cache until the TTL expires or the key is invalidated.
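The read path described above can be sketched without Redis at all. The class below is a hypothetical in-memory stand-in (`TtlCacheSketch` is not part of Spring or Redis) that shows the three outcomes: a hit, a miss that populates the cache with a TTL, and an explicit invalidation.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** In-memory stand-in for Redis, used only to illustrate the read path. */
final class TtlCacheSketch<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final LongSupplier clock; // injectable so expiry can be exercised in tests

    TtlCacheSketch(Duration ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Hit: return the cached value. Miss or expired: load, store with a TTL, return. */
    V get(K key, Function<K, V> loadFromDatabase) {
        long now = clock.getAsLong();
        Entry<V> cached = store.get(key);
        if (cached != null && cached.expiresAtMillis() > now) {
            return cached.value();
        }
        V fresh = loadFromDatabase.apply(key);
        store.put(key, new Entry<>(fresh, now + ttl.toMillis()));
        return fresh;
    }

    /** Explicit invalidation, the in-memory equivalent of DEL. */
    void invalidate(K key) {
        store.remove(key);
    }
}
```

A caller would construct it with something like `new TtlCacheSketch<>(Duration.ofMinutes(10), System::currentTimeMillis)` and pass the database lookup as the loader. Note that this sketch does nothing to stop concurrent misses from loading the same key twice.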
Step 1 — Set up Redis with Spring Boot
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-cache</artifactId>
</dependency>
```yaml
spring:
  data:
    redis:
      host: redis
      port: 6379
      timeout: 200ms
      lettuce:
        pool: # pooling only takes effect when commons-pool2 is on the classpath
          max-active: 16
          max-idle: 8
  cache:
    type: redis
    redis:
      time-to-live: 600000 # 10 minutes default
      cache-null-values: false
```
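```java
import java.time.Duration;

import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory cf) {
        // Defaults shared by every cache region: 10-minute TTL, JSON values, no cached nulls
        var defaults = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(10))
                .serializeValuesWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new GenericJackson2JsonRedisSerializer()))
                .disableCachingNullValues();

        // Per-region overrides: long TTL for products, short TTL for hot searches
        return RedisCacheManager.builder(cf)
                .cacheDefaults(defaults)
                .withCacheConfiguration("products", defaults.entryTtl(Duration.ofMinutes(30)))
                .withCacheConfiguration("hot-search", defaults.entryTtl(Duration.ofSeconds(30)))
                .build();
    }
}
```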
Different caches with different TTLs — that is the whole point of named cache regions.
Step 2 — `@Cacheable` and friends
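```java
import java.util.UUID;

import lombok.RequiredArgsConstructor;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class ProductService {

    private final ProductRepository repo;

    // Served from the "products" cache on a hit; the body runs only on a miss
    @Cacheable(value = "products", key = "#id")
    public ProductDto getById(UUID id) {
        return repo.findById(id)
                .map(ProductDto::from)
                .orElseThrow(() -> new NotFoundException("product " + id));
    }

    // Always runs, then overwrites the cached entry with the returned value
    @CachePut(value = "products", key = "#dto.id")
    public ProductDto update(ProductDto dto) {
        var saved = repo.save(Product.from(dto));
        return ProductDto.from(saved);
    }

    // Removes the cached entry once the row is deleted
    @CacheEvict(value = "products", key = "#id")
    public void delete(UUID id) {
        repo.deleteById(id);
    }
}
```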
- `@Cacheable`: read-through. Check the cache, fall back to the method, store the result.
- `@CachePut`: always run the method and store the result. Use on writes.
- `@CacheEvict`: drop the key. Use on deletes.
Step 3 — Cache-aside (manual control)
@Cacheable is great until you need fine-grained control. For that, use the StringRedisTemplate directly.
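```java
import java.time.Duration;
import java.util.UUID;

import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class FeedService {

    private final StringRedisTemplate redis;
    private final FeedRepository repo;
    private final ObjectMapper json;

    public FeedDto feedFor(UUID userId) throws Exception {
        String key = "feed:" + userId;

        // 1. Try the cache first
        String cached = redis.opsForValue().get(key);
        if (cached != null) return json.readValue(cached, FeedDto.class);

        // 2. Miss: compute, store with a 5-minute TTL, return
        //    (computeFeed is the expensive query this cache protects; not shown here)
        FeedDto fresh = computeFeed(userId);
        redis.opsForValue().set(key, json.writeValueAsString(fresh), Duration.ofMinutes(5));
        return fresh;
    }
}
```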
Figure: Cache-aside pattern.
Step 4 — Defeating the cache stampede
When a hot key expires, every concurrent request misses at once and slams the database. This is the thundering herd. The fix is to add a short-lived lock around the recompute:
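```java
// read(...) and write(...) are assumed to be small ObjectMapper wrappers
// that convert JSON exceptions into unchecked ones.
public FeedDto feedFor(UUID userId) {
    String key = "feed:" + userId;
    String cached = redis.opsForValue().get(key);
    if (cached != null) return read(cached);

    // SET lock NX with a 5s expiry: only one caller wins the right to recompute
    String lock = "lock:" + key;
    Boolean acquired = redis.opsForValue()
            .setIfAbsent(lock, "1", Duration.ofSeconds(5));

    if (Boolean.TRUE.equals(acquired)) {
        try {
            FeedDto fresh = computeFeed(userId);
            redis.opsForValue().set(key, write(fresh), Duration.ofMinutes(5));
            return fresh;
        } finally {
            // Caveat: if computeFeed outlives the 5s lock TTL, this may delete another caller's lock
            redis.delete(lock);
        }
    }

    // Lost the race: wait briefly, then read the freshly populated cache
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt flag and fall through
    }
    String filled = redis.opsForValue().get(key);
    return filled != null ? read(filled) : computeFeed(userId);
}
```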
For very hot keys, combine this with probabilistic early expiration — refresh in the background when within 10% of TTL — so the user never waits.
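One way to express the "within 10% of TTL" rule is a refresh probability that ramps up as expiry approaches, so that callers do not all refresh at the same instant. The sketch below uses a simple linear ramp, which is an assumption made for illustration and only one of several possible formulas; the class and method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Decides whether a reader should trigger a background refresh before the key expires. */
final class EarlyRefreshSketch {
    /** Fraction of the TTL, counted back from expiry, in which early refresh may trigger. */
    static final double REFRESH_WINDOW = 0.10;

    /** 0 outside the window, rising linearly to 1 as the remaining TTL approaches zero. */
    static double refreshProbability(long remainingTtlMillis, long fullTtlMillis) {
        if (fullTtlMillis <= 0 || remainingTtlMillis <= 0) {
            return 1.0; // already expired, or TTL unknown: refresh
        }
        double window = fullTtlMillis * REFRESH_WINDOW;
        if (remainingTtlMillis >= window) {
            return 0.0; // still fresh enough
        }
        return 1.0 - (remainingTtlMillis / window);
    }

    /** Rolls the dice once per read; most readers return false and just serve the cached value. */
    static boolean shouldRefreshEarly(long remainingTtlMillis, long fullTtlMillis) {
        double p = refreshProbability(remainingTtlMillis, fullTtlMillis);
        return ThreadLocalRandom.current().nextDouble() < p;
    }
}
```

In the `feedFor` method, the remaining TTL could come from `redis.getExpire(key, TimeUnit.MILLISECONDS)`, with the refresh itself submitted to an executor so the caller still returns the cached value immediately.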
Step 5 — Cache warming
For predictable cold starts (deploys, restarts, scaling events), pre-load the top-N hot keys.
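```java
import lombok.RequiredArgsConstructor;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor
public class CacheWarmer {

    private final ProductService products;
    private final ProductRepository repo;

    // Runs once the application is ready to serve traffic
    @EventListener(ApplicationReadyEvent.class)
    public void warm() {
        // Going through ProductService (not the repository) populates the "products" cache
        repo.findTop100ByOrderByViewsDesc()
                .forEach(p -> products.getById(p.getId()));
    }
}
```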
This costs one big DB query at boot in exchange for not punishing the database when a fresh instance comes online.
Step 6 — Eviction and TTL strategy
- Always set a TTL. A key without a TTL is a memory leak waiting to happen.
- Tune per region. Static reference data: hours. Product details: minutes. Live feeds: seconds.
- Pick a maxmemory policy. `allkeys-lru` for general-purpose caches; `volatile-lru` if you mix caching and durable data in one Redis.
- Use `SCAN`, not `KEYS`, for bulk inspection. `KEYS` blocks the entire server.
```
maxmemory 4gb
maxmemory-policy allkeys-lru
```
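To build intuition for what an LRU policy does when the cache is full, here is a tiny exact-LRU map in plain Java. It is only an analogy: Redis bounds memory in bytes rather than entry count and approximates LRU by sampling keys, and `LruSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A bounded map that evicts the least recently used entry once it is over capacity. */
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the "recent" end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the least recently used entry
        return size() > maxEntries;
    }
}
```

With a capacity of two, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was touched more recently. This is the behaviour that keeps hot keys resident under `allkeys-lru`.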
Step 7 — Monitoring hit rates
The single most important cache metric is hit ratio. Anything under 80% on a hot endpoint means the cache isn't earning its keep.
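```java
@Bean
public RedisCacheManagerBuilderCustomizer cacheStatistics() {
    // Redis caches only count hits and misses once statistics are enabled.
    // This customizer applies to Spring Boot's auto-configured manager; with a
    // hand-built RedisCacheManager (as in Step 1), call enableStatistics() on that builder.
    return builder -> builder.enableStatistics();
}
```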
Spring Boot's actuator exposes cache.gets with a result tag of hit or miss. In Prometheus:
```promql
sum(rate(cache_gets_total{result="hit"}[5m]))
/
sum(rate(cache_gets_total[5m]))
```
Alert when hit ratio drops below your SLO (e.g. 90% for product reads). A sudden drop usually means TTLs are too short, the working set outgrew Redis memory, or invalidation is too aggressive.
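The arithmetic behind that alert is simply hits divided by total lookups. A minimal sketch with hypothetical names, for places where you count hits and misses yourself (for example around a manual cache-aside path), might look like this:

```java
import java.util.concurrent.atomic.LongAdder;

/** Tracks cache hits and misses and compares the ratio against an SLO. */
final class HitRatioSketch {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() { hits.increment(); }

    void recordMiss() { misses.increment(); }

    /** hits / (hits + misses); reports 1.0 before any traffic so an idle cache does not alert. */
    double ratio() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 1.0 : (double) h / total;
    }

    /** True when the observed ratio has fallen below the target, e.g. 0.90 for product reads. */
    boolean belowSlo(double slo) {
        return ratio() < slo;
    }
}
```

In practice the Micrometer counters already carry this information. The sketch is mainly useful to show why the ratio must be computed over the same time window for both numerator and denominator.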
Beyond hit ratio, watch:
- `redis_used_memory_bytes` vs `redis_maxmemory_bytes`
- `redis_commands_processed_total`: overall throughput
- `redis_evicted_keys_total`: rising means memory pressure
- `redis_keyspace_misses_total`
- p99 latency from the app side (a slow Redis hides behind fast averages)
Performance optimization
- Use Lettuce in non-blocking mode with reactive Spring or async clients.
- Pipeline batched operations. One round trip for 100 GETs instead of 100.
- Compress big values. GZIP payloads >1 KB; the CPU cost is less than the network cost.
- Use hashes for grouped fields. `HSET user:1 name "Ada" age 36` is more memory-efficient than two strings.
- Co-locate Redis with the app. Same AZ, same VPC. Cross-region cache reads defeat the point.
- Connection pool, don't open per request. Reuse Lettuce connections.
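The compression tip can be sketched with the JDK's built-in GZIP streams. The one-byte marker in front of the payload is an assumption of this example (not a Redis or Spring convention). It lets the reader tell compressed from raw values, so small payloads skip compression entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Encodes values as [marker byte][payload], compressing only payloads above 1 KB. */
final class CompressingCodecSketch {
    static final int THRESHOLD_BYTES = 1024;
    private static final byte RAW = 0;
    private static final byte GZIP = 1;

    static byte[] encode(String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length <= THRESHOLD_BYTES) {
            return prefix(RAW, raw); // small values: compression would not pay off
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prefix(GZIP, out.toByteArray()); // stream is closed, so the trailer is written
    }

    static String decode(byte[] stored) {
        byte[] payload = Arrays.copyOfRange(stored, 1, stored.length);
        if (stored[0] == RAW) {
            return new String(payload, StandardCharsets.UTF_8);
        }
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] prefix(byte marker, byte[] payload) {
        byte[] result = new byte[payload.length + 1];
        result[0] = marker;
        System.arraycopy(payload, 0, result, 1, payload.length);
        return result;
    }
}
```

Storing the resulting bytes requires a byte-oriented template (for instance a `RedisTemplate<String, byte[]>`) rather than `StringRedisTemplate`. Measure the real payloads before adopting it, since already-compact values gain little.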
Scalability considerations
- Vertical first. A single Redis on a modest VM does ~100k ops/sec. Most apps never need more.
- Read replicas for read-heavy workloads.
- Redis Cluster when you outgrow one node. Hash tags (`{user:123}`) keep related keys on the same shard.
- Active-passive replication for HA; Sentinel for automatic failover.
- Per-tenant prefixes in multi-tenant systems (`tenant:42:product:abc`) so you can scope invalidations and migrate tenants between clusters.
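Hash tags work because Redis Cluster assigns each key to one of 16384 slots using a CRC16 of the key, and when the key contains a non-empty `{...}` section only that section is hashed. The sketch below reimplements that rule for illustration. The class name is hypothetical, and a real client such as Lettuce already does this for you.

```java
import java.nio.charset.StandardCharsets;

/** Computes the Redis Cluster hash slot for a key, honouring {hash tags}. */
final class HashSlotSketch {
    static final int SLOTS = 16384;

    /** CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no reflection. */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** If the key has a non-empty {tag}, only the tag is hashed; otherwise the whole key. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOTS;
    }
}
```

With this rule, `{user:123}:profile` and `{user:123}:orders` land on the same slot, which is what makes multi-key operations on them possible in Cluster mode.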
Security considerations
- Require AUTH. A Redis without a password and exposed to the network is a famous breach vector.
- TLS in transit. Especially across VPC peerings or cloud boundaries.
- Firewall it. Redis should never be on a public IP. Use a private subnet + security group.
- Disable dangerous commands (`FLUSHALL`, `CONFIG`, `DEBUG`) in production via `rename-command`.
- Don't cache secrets. Tokens and credentials belong in a vault, not in Redis with a TTL.
- Encrypt PII before caching. Redis dumps are easy to leak.
Common mistakes
1. No TTL. Cache fills, evictions thrash, hit rate collapses.
2. Caching the wrong granularity. Caching individual rows when the page joins five tables — cache the joined view instead.
3. @Cacheable on a method that mutates state. It will skip the body on a hit and silently break.
4. Cache stampede on hot keys. Always lock or use probabilistic early refresh on the hot 1%.
5. Inconsistent invalidation. Update via one path, evict via another. Centralize the cache key formula.
6. Caching nulls without thinking. Either store a sentinel with a short TTL or set cache-null-values: false.
7. Caching huge values. A 5MB blob crowds out thousands of small entries; consider object storage instead.
8. Treating Redis as a database. It is a cache. Persistence helps, but the source of truth must be elsewhere.
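Mistake 5 is easier to avoid when every cache key is built in exactly one place. A minimal sketch follows, reusing the key shapes that appear in this tutorial (`feed:<userId>`, `lock:<key>`, `tenant:<id>:product:<id>`). The class and method names are hypothetical.

```java
/** Single source of truth for cache key formats, shared by read, write and evict paths. */
final class CacheKeys {
    private CacheKeys() {}

    /** Tenant-scoped product key, e.g. tenant:42:product:abc */
    static String product(String tenantId, String productId) {
        return "tenant:" + tenantId + ":product:" + productId;
    }

    /** Per-user feed key, e.g. feed:u1 */
    static String feed(String userId) {
        return "feed:" + userId;
    }

    /** Lock key guarding the recompute of another key, e.g. lock:feed:u1 */
    static String lockFor(String key) {
        return "lock:" + key;
    }
}
```

When the read path and the invalidation path both call `CacheKeys.feed(userId)`, a change to the key format cannot leave one of them pointing at the old name.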
Troubleshooting guide
- Low hit rate. TTLs too short, working set > memory, key cardinality too high, or wrong granularity. Compare `redis_keyspace_hits_total` vs `misses_total`.
- High latency. Slow commands (`SLOWLOG GET 100`), big values, network saturation, or a single hot key serializing on one shard.
- OOM / mass evictions. Memory ceiling too low, no TTL on big keys, or LFU policy with the wrong access pattern.
- Stale reads after writes. Eviction isn't firing on the write path; verify `@CacheEvict` or the manual `DEL`.
- Connection storms after a deploy. Add a warm-up grace period; ramp HPA gradually.
- "MOVED" errors in Cluster mode. The client isn't cluster-aware. Use Lettuce with cluster support enabled.
- Latency spike at midnight. Probably AOF rewrite or RDB snapshot; tune `save` and `auto-aof-rewrite-percentage`.
FAQ
1. Redis vs Memcached? Pick Redis. Richer data types, persistence, replication, pub/sub. Memcached is faster for the absolute simplest key-value workload but rarely worth the trade.
2. Should I cache in the application or at the gateway? Both. Gateway caching wins for anonymous public reads; in-app caching wins for personalized data.
3. How long should TTLs be? As long as the data is acceptably stale. Start with minutes for user data, hours for reference data, then tune from hit rate.
4. What about cache invalidation events? Combine TTLs with explicit eviction on writes. Pure event-driven invalidation is a known hard problem; TTLs make it forgiving.
5. Can I use Redis as my primary database? For some workloads (sessions, rate limits, real-time scores) — yes. For OLTP — no. Use Postgres for source-of-truth state.
6. Is @Cacheable enough? For 80% of use cases, yes. Drop to StringRedisTemplate when you need control over serialization, eviction logic or stampede protection.
7. How do I cache by request, not by entity? Use Spring Cloud Gateway or a CDN with a fingerprint of method+path+query+auth as the key.
8. What about caching writes? Use write-through (@CachePut) only when reads immediately follow writes. Otherwise evict and let the next read repopulate.
9. How does Redis interact with CQRS? Beautifully — Redis is the perfect read store for a CQRS projection. See our CQRS tutorial.
10. When is the cache the bottleneck? When Redis CPU is pegged, you have a hot key on one shard, or your serializer is slow. Profile first; sharding by hash tag usually fixes it.
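For question 7, a request fingerprint can be as simple as a hash over the canonicalised request. The sketch below is one possible shape, not a gateway API. The `resp:` prefix, the class name and the `authScope` parameter (a user ID or role, never the raw token) are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Builds a cache key from method + path + query + auth scope. */
final class RequestFingerprintSketch {
    static String cacheKey(String method, String path, Map<String, String> query, String authScope) {
        // Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry
        String canonicalQuery = new TreeMap<>(query).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        String canonical = method.toUpperCase(Locale.ROOT) + "\n"
                + path + "\n"
                + canonicalQuery + "\n"
                + authScope;
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8));
            return "resp:" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }
}
```

Including the auth scope keeps one user's personalised response from being served to another. For anonymous public reads a constant such as `"anon"` lets everyone share the entry.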
Key takeaways
- Redis caching can cut database load by an order of magnitude when applied to the right hot paths.
- Always set TTLs, monitor hit rate, and design eviction explicitly — never by accident.
- Defeat cache stampedes with locks or probabilistic early refresh.
- Warm hot keys at startup, compress big values, and keep Redis in the same AZ as the app.
- Treat Redis as a fast, ephemeral layer in front of the source of truth — not as the database itself.
Common mistakes
1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but Java & Spring Boot patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging Java & Spring Boot systems without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.
Production best practices
Apply these before promoting the Redis caching setup from this tutorial to a real production environment.
Scalability
Design Java & Spring Boot services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.
Monitoring & Observability
Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.
Logging
Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.
Security
Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.
Testing
Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For Java & Spring Boot systems, also run chaos and load tests before a major release.
Reliability & Rollouts
Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.
Frequently asked questions
Is this tutorial up to date?
Yes. This tutorial was last reviewed and updated on June 3, 2026. We revisit popular Java & Spring Boot tutorials regularly to keep them aligned with current best practices.
What level is this tutorial aimed at?
It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.
Do I need to follow every step in order?
The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.
Where can I find the source code?
The full source code is available on GitHub: https://github.com/masterlabsystems/redis-caching-spring-boot. Fork it, run it locally, and adapt it to your own project.
Related tutorials
API Rate Limiting in Spring Boot with Bucket4j and Redis
Protect your APIs from abuse with per-user and per-IP rate limiting using Bucket4j, Redis and a clean filter-based implementation.
Building REST APIs with Spring Boot: A Complete Guide
Design and build a production-ready REST API with Spring Boot — proper layering, DTOs, validation, error handling and testing.
Spring Boot + Kafka — Build a Real-Time Messaging System
Produce and consume Kafka messages from Spring Boot with proper serialization, error handling and consumer groups.
