Java & Spring Boot⏱ 20 min read·By Liyabona Saki·Last updated Jun 3, 2026

Redis Distributed Caching Architecture for High-Traffic APIs

Build a production-grade distributed cache with Redis and Spring Boot — cache-aside, @Cacheable, TTL, eviction, cache stampedes, warming, hit-rate monitoring and scalability for high-traffic APIs.

Introduction

Every high-traffic API hits the same wall: the database. A relational database is the source of truth, but it is the most expensive thing in your stack to scale. Before you shard PostgreSQL or buy a bigger RDS instance, you almost always reach for a distributed cache — and Redis is the default choice.

A well-placed Redis cache can drop database load by 90% and cut p99 latency from 200ms to 5ms. A badly placed one causes stampedes, stale data and outages. This tutorial is a complete production walkthrough of Redis caching with Spring Boot: cache-aside, @Cacheable, eviction, TTLs, warming, monitoring hit rates, and the failure modes that bite real systems.

Why distributed caching

A local in-process cache (Caffeine, Guava) is fast and free, but it doesn't scale across instances. Ten replicas of your service mean ten copies of the cache, ten cold starts after every deploy, and ten different versions of any updated value. A distributed cache solves all three problems at the cost of a network hop.

Redis specifically wins because:

Sub-millisecond latency. P99 GETs land around 0.5ms over a LAN.
Rich data structures. Strings, hashes, sets, sorted sets, streams — far beyond a key-value store.
Battle-tested. Used in production at Twitter, GitHub and Stack Overflow, among many others.
Cluster mode. Horizontally scalable when one node isn't enough.

Architecture

Without Cache — Every Read Hits the Database

Each request triggers a SQL query. At high RPS this saturates the database, increases tail latency and limits horizontal scalability.

Real-world use cases

Product catalog reads — every page view hits the cache, the DB sees only writes and misses.
User session stores — sticky-session-free horizontal scaling.
Rate limiters — INCR + EXPIRE implements a fixed-window counter in two commands (a true sliding window needs a sorted set or similar).
Pre-computed homepage feeds — refresh in the background, serve from cache.
API response caching — cache by request fingerprint at the gateway.
Distributed locks — SET key value NX PX 30000 for one-leader-at-a-time jobs.
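
As an illustration of the rate-limiter use case, here is a minimal sketch of a counter that allows a fixed number of requests per time window. The `CounterStore` interface, the in-memory stand-in and the `rate:<client>:<window>` key layout are assumptions made for this example so it runs without a Redis server; against real Redis, `incr` and `expire` would map to the INCR and EXPIRE commands.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal view of the two Redis commands the limiter needs. */
interface CounterStore {
  long incr(String key);                  // INCR key
  void expire(String key, Duration ttl);  // EXPIRE key <seconds>
}

/** In-memory stand-in so the sketch runs without Redis; it ignores TTLs. */
final class InMemoryCounterStore implements CounterStore {
  private final Map<String, Long> counters = new HashMap<>();

  @Override
  public long incr(String key) {
    return counters.merge(key, 1L, Long::sum);
  }

  @Override
  public void expire(String key, Duration ttl) {
    // No-op here; Redis would delete the key once the TTL elapses.
  }
}

final class FixedWindowRateLimiter {
  private final CounterStore store;
  private final int limit;
  private final Duration window;

  FixedWindowRateLimiter(CounterStore store, int limit, Duration window) {
    this.store = store;
    this.limit = limit;
    this.window = window;
  }

  /** Returns true while the client is within its quota for the window containing nowMillis. */
  boolean tryAcquire(String clientId, long nowMillis) {
    long windowIndex = nowMillis / window.toMillis();
    String key = "rate:" + clientId + ":" + windowIndex;
    long count = store.incr(key);
    if (count == 1) {
      // First request in this window: set a TTL so the counter cleans itself up.
      store.expire(key, window);
    }
    return count <= limit;
  }
}
```

Because the window index is part of the key, each window starts from zero. A burst that straddles a window boundary can briefly exceed the limit, which is the usual trade-off of fixed windows.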

The architecture

Architecture

With Redis Cache — Read-Through with TTL

The application looks up Redis first. On a hit it returns in well under a millisecond; on a miss it loads from PostgreSQL and populates Redis with a TTL. Database load drops dramatically.

The application checks Redis first. Hits return immediately; misses fall through to PostgreSQL, populate Redis with a TTL, then return. Subsequent reads are served from the cache until the TTL expires or the key is invalidated.
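
The read flow described above can be sketched in plain Java before any Spring wiring is involved. This is an illustration only: the `ConcurrentHashMap` stands in for Redis, the `loader` function stands in for the PostgreSQL query, and the class name `TtlCacheAside` and its injectable clock are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class TtlCacheAside<K, V> {
  private record Entry<T>(T value, long expiresAtMillis) {}

  private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>(); // stand-in for Redis
  private final Function<K, V> loader;                              // stand-in for the database read
  private final long ttlMillis;
  private final LongSupplier clock;                                 // injectable so expiry is testable

  TtlCacheAside(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  V get(K key) {
    long now = clock.getAsLong();
    Entry<V> hit = cache.get(key);
    if (hit != null && now < hit.expiresAtMillis()) {
      return hit.value();                    // hit: served from the cache
    }
    V fresh = loader.apply(key);             // miss or expired: fall through to the source of truth
    cache.put(key, new Entry<>(fresh, now + ttlMillis));
    return fresh;
  }

  /** Explicit invalidation, e.g. after a write to the underlying row. */
  void invalidate(K key) {
    cache.remove(key);
  }
}
```

With a real Redis the expiry check disappears, because Redis drops the key itself once the TTL elapses; the rest of the flow stays the same.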

Step 1 — Set up Redis with Spring Boot

```xml
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
```

```yaml
spring:
  data:
    redis:
      host: redis
      port: 6379
      timeout: 200ms
      lettuce:
        pool:
          max-active: 16
          max-idle: 8
  cache:
    type: redis
    redis:
      time-to-live: 600000   # 10 minutes default
      cache-null-values: false
```

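```java
import java.time.Duration;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;

@Configuration
@EnableCaching
public class CacheConfig {
  // Declaring this bean replaces Spring Boot's auto-configured cache manager,
  // so the spring.cache.redis.* properties above no longer apply to it.
  @Bean
  public RedisCacheManager cacheManager(RedisConnectionFactory cf) {
    // Defaults shared by every cache region: 10-minute TTL, JSON values, no nulls.
    var defaults = RedisCacheConfiguration.defaultCacheConfig()
        .entryTtl(Duration.ofMinutes(10))
        .serializeValuesWith(RedisSerializationContext.SerializationPair
            .fromSerializer(new GenericJackson2JsonRedisSerializer()))
        .disableCachingNullValues();

    // RedisCacheConfiguration is immutable: entryTtl(...) returns a modified copy.
    return RedisCacheManager.builder(cf)
        .cacheDefaults(defaults)
        .withCacheConfiguration("products", defaults.entryTtl(Duration.ofMinutes(30)))
        .withCacheConfiguration("hot-search", defaults.entryTtl(Duration.ofSeconds(30)))
        .build();
  }
}
```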

Different caches with different TTLs — that is the whole point of named cache regions.

Step 2 — `@Cacheable` and friends

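```java
import java.util.UUID;
import lombok.RequiredArgsConstructor;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class ProductService {

  private final ProductRepository repo;

  // Runs the body only on a cache miss, then stores the returned DTO.
  @Cacheable(value = "products", key = "#id")
  public ProductDto getById(UUID id) {
    return repo.findById(id)
        .map(ProductDto::from)
        .orElseThrow(() -> new NotFoundException("product " + id));
  }

  // Always runs the body and overwrites the cached entry with the result.
  @CachePut(value = "products", key = "#dto.id")
  public ProductDto update(ProductDto dto) {
    var saved = repo.save(Product.from(dto));
    return ProductDto.from(saved);
  }

  // Removes the cached entry once the method returns successfully.
  @CacheEvict(value = "products", key = "#id")
  public void delete(UUID id) {
    repo.deleteById(id);
  }
}
```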

@Cacheable — read-through. Check cache, fall back to method, store result.
@CachePut — always run the method, store the result. Use on writes.
@CacheEvict — drop the key. Use on deletes.

Step 3 — Cache-aside (manual control)

@Cacheable is great until you need fine-grained control. For that, use the StringRedisTemplate directly.

```java
@Service
@RequiredArgsConstructor
public class FeedService {
  private final StringRedisTemplate redis;
  private final FeedRepository repo;
  private final ObjectMapper json;

  public FeedDto feedFor(UUID userId) throws Exception {
    String key = "feed:" + userId;

    // 1. Try the cache first.
    String cached = redis.opsForValue().get(key);
    if (cached != null) return json.readValue(cached, FeedDto.class);

    // 2. Miss: build the feed from the database (computeFeed is elided here).
    FeedDto fresh = computeFeed(userId);

    // 3. Store it as JSON with a 5-minute TTL, then return it.
    redis.opsForValue().set(key, json.writeValueAsString(fresh), Duration.ofMinutes(5));
    return fresh;
  }
}
```

Architecture

Cache-Aside Pattern

The application owns the cache. Reads check Redis then fall back to the database; writes update the database and invalidate the cache key so the next read repopulates it.

Step 4 — Defeating the cache stampede

When a hot key expires, every concurrent request misses at once and slams the database. This is the thundering herd. The fix is to add a short-lived lock around the recompute:

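```java
// read(...) and write(...) are assumed to be JSON (de)serialization helpers, as in Step 3.
public FeedDto feedFor(UUID userId) {
  String key = "feed:" + userId;
  String cached = redis.opsForValue().get(key);
  if (cached != null) return read(cached);

  // SET lock:feed:<id> 1 NX with a 5s expiry: only one caller wins the right to recompute.
  String lock = "lock:" + key;
  Boolean acquired = redis.opsForValue()
      .setIfAbsent(lock, "1", Duration.ofSeconds(5));

  if (Boolean.TRUE.equals(acquired)) {
    try {
      FeedDto fresh = computeFeed(userId);
      redis.opsForValue().set(key, write(fresh), Duration.ofMinutes(5));
      return fresh;
    } finally {
      // Unconditional delete: safe only while computeFeed finishes within the 5s lock TTL.
      redis.delete(lock);
    }
  }

  // Lost the race: wait briefly, then read the freshly populated cache.
  try {
    Thread.sleep(50);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // preserve the interrupt and fall through
  }
  String filled = redis.opsForValue().get(key);
  // Last resort if the winner has not finished yet: compute without caching.
  return filled != null ? read(filled) : computeFeed(userId);
}
```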
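
One caveat with the lock above: the `finally` block deletes the lock unconditionally. If the recompute outlives the 5-second lock TTL, the lock may already belong to another caller by the time it is deleted. A common remedy is to store a unique token in the lock and delete it only if the token still matches. The sketch below shows that rule with an in-memory map standing in for Redis; the `TokenLock` class and its method names are assumptions made for illustration. Against real Redis the compare-and-delete has to be atomic, which is typically done with a short Lua script.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class TokenLock {
  // Stand-in for Redis: lock key -> token of the current holder.
  private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

  /** Like SET lockKey token NX: returns the holder's token, or null if the lock is taken. */
  String tryAcquire(String lockKey) {
    String token = UUID.randomUUID().toString();
    return store.putIfAbsent(lockKey, token) == null ? token : null;
  }

  /** Deletes the lock only if it still carries our token; returns whether it was released. */
  boolean release(String lockKey, String token) {
    return store.remove(lockKey, token);
  }

  /** Simulates the lock's TTL elapsing, for demonstration purposes. */
  void expire(String lockKey) {
    store.remove(lockKey);
  }
}
```

A holder whose lock has expired gets `false` back from `release` and leaves the new holder's lock untouched.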

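For very hot keys, combine this with probabilistic early expiration: as a key approaches its TTL, each reader has a small but growing chance of refreshing it ahead of time (ideally in the background), so the value is usually renewed before it expires and the user never waits.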
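
One common formulation of probabilistic early expiration (often called XFetch) refreshes when `now - delta * beta * ln(rand) >= expiry`, where `delta` is how long the last recompute took and `rand` is a uniform sample in (0, 1]. The sketch below implements only that decision; the class and parameter names are assumptions for this example, not part of any library.

```java
final class EarlyExpiration {
  private EarlyExpiration() {}

  /**
   * Decides whether this reader should refresh the entry ahead of its expiry.
   *
   * @param nowMillis       current time
   * @param expiryMillis    when the cached entry expires
   * @param recomputeMillis how long the last recompute took (delta)
   * @param beta            values above 1 refresh earlier, below 1 later; 1.0 is a common default
   * @param random          uniform sample in (0, 1]
   */
  static boolean shouldRefresh(long nowMillis, long expiryMillis,
                               long recomputeMillis, double beta, double random) {
    // log(random) <= 0, so earlyBy >= 0; rare small samples pull the refresh forward.
    double earlyBy = -recomputeMillis * beta * Math.log(random);
    return nowMillis + earlyBy >= expiryMillis;
  }
}
```

In application code, `1.0 - ThreadLocalRandom.current().nextDouble()` yields a sample in (0, 1]. Expensive values (large `delta`) start refreshing earlier, and once the entry has actually expired the check is always true.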

Step 5 — Cache warming

For predictable cold starts (deploys, restarts, scaling events), pre-load the top-N hot keys.

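```java
@Component
@RequiredArgsConstructor
public class CacheWarmer {
  private final ProductService products;
  private final ProductRepository repo;

  // Runs once the application is ready to serve traffic.
  @EventListener(ApplicationReadyEvent.class)
  public void warm() {
    // Calling through the ProductService bean goes via the caching proxy,
    // so each lookup populates the "products" cache on a miss.
    repo.findTop100ByOrderByViewsDesc()
        .forEach(p -> products.getById(p.getId()));
  }
}
```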

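This costs one top-N query at boot, plus one lookup for each key that is not already cached, in exchange for not punishing the database when a fresh instance comes online. Because Redis is shared, instances that start later find most keys already cached and skip that work.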

Step 6 — Eviction and TTL strategy

Always set a TTL. A key without a TTL is a memory leak waiting to happen.
Tune per region. Static reference data: hours. Product details: minutes. Live feeds: seconds.
Pick a maxmemory policy. allkeys-lru for general-purpose caches; volatile-lru if you mix caching and durable data in one Redis.
Use SCAN, not KEYS, for bulk inspection. KEYS blocks the entire server.

```text
maxmemory 4gb
maxmemory-policy allkeys-lru
```
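
To make the `allkeys-lru` idea concrete, here is a small in-process sketch of least-recently-used eviction built on `LinkedHashMap`. It illustrates the policy only: Redis approximates LRU by sampling keys and bounds the cache by memory rather than by entry count, and the `LruCache` class is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  LruCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true: reads move an entry to the "recent" end
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after every put: evict the least recently used entry once over capacity.
    return size() > maxEntries;
  }
}
```

With a capacity of two, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `b` is the entry that was touched least recently.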

Step 7 — Monitoring hit rates

The single most important cache metric is hit ratio. Anything under 80% on a hot endpoint means the cache isn't earning its keep.

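```java
// Redis cache metrics are only recorded when statistics are enabled.
// This customizer applies to Spring Boot's auto-configured RedisCacheManager;
// if you declare your own RedisCacheManager bean (as in Step 1), call
// enableStatistics() on that builder instead.
@Bean
public RedisCacheManagerBuilderCustomizer cacheStatistics() {
  return builder -> builder.enableStatistics();
}
```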

Spring Boot's actuator exposes cache.gets with a result tag of hit or miss. In Prometheus:

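```text
sum(rate(cache_gets_total{result="hit"}[5m]))
/
sum(rate(cache_gets_total[5m]))
```

The sum() on both sides matters: without it, Prometheus matches series label by label and divides each hit series by itself, so the ratio always reads 1.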

Alert when hit ratio drops below your SLO (e.g. 90% for product reads). A sudden drop usually means TTLs are too short, the working set outgrew Redis memory, or invalidation is too aggressive.
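
The ratio itself is simply hits divided by total lookups. For code paths that bypass Spring's cache abstraction, such as the manual cache-aside service in Step 3, a minimal in-process tracker could look like the sketch below; the `HitRatioTracker` class is an assumption for illustration, and in a real service these counts would more likely be published as Micrometer counters.

```java
import java.util.concurrent.atomic.LongAdder;

final class HitRatioTracker {
  private final LongAdder hits = new LongAdder();
  private final LongAdder misses = new LongAdder();

  void recordHit() {
    hits.increment();
  }

  void recordMiss() {
    misses.increment();
  }

  /** hits / (hits + misses), or 0.0 before any lookup has been recorded. */
  double ratio() {
    long h = hits.sum();
    long total = h + misses.sum();
    return total == 0 ? 0.0 : (double) h / total;
  }
}
```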

Beyond hit ratio, watch:

redis_memory_used_bytes vs redis_memory_max_bytes (metric names as exposed by redis_exporter)
redis_commands_processed_total — overall throughput
redis_evicted_keys_total — rising means memory pressure
redis_keyspace_misses_total
p99 latency from the app side (a slow Redis hides behind fast averages)

Performance optimization

Use Lettuce in non-blocking mode with reactive Spring or async clients.
Pipeline batched operations. One round trip for 100 GETs instead of 100.
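Compress big values. Consider GZIP for payloads above roughly 1 KB; for compressible data such as JSON the CPU cost is usually smaller than the network and memory savings, but measure it on your own payloads.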
Use hashes for grouped fields. HSET user:1 name "Ada" age 36 is more memory-efficient than two strings.
Co-locate Redis with the app. Same AZ, same VPC. Cross-region cache reads defeat the point.
Connection pool, don't open per request. Reuse Lettuce connections.
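
The compression tip can be sketched with the JDK's GZIP streams. The example below is a sketch under assumptions: the 1 KB threshold comes from the list above, while the `ValueCodec` class and the one-byte marker that records whether a value was compressed are choices made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ValueCodec {
  private static final int THRESHOLD_BYTES = 1024;
  private static final byte RAW = 0;
  private static final byte GZIP = 1;

  private ValueCodec() {}

  /** Prefixes a marker byte; payloads above the threshold are gzip-compressed. */
  static byte[] encode(byte[] payload) {
    if (payload.length <= THRESHOLD_BYTES) {
      byte[] out = new byte[payload.length + 1];
      out[0] = RAW;
      System.arraycopy(payload, 0, out, 1, payload.length);
      return out;
    }
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    buf.write(GZIP);
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray(); // read after the stream is closed so the GZIP trailer is included
  }

  /** Reverses encode(): strips the marker and decompresses if needed. */
  static byte[] decode(byte[] stored) {
    byte[] body = Arrays.copyOfRange(stored, 1, stored.length);
    if (stored[0] == RAW) {
      return body;
    }
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(body))) {
      return gz.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The marker byte lets readers handle both small and large values without guessing, at the cost of one extra byte per entry.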

Scalability considerations

Vertical first. A single Redis on a modest VM does ~100k ops/sec. Most apps never need more.
Read replicas for read-heavy workloads.
Redis Cluster when you outgrow one node. Hash tags ({user:123}) keep related keys on the same shard.
Active-passive replication for HA; Sentinel for automatic failover.
Per-tenant prefixes in multi-tenant systems (tenant:42:product:abc) so you can scope invalidations and migrate tenants between clusters.
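
To see why hash tags keep related keys together, it helps to compute the slot by hand. Redis Cluster assigns each key to one of 16384 slots using CRC16 of the key, or of the hash-tag content when the key contains a non-empty `{...}` section. The sketch below follows that rule; the `ClusterSlots` class and its method names are assumptions for illustration, and a cluster-aware client does this for you.

```java
import java.nio.charset.StandardCharsets;

final class ClusterSlots {
  private ClusterSlots() {}

  /** CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0. */
  static int crc16(byte[] data) {
    int crc = 0;
    for (byte b : data) {
      crc ^= (b & 0xFF) << 8;
      for (int i = 0; i < 8; i++) {
        crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
        crc &= 0xFFFF;
      }
    }
    return crc;
  }

  /** Returns the hash-tag content if the key has a non-empty {...} section, else the whole key. */
  static String hashedPart(String key) {
    int open = key.indexOf('{');
    if (open >= 0) {
      int close = key.indexOf('}', open + 1);
      if (close > open + 1) {
        return key.substring(open + 1, close);
      }
    }
    return key;
  }

  static int slot(String key) {
    return crc16(hashedPart(key).getBytes(StandardCharsets.UTF_8)) % 16384;
  }
}
```

Keys such as `{user:123}:profile` and `{user:123}:orders` hash only `user:123`, so they land in the same slot and can be used together in multi-key commands.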

Security considerations

Require AUTH. A Redis without a password and exposed to the network is a famous breach vector.
TLS in transit. Especially across VPC peerings or cloud boundaries.
Firewall it. Redis should never be on a public IP. Use a private subnet + security group.
Disable dangerous commands (FLUSHALL, CONFIG, DEBUG) in production via rename-command.
Don't cache secrets. Tokens and credentials belong in a vault, not in Redis with a TTL.
Encrypt PII before caching. Redis dumps are easy to leak.
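
For the last point, one possible approach is to encrypt values with AES-GCM from the JDK before they are written to Redis. This is a sketch under assumptions: the `PiiCipher` class is hypothetical, the key is generated in-process purely for demonstration, and in production the key would come from a vault or KMS.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class PiiCipher {
  private static final int IV_BYTES = 12;   // recommended IV size for GCM
  private static final int TAG_BITS = 128;  // authentication tag length

  private final SecretKey key;
  private final SecureRandom random = new SecureRandom();

  PiiCipher(SecretKey key) {
    this.key = key;
  }

  /** Demo helper only: generates a random 256-bit AES key in-process. */
  static SecretKey newKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(256);
      return generator.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns IV || ciphertext+tag, ready to be stored as the cache value. */
  byte[] encrypt(String plaintext) {
    try {
      byte[] iv = new byte[IV_BYTES];
      random.nextBytes(iv); // a fresh IV per value; never reuse one with the same key
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
      byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
      return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Splits the stored bytes back into IV and ciphertext, then decrypts and verifies the tag. */
  String decrypt(byte[] stored) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
      byte[] plaintext = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
      return new String(plaintext, StandardCharsets.UTF_8);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because GCM is authenticated, a tampered cache entry fails to decrypt instead of yielding corrupted data.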

Common mistakes

1. No TTL. Cache fills, evictions thrash, hit rate collapses.
2. Caching the wrong granularity. Caching individual rows when the page joins five tables — cache the joined view instead.
3. @Cacheable on a method that mutates state. It will skip the body on a hit and silently break.
4. Cache stampede on hot keys. Always lock or use probabilistic early refresh on the hot 1%.
5. Inconsistent invalidation. Update via one path, evict via another. Centralize the cache key formula.
6. Caching nulls without thinking. Either store a sentinel with a short TTL or set cache-null-values: false.
7. Caching huge values. A 5MB blob crowds out thousands of small entries; consider object storage instead.
8. Treating Redis as a database. It is a cache. Persistence helps, but the source of truth must be elsewhere.

Troubleshooting guide

Low hit rate. TTLs too short, working set > memory, key cardinality too high, or wrong granularity. Compare redis_keyspace_hits_total with redis_keyspace_misses_total.
High latency. Slow commands (SLOWLOG GET 100), big values, network saturation, or a single hot key serializing on one shard.
OOM / mass evictions. Memory ceiling too low, no TTL on big keys, or LFU policy with the wrong access pattern.
Stale reads after writes. Eviction isn't firing on the write path; verify @CacheEvict or the manual DEL.
Connection storms after a deploy. Add a warm-up grace period; ramp HPA gradually.
"MOVED" errors in Cluster mode. The client isn't cluster-aware. Use Lettuce with cluster support enabled.
Latency spike at midnight. Probably AOF rewrite or RDB snapshot; tune save and auto-aof-rewrite-percentage.

FAQ

1. Redis vs Memcached? Pick Redis. Richer data types, persistence, replication, pub/sub. Memcached is faster for the absolute simplest key-value workload but rarely worth the trade.

2. Should I cache in the application or at the gateway? Both. Gateway caching wins for anonymous public reads; in-app caching wins for personalized data.

3. How long should TTLs be? As long as the data is acceptably stale. Start with minutes for user data, hours for reference data, then tune from hit rate.

4. What about cache invalidation events? Combine TTLs with explicit eviction on writes. Pure event-driven invalidation is a known hard problem; TTLs make it forgiving.

5. Can I use Redis as my primary database? For some workloads (sessions, rate limits, real-time scores) — yes. For OLTP — no. Use Postgres for source-of-truth state.

6. Is @Cacheable enough? For 80% of use cases, yes. Drop to StringRedisTemplate when you need control over serialization, eviction logic or stampede protection.

7. How do I cache by request, not by entity? Use Spring Cloud Gateway or a CDN with a fingerprint of method+path+query+auth as the key.

8. What about caching writes? Use write-through (@CachePut) only when reads immediately follow writes. Otherwise evict and let the next read repopulate.

9. How does Redis interact with CQRS? Beautifully — Redis is the perfect read store for a CQRS projection. See our CQRS tutorial.

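10. When is the cache the bottleneck? When Redis CPU is pegged, you have a hot key on one shard, or your serializer is slow. Profile first. For a hot key, splitting it into several sub-keys or putting a short-lived local cache in front of it usually helps; hash tags do not, because they pin related keys to the same shard.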

Key takeaways

Redis caching can cut database load by an order of magnitude when applied to the right hot paths.
Always set TTLs, monitor hit rate, and design eviction explicitly — never by accident.
Defeat cache stampedes with locks or probabilistic early refresh.
Warm hot keys at startup, compress big values, and keep Redis in the same AZ as the app.
Treat Redis as a fast, ephemeral layer in front of the source of truth — not as the database itself.

Avoid these

Common mistakes

1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but Java & Spring Boot patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging Java & Spring Boot systems without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.

Ship it safely

Production best practices

Apply these before promoting Redis Distributed Caching Architecture for High-Traffic APIs to a real production environment.

Scalability

Design Java & Spring Boot services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For Java & Spring Boot systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Questions

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on June 3, 2026. We revisit popular Java & Spring Boot tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

The full source code is available on GitHub: https://github.com/masterlabsystems/redis-caching-spring-boot. Fork it, run it locally, and adapt it to your own project.

Related tutorials

API Rate Limiting in Spring Boot with Bucket4j and Redis

Building REST APIs with Spring Boot: A Complete Guide

Spring Boot + Kafka — Build a Real-Time Messaging System