Microservices · 21 min read · By Liyabona Saki · Last updated Jun 3, 2026

Designing Event-Driven Microservices with Kafka and Spring Boot

A complete production guide to event-driven microservices with Kafka and Spring Boot — producers, consumers, topics, partitions, consumer groups, retry strategies, schema evolution and operational best practices.

Introduction

In a synchronous microservices architecture, every interaction is a network call: order calls inventory, inventory calls warehouse, warehouse calls notification. One slow downstream service slows the entire chain; one failure cascades through the system. The fix is to invert the dependency direction with events.

Event-driven microservices communicate by publishing and subscribing to immutable domain events through a broker like Apache Kafka. Producers don't know who consumes; consumers don't know who produces. Each service moves at its own pace, fails independently, and scales independently. This tutorial is a complete production walkthrough of building event-driven microservices with Kafka and Spring Boot — topics, partitions, consumer groups, retries, schema evolution and the operational practices that keep the system healthy.

Why event-driven

Three benefits compound as the system grows:

Decoupling. Adding a new consumer requires zero changes to the producer. A new analytics service can subscribe to orders.events without anyone shipping new code in the order service.
Resilience. If the payment service is down for two minutes, events queue in Kafka. When it recovers, it catches up from its last committed offset. Synchronous calls made during the outage would have failed, leaving each caller to retry or drop the request.
Scalability. Partitions let the consumers in a group process a topic in parallel. A well-provisioned cluster can absorb millions of events per second on a single topic.

The trade-off: eventual consistency, harder debugging (no stack trace across services), and a broker to operate.

Figure: Event-Driven Microservices Architecture. Services communicate by publishing and subscribing to domain events through Kafka. No service calls another directly, which removes tight coupling and synchronous failure cascades.

Real-world use cases

E-commerce checkout — order, inventory, payment, shipping and notification services coordinate via events.
Real-time analytics — every interesting event is logged to Kafka and consumed by streaming jobs.
CDC pipelines — Debezium streams database changes to Kafka, fanning out to warehouses and search indexes.
IoT telemetry — millions of devices write to Kafka; consumers process, aggregate and alert.
User activity tracking — clickstreams flow to Kafka and are projected into recommendation systems.

Kafka fundamentals

Three concepts run everything:

Topic — a named, append-only log. Producers write to it; consumers read from it.
Partition — each topic is split into N partitions. Order is guaranteed *within* a partition, not across.
Consumer group — a set of consumers that share work on a topic. Each partition is consumed by exactly one member of the group at a time.

If you have 6 partitions and 3 consumers in a group, each consumer handles 2 partitions. Add a fourth consumer — Kafka rebalances. Add a seventh — it sits idle (no partition to assign).
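The arithmetic above can be checked with a small sketch. It uses a simplified round-robin assignment, which is an assumption made for illustration; Kafka's real assignors are more involved, but the "one owner per partition" rule and the idle seventh consumer come out the same way. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified round-robin assignment; not Kafka's actual assignor. */
final class PartitionAssignmentSketch {

    /** Returns, for each consumer index, the partitions it would own. */
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            owned.get(p % consumers).add(p); // each partition gets exactly one owner
        }
        return owned;
    }

    public static void main(String[] args) {
        for (int consumers : new int[] {3, 4, 7}) {
            System.out.println(consumers + " consumers -> " + assign(6, consumers));
        }
    }
}
```

With 3 consumers every member owns two partitions; with 7, the last list is empty, which is the idle consumer described above.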

Figure: Kafka Topics, Partitions & Consumer Groups. Producers write to partitioned topics. Consumers in the same group share partitions for parallelism; different groups each receive a full copy of the stream, enabling independent processing pipelines.

Step 1 — Set up Spring Boot + Kafka

```xml
<dependency>
  <groupId>org.springframework.kafka</groupId>
  <artifactId>spring-kafka</artifactId>
</dependency>
```

```yaml
spring:
  kafka:
    bootstrap-servers: kafka:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
      acks: all
      properties:
        enable.idempotence: true
        max.in.flight.requests.per.connection: 5
    consumer:
      group-id: orders
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      properties:
        spring.json.trusted.packages: "com.example.events"
      enable-auto-commit: false
      auto-offset-reset: earliest
    listener:
      ack-mode: manual
```

Two pairs of settings matter most:

acks=all + enable.idempotence=true. The producer waits for all in-sync replicas and its retries do not create duplicate records.
enable-auto-commit: false + ack-mode: manual. The consumer commits an offset only after the record has been processed successfully.

Step 2 — Define events

Events are immutable, versioned and broker-friendly.

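```java
import java.math.BigDecimal;
import java.util.UUID;

// Closed set of order events. OrderPaid and OrderShipped are declared the same way (not shown).
public sealed interface OrderEvent permits OrderPlaced, OrderPaid, OrderShipped {}

public record OrderPlaced(
    UUID eventId,
    UUID orderId,
    UUID customerId,
    BigDecimal total,
    int schemaVersion
) implements OrderEvent {}
```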

Include eventId for consumer-side dedupe and schemaVersion from day one.

Step 3 — Producer

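```java
import lombok.RequiredArgsConstructor;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class OrderEventPublisher {
  private final KafkaTemplate<String, OrderEvent> kafka;

  public void publish(OrderEvent event) {
    // Key by aggregate ID so every event for one order lands on the same partition.
    String key = switch (event) {
      case OrderPlaced e -> e.orderId().toString();
      case OrderPaid e -> e.orderId().toString();
      case OrderShipped e -> e.orderId().toString();
    };
    kafka.send("orders.events", key, event);
  }
}
```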

Setting the Kafka key to the aggregate ID ensures all events for one order go to the same partition — and therefore arrive in order.
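To see why a stable key gives a stable partition, here is a minimal sketch of hash-based partitioning. The hash function is a stand-in chosen for brevity: Kafka's default partitioner applies murmur2 to the serialized key bytes, so the partition numbers printed here will not match a real cluster. The order ID is a made-up value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitioningSketch {

    /** Stand-in hash; Kafka's default partitioner uses murmur2 over the key bytes. */
    static int partitionFor(String key, int partitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, partitions); // floorMod keeps the result non-negative
    }

    public static void main(String[] args) {
        String orderId = "order-42"; // hypothetical aggregate ID
        int first = partitionFor(orderId, 6);
        int second = partitionFor(orderId, 6);
        System.out.println("same key, same partition: " + (first == second));
        // A different partition count usually remaps keys, which is why resizing a topic is disruptive.
        System.out.println("6 partitions -> " + first + ", 12 partitions -> " + partitionFor(orderId, 12));
    }
}
```

The mapping depends only on the key and the partition count, so events for one order stay together as long as the topic is not resized.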

For at-least-once + atomic write semantics, combine this with the Outbox pattern instead of publishing directly.
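The Outbox idea can be sketched without a database. This is an in-memory illustration under stated assumptions: the map and list stand in for an orders table and an outbox table that would share one database transaction in a real service, and the `publish` predicate stands in for a Kafka send that reports success or failure. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class OutboxSketch {

    record OutboxRow(long id, String key, String payload) {}

    private final Map<String, String> orders = new HashMap<>(); // stands in for the orders table
    private final List<OutboxRow> outbox = new ArrayList<>();   // stands in for the outbox table
    private long nextId = 1;

    /** In a real service both writes happen inside one database transaction. */
    synchronized void placeOrder(String orderId, String payload) {
        orders.put(orderId, payload);
        outbox.add(new OutboxRow(nextId++, orderId, payload));
    }

    /** Relay: publish pending rows in order and delete a row only after its publish succeeded. */
    synchronized int relay(Predicate<OutboxRow> publish) {
        int sent = 0;
        while (!outbox.isEmpty() && publish.test(outbox.get(0))) {
            outbox.remove(0);
            sent++;
        }
        return sent; // rows left behind are retried on the next run (at-least-once)
    }

    synchronized int pending() {
        return outbox.size();
    }

    public static void main(String[] args) {
        OutboxSketch service = new OutboxSketch();
        service.placeOrder("order-1", "{\"type\":\"OrderPlaced\"}");
        System.out.println("broker down, sent=" + service.relay(row -> false));
        System.out.println("broker up, sent=" + service.relay(row -> true));
    }
}
```

Because a row is removed only after a successful publish, a crash between publish and delete produces a duplicate rather than a lost event, which is why consumers still need to dedupe.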

Step 4 — Consumer

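```java
import java.time.Instant;
import java.util.UUID;
import lombok.RequiredArgsConstructor;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
@RequiredArgsConstructor
public class InventoryListener {

  private final InventoryService inventory;
  private final ProcessedEventRepository processed;

  @KafkaListener(topics = "orders.events", groupId = "inventory", concurrency = "3")
  @Transactional
  public void on(OrderEvent event, Acknowledgment ack) {
    UUID id = eventId(event);
    if (processed.existsById(id)) {
      ack.acknowledge(); // duplicate delivery: skip the work, still advance the offset
      return;
    }
    if (event instanceof OrderPlaced p) {
      inventory.reserve(p.orderId(), p.total());
    }
    // Saved in the same transaction as the reservation, so a redelivery is detected.
    processed.save(new ProcessedEvent(id, Instant.now()));
    ack.acknowledge();
  }

  // Assumes every event type carries an eventId, as OrderPlaced does.
  private static UUID eventId(OrderEvent event) {
    return switch (event) {
      case OrderPlaced e -> e.eventId();
      case OrderPaid e -> e.eventId();
      case OrderShipped e -> e.eventId();
    };
  }
}
```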

concurrency = "3" starts three consumer threads on this instance. With 6 partitions and 2 instances of 3 threads, each thread owns 1 partition.

Figure: Event-Driven Order Processing Workflow. An order placed by the user fans out through Kafka topics to inventory, payment and notification services. Each step is asynchronous, retryable and decoupled from the others.

Step 5 — Retries and dead-letter topics

A consumer should retry transient failures and quarantine poison messages. Spring Kafka has first-class support.

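```java
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.kafka.support.ExponentialBackOffWithMaxRetries;

@Configuration
public class KafkaErrorHandlingConfig { // class name is illustrative

  @Bean
  public DefaultErrorHandler errorHandler(KafkaTemplate<String, Object> template) {
    // Failed records go to "<topic>.DLT", on the same partition number as the source.
    var recoverer = new DeadLetterPublishingRecoverer(template,
        (record, ex) -> new TopicPartition(record.topic() + ".DLT", record.partition()));

    // Up to 5 retries: 500 ms, doubling each time, capped at 10 s.
    var backoff = new ExponentialBackOffWithMaxRetries(5);
    backoff.setInitialInterval(500);
    backoff.setMultiplier(2.0);
    backoff.setMaxInterval(10_000);

    var handler = new DefaultErrorHandler(recoverer, backoff);
    // Bad input will not get better on retry; send it straight to the DLT.
    handler.addNotRetryableExceptions(IllegalArgumentException.class);
    return handler;
  }
}
```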

Transient errors retry with exponential backoff. Records that exhaust their retries, or that fail with a non-retryable exception such as IllegalArgumentException, land in orders.events.DLT for manual inspection. The DLT is a Kafka topic like any other, so you can build a simple admin UI that reads it.
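It helps to know how long a failing record can hold up its partition. The sketch below recomputes the delay schedule for the settings used in this step (5 retries, 500 ms initial interval, multiplier 2.0, 10 s cap) in plain Java; it mirrors the arithmetic only and does not call Spring's backoff classes.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffScheduleSketch {

    /** Delay before each retry: start at initialMs, multiply each time, cap at maxMs. */
    static List<Long> delays(int maxRetries, long initialMs, double multiplier, long maxMs) {
        List<Long> result = new ArrayList<>();
        double next = initialMs;
        for (int i = 0; i < maxRetries; i++) {
            result.add(Math.min((long) next, maxMs));
            next *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> schedule = delays(5, 500, 2.0, 10_000);
        long total = schedule.stream().mapToLong(Long::longValue).sum();
        System.out.println(schedule + " total=" + total + " ms");
        // prints: [500, 1000, 2000, 4000, 8000] total=15500 ms
    }
}
```

Records behind a failing one on the same partition wait for these pauses, so a poison message costs roughly 15.5 seconds plus handler time before it reaches the DLT. Keeping each delay well below max.poll.interval.ms is a sensible precaution.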

Step 6 — Schema evolution

Events live forever in Kafka. Plan for change.

Add fields, never remove. Old consumers ignore unknown fields, provided their deserializer is configured to tolerate them.
Make new fields nullable or defaulted. Old producers won't set them.
Use a schema registry (Confluent or Apicurio) to enforce compatibility at publish time.
Bump schemaVersion for breaking changes. Consumers can route to old vs new handlers, or run a one-off projection to upgrade older events.
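The rules above amount to a "tolerant reader" on the consumer side. The sketch below shows the idea on a plain map standing in for a deserialized JSON payload. The `currency` and `giftWrap` fields and the `USD` default are hypothetical and exist only to show a defaulted field and an ignored one.

```java
import java.util.Map;

final class TolerantReaderSketch {

    record OrderPlacedView(String orderId, String currency, int schemaVersion) {}

    /** Reads only the fields it knows and defaults the ones older producers never set. */
    static OrderPlacedView read(Map<String, Object> payload) {
        String orderId = (String) payload.get("orderId");
        // "currency" is a hypothetical field added in a later schema version.
        String currency = (String) payload.getOrDefault("currency", "USD");
        int version = ((Number) payload.getOrDefault("schemaVersion", 1)).intValue();
        return new OrderPlacedView(orderId, currency, version);
    }

    public static void main(String[] args) {
        Map<String, Object> oldEvent = Map.of("orderId", "order-1", "schemaVersion", 1);
        Map<String, Object> newEvent = Map.of(
                "orderId", "order-2", "schemaVersion", 2, "currency", "EUR", "giftWrap", true);
        System.out.println(read(oldEvent)); // currency falls back to the default
        System.out.println(read(newEvent)); // unknown "giftWrap" is ignored
    }
}
```

A reader written this way keeps working when producers add fields, and the `schemaVersion` it extracts is what a consumer would switch on to route breaking changes to a separate handler.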

Production best practices

Partition for parallelism. Pick a partition count that supports peak throughput. Adding partitions later changes which partition a key maps to, so per-key ordering across the old and new layout is no longer guaranteed.
Set replication factor ≥3. Anything less is not durable enough for production.
Monitor consumer lag. kafka_consumergroup_lag is the single most important metric. Alert when it grows.
Use idempotent producers and consumers. At-least-once means you *will* see duplicates.
Tune max.poll.records and max.poll.interval.ms. A slow handler can be kicked out of the group; size batches to fit the SLO.
Compact topics for state snapshots. Use cleanup.policy=compact on topics that represent the latest value per key (user profiles, configs).
Separate hot and cold topics. Don't mix high-throughput firehoses with low-volume control events.
Run a chaos test. Kill consumers, isolate brokers, watch the system rebalance and recover.
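For the first practice in this list, a rough partition count can be estimated from measured throughput. The sketch below is plain arithmetic under stated assumptions: the 120 MB/s peak and the 1.5x headroom factor are made-up inputs, and the per-partition figures should come from your own benchmarks.

```java
final class PartitionCountSketch {

    /** Partitions needed to carry the peak load with headroom, rounded up. */
    static int partitionsFor(double peakMbPerSec, double perPartitionMbPerSec, double headroom) {
        return (int) Math.ceil(peakMbPerSec * headroom / perPartitionMbPerSec);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 120 MB/s peak, 50% headroom.
        System.out.println("at 10 MB/s per partition: " + partitionsFor(120, 10, 1.5)); // 18
        System.out.println("at 25 MB/s per partition: " + partitionsFor(120, 25, 1.5)); // 8
    }
}
```

The result is also the ceiling on how many consumers in one group can work in parallel, so it is worth checking against the slowest consumer, not only the producer rate.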

Security considerations

Enable TLS on the broker; never run plaintext in production.
Use SASL (SCRAM or OAuth) for client auth. mTLS is also fine if you have the PKI for it.
Apply Kafka ACLs per topic and consumer group; least privilege from day one.
Encrypt PII in event payloads. Broker operators have access to the data; design assuming they can read what is stored.
Sign event payloads when events cross organizational boundaries, so consumers can verify origin and integrity instead of trusting the channel.
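One way to sign a payload is an HMAC over its bytes, sketched below with the JDK's `javax.crypto.Mac`. This is a swapped-in simple technique, not necessarily what the author has in mind: an HMAC needs a secret shared by producer and consumer, so for parties that should not be able to forge events an asymmetric signature would be the better fit. The secret and payload are placeholder values; the resulting tag could travel alongside the event, for example in a record header.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class EventSigningSketch {

    /** Returns a Base64 HMAC-SHA256 tag for the payload. */
    static String sign(String payload, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] tag = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recomputes the tag and compares it without leaking timing information. */
    static boolean verify(String payload, String signature, byte[] secret) {
        byte[] expected = sign(payload, secret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret-do-not-use".getBytes(StandardCharsets.UTF_8); // placeholder
        String payload = "{\"orderId\":\"order-1\",\"total\":\"19.99\"}";
        String signature = sign(payload, secret);
        System.out.println("untouched: " + verify(payload, signature, secret));
        System.out.println("tampered:  " + verify(payload.replace("19.99", "0.01"), signature, secret));
    }
}
```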

Common mistakes

1. No idempotency. Treating Kafka delivery as exactly-once. With the setup in this guide it is at-least-once, so handlers must tolerate duplicates.
2. Synchronous workflow disguised as events. Service A publishes, waits, service B publishes back. That is RPC with extra steps and worse latency.
3. Too few partitions. Within a consumer group, parallelism is capped at one consumer per partition; you cannot scale past that.
4. Wrong key. Random keys break per-aggregate ordering; constant keys overload one partition.
5. No DLT. A poison message blocks the partition for everyone behind it.
6. Schema evolution without a registry. A breaking change can quietly take down every consumer.
7. Long-running handlers. A handler that takes 60 seconds per record can push a batch past max.poll.interval.ms and trigger constant rebalances.
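The long-running handler problem is easy to quantify. The sketch below estimates how many records a poll can safely return for a given per-record handling time. It assumes Kafka's default max.poll.interval.ms of 300,000 ms and an arbitrary 50% safety factor; both are inputs to adjust, not recommendations.

```java
final class PollBudgetSketch {

    /** Largest batch that finishes within a fraction of max.poll.interval.ms. */
    static int maxSafeRecords(long maxPollIntervalMs, long perRecordMs, double safetyFactor) {
        return (int) Math.floor(maxPollIntervalMs * safetyFactor / perRecordMs);
    }

    public static void main(String[] args) {
        long interval = 300_000; // Kafka's default max.poll.interval.ms (5 minutes)
        System.out.println("200 ms handler: " + maxSafeRecords(interval, 200, 0.5));    // 750
        System.out.println("60 s handler:   " + maxSafeRecords(interval, 60_000, 0.5)); // 2
    }
}
```

With the default max.poll.records of 500, the 60-second handler is far over budget, which is the situation mistake 7 describes. The usual levers are a smaller max.poll.records, a larger max.poll.interval.ms, or moving the slow work off the consumer thread.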

Troubleshooting guide

Consumer lag growing. Add consumer instances up to the partition count; profile the handler; check downstream dependencies.
Frequent rebalances. Reduce handler work per poll, increase max.poll.interval.ms, or move blocking I/O off the consumer thread.
Out-of-order events. Verify producer key matches the entity ID and there is no manual partition assignment.
DLT growing. Inspect a few messages, fix the bug, replay from the DLT back to the main topic.
Mysterious duplicates. Confirm enable.idempotence=true on the producer and that every consumer dedupes on eventId before applying side effects.
Broker disk filling. Tune retention (retention.ms, retention.bytes); enable compaction where appropriate.
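When diagnosing the first item in this list, it helps to be precise about what lag is: per partition, the log end offset minus the group's committed offset. The sketch below computes it from two plain maps with made-up offsets; in practice the numbers would come from your monitoring exporter or the Kafka admin tooling.

```java
import java.util.Map;

final class ConsumerLagSketch {

    /** Total lag = sum over partitions of (log end offset - committed offset), floored at zero. */
    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committed) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long done = committed.getOrDefault(e.getKey(), 0L); // nothing committed yet counts as 0
            lag += Math.max(0, e.getValue() - done);
        }
        return lag;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for a 3-partition topic.
        Map<Integer, Long> end = Map.of(0, 1_000L, 1, 1_200L, 2, 900L);
        Map<Integer, Long> committed = Map.of(0, 1_000L, 1, 950L, 2, 600L);
        System.out.println("total lag = " + totalLag(end, committed)); // total lag = 550
    }
}
```

A total that keeps growing means consumers are slower than producers. A total concentrated on one partition usually points at a hot key or a stuck record rather than a capacity problem.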

FAQ

1. Should every microservice be event-driven? No. Use sync calls for low-latency, single-result reads. Use events for state changes, fan-out and decoupling.

2. Kafka vs RabbitMQ? Kafka for high-throughput durable streams and replay; RabbitMQ for flexible routing and RPC-style messaging. Pick by use case, not vibes.

3. How do I trigger immediate downstream action? Publish the event and have the downstream consumer act on it. Real-time means consumer lag in milliseconds, not synchronous calls.

4. How do I run sagas? Use the choreography pattern (each step listens for the previous event) or orchestration (a saga coordinator publishes and listens). Both work over Kafka.

5. What about exactly-once? Kafka transactions provide exactly-once within Kafka (read-process-write). Across heterogeneous systems, idempotent at-least-once is the realistic target.

6. Where do I store consumer state? In the consumer's own database. Don't use Kafka as a database.

7. How many partitions do I need? Divide peak throughput by what a single partition sustains (rule of thumb: 10–25 MB/s per partition for typical hardware), then add headroom for growth and for the consumer parallelism you expect to need.

8. How do I version event schemas? Backward-compatible additions; major bumps for breaking changes; enforce via a schema registry. See the schema evolution section.

9. How do I handle PII deletion (GDPR)? Either tombstone the key in a compacted topic, or encrypt PII per-user with a key you can later revoke (crypto-shredding).

10. Can I use Kafka without microservices? Absolutely. Modular monoliths benefit from events too — they make boundaries explicit even within one deployable.
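The crypto-shredding option from question 9 can be sketched with the JDK's AES-GCM support. This is a minimal illustration under stated assumptions: the in-memory map stands in for a real key store, key rotation and key backup are ignored, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class CryptoShreddingSketch {

    /** Ciphertext plus the IV needed to decrypt it; this is what would go into the event. */
    record SealedPii(byte[] iv, byte[] ciphertext) {}

    private final Map<String, SecretKey> keysByUser = new HashMap<>(); // stands in for a key store
    private final SecureRandom random = new SecureRandom();

    SealedPii encrypt(String userId, String pii) {
        try {
            SecretKey key = keysByUser.get(userId);
            if (key == null) {
                KeyGenerator generator = KeyGenerator.getInstance("AES");
                generator.init(256);
                key = generator.generateKey();
                keysByUser.put(userId, key);
            }
            byte[] iv = new byte[12]; // fresh 96-bit IV for every message
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return new SealedPii(iv, cipher.doFinal(pii.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    Optional<String> decrypt(String userId, SealedPii sealed) {
        SecretKey key = keysByUser.get(userId);
        if (key == null) {
            return Optional.empty(); // key was shredded: the payload is unreadable
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, sealed.iv()));
            return Optional.of(new String(cipher.doFinal(sealed.ciphertext()), StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Deleting the key is the erasure; the events themselves stay in the log. */
    void forget(String userId) {
        keysByUser.remove(userId);
    }

    public static void main(String[] args) {
        CryptoShreddingSketch vault = new CryptoShreddingSketch();
        SealedPii sealed = vault.encrypt("user-1", "jane@example.com");
        System.out.println("before: " + vault.decrypt("user-1", sealed));
        vault.forget("user-1");
        System.out.println("after:  " + vault.decrypt("user-1", sealed));
    }
}
```

The appeal of this approach is that nothing in Kafka has to be rewritten: once the per-user key is gone, every copy of that user's encrypted fields, including replicas and backups, becomes unreadable.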

Key takeaways

Event-driven microservices decouple services in time and dependency, trading instant consistency for resilience and scale.
Partition by aggregate key, run consumers in groups, and treat at-least-once delivery as a first-class requirement.
Pair every state change that emits an event with the Outbox pattern, so the database write and the publish cannot diverge (the dual-write bug).
Plan for schema evolution from day one; use a registry to enforce it.
Monitor consumer lag, configure DLTs, and run chaos tests — the broker is the central nervous system of the platform.

Avoid these

Common mistakes

1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but microservices patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship; debugging microservices without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.

Ship it safely

Production best practices

Apply these before promoting an event-driven system like the one in this tutorial to a real production environment.

Scalability

Design microservices to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For microservices systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Questions

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on June 3, 2026. We revisit popular Microservices tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

The full source code is available on GitHub: https://github.com/masterlabsystems/event-driven-kafka-demo. Fork it, run it locally, and adapt it to your own project.
