Microservices⏱ 18 min read·By Liyabona Saki·Last updated May 27, 2026

Event-Driven Architecture with Spring Boot and Kafka — Building Reactive Distributed Systems

Design and implement event-driven systems with Spring Boot and Kafka — producers, consumers, schemas, idempotency, dead-letter queues and production-grade patterns.

Introduction

Synchronous REST is the default way services talk, and for many workloads it's the right choice. But the moment three services need to react to one business event — *"OrderPlaced"* triggers billing, inventory, notifications and analytics — REST starts to hurt. Every new consumer means a new client in the producer, retries are everyone's problem, and one slow downstream brings the whole chain down.

Event-driven architecture (EDA) flips the relationship. Services emit events when something happens; anyone interested subscribes. Producers don't know who's listening. New consumers are added without touching the producer. This is the architecture that powers LinkedIn, Uber, and most of modern fintech — and Kafka is the de facto backbone.

Key takeaways

Events are facts about the past (OrderPlaced), not requests (placeOrder).
Kafka is an append-only log, not a queue — consumers track their own position.
Idempotency is the consumer's responsibility. Always.
A dead-letter topic + retries beats wrapping every consumer in try/catch.
Use schemas (Avro/Protobuf/JSON Schema). String-shaped JSON between services is technical debt.

Events vs commands vs queries

Command — *"do this"*. One handler. Can fail. Synchronous-shaped, even when async.
Query — *"tell me"*. Read-only. Synchronous.
Event — *"this happened"*. Many subscribers. Cannot be refused. Past tense.

Naming matters. OrderPlaced, not PlaceOrder. PaymentCaptured, not CapturePayment.

Kafka in 60 seconds

text

producer ──► [partition 0]──────────────► consumer-group-A
                                          consumer-group-B
producer ──► [partition 1]──────────────►
              topic = "orders.events"

A topic is a named log.
A topic is split into partitions for parallelism.
A consumer group reads a topic; Kafka rebalances partitions across the group's members.
Each consumer tracks its offset — the last message it processed.
Messages within a partition are ordered; across partitions they are not.

The single most important design decision is the partition key. orderId keeps all events for one order in order; customerId keeps a customer's events together. Pick whichever invariant matters most.

The reference architecture

text

┌────────────┐      orders.events       ┌────────────┐
│  orders    │ ───────────────────────► │  billing   │
│  service   │ ───┐                    └────────────┘
└────────────┘    │                     ┌────────────┐
                  ├───────────────────► │ inventory  │
                  │                     └────────────┘
                  │                     ┌────────────┐
                  └───────────────────► │ analytics  │
                                        └────────────┘

One producer, three independent consumers. Each consumer is in its own consumer group, so all three see every event.

Step 1 — Docker Compose for local Kafka

yaml

# docker-compose.yml
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    ports: ["9092:9092"]
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      CLUSTER_ID: kraft-cluster
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports: ["8080:8080"]
    environment:
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092

docker compose up -d and you have a working Kafka in 15 seconds.

Step 2 — The event schema

```java
public sealed interface OrderEvent permits OrderPlaced, OrderCancelled {
  UUID eventId();        // for idempotency
  UUID orderId();        // partition key
  Instant occurredAt();
}

public record OrderPlaced( UUID eventId, UUID orderId, UUID customerId, BigDecimal total, String currency, Instant occurredAt ) implements OrderEvent {} ```

In production, register this schema in Confluent Schema Registry with Avro or Protobuf. JSON is fine to start; schemas become non-negotiable the moment you have more than two services.

Step 3 — The producer

```java
@Configuration
class KafkaProducerConfig {
  @Bean
  KafkaTemplate<String, OrderEvent> orderEventTemplate(KafkaProperties props) {
    var producerProps = props.buildProducerProperties();
    producerProps.put(ProducerConfig.ACKS_CONFIG, "all");                  // wait for all replicas
    producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);     // no duplicate writes
    producerProps.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
    return new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(producerProps));
  }
}

@Service @RequiredArgsConstructor class OrderEventPublisher { private final KafkaTemplate<String, OrderEvent> template;

void publish(OrderEvent event) { template.send("orders.events", event.orderId().toString(), event); } } ```

acks=all + enable.idempotence=true is the must-have combination for any event that matters.

Step 4 — The transactional outbox

A bug nobody catches the first time: the order is saved to Postgres, then the publish to Kafka fails. Now the order exists but nobody downstream knows.

The fix is the transactional outbox pattern:

sql

CREATE TABLE outbox_event (
  id           UUID PRIMARY KEY,
  aggregate_id UUID NOT NULL,
  type         VARCHAR(120) NOT NULL,
  payload      JSONB NOT NULL,
  occurred_at  TIMESTAMPTZ NOT NULL,
  published_at TIMESTAMPTZ
);

java

@Transactional
public OrderId place(PlaceOrderCommand cmd) {
  Order order = Order.place(cmd);
  orderRepo.save(order);
  outboxRepo.save(OutboxEntry.from(new OrderPlaced(/*...*/)));   // same transaction
  return order.id();
}

A separate poller (or Debezium reading the WAL) ships outbox rows to Kafka and marks them published. The DB transaction is the source of truth; Kafka eventually catches up.

Step 5 — The consumer

```java
@Component
@RequiredArgsConstructor
class ChargeOnOrderPlaced {
  private final PaymentService payments;
  private final ProcessedEventRepository processed;   // idempotency store

@KafkaListener( topics = "orders.events", groupId = "billing", containerFactory = "orderEventListenerFactory" ) @Transactional public void onEvent(OrderEvent event, Acknowledgment ack) { if (!(event instanceof OrderPlaced placed)) { ack.acknowledge(); return; }

if (processed.exists(placed.eventId())) { // idempotency check ack.acknowledge(); return; }

payments.charge(placed.customerId(), placed.total(), placed.currency(), placed.orderId()); processed.markProcessed(placed.eventId()); ack.acknowledge(); } } ```

Three properties make this safe:

1. Manual ack — only commit the offset after the work is done. 2. Idempotency table — eventId deduplicates inevitable redeliveries. 3. One transaction — payment + idempotency row commit together.

Step 6 — Retries and dead-letter topics

Some failures are transient (downstream timeout). Others are permanent (malformed event). Handle both:

java

@Bean
DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> dlt) {
  var recoverer = new DeadLetterPublishingRecoverer(dlt,
    (rec, ex) -> new TopicPartition(rec.topic() + ".DLT", rec.partition()));
  return new DefaultErrorHandler(
    recoverer,
    new FixedBackOff(1_000L, 3));                    // 3 retries, 1s apart
}

After 3 retries the event lands in orders.events.DLT. A human (or a scheduled job) can inspect, fix and replay.

Idempotency patterns

Natural keys — if the message is PaymentCaptured for paymentId=42, check before charging.
Idempotency table — store every processed eventId for a retention window (e.g. 7 days).
Idempotent operations — UPSERT instead of INSERT; SET status = 'PAID' WHERE status = 'PENDING'.

Kafka guarantees at-least-once by default. Your consumers must turn that into exactly-once effect.

CQRS, in one paragraph

Once you have events, the Command Query Responsibility Segregation pattern is cheap. Write commands change aggregates and emit events. A separate read-side consumes events and projects them into denormalized read models (Elasticsearch, a wide Postgres table, Redis). Reads become fast and queryable without ever joining the write model. Worth the complexity only when read and write workloads differ sharply.

Event sourcing — when (not) to use it

Event sourcing persists the events themselves as the source of truth and rebuilds state by replaying them. It's powerful for audit-heavy domains (banking, healthcare) and overkill for almost everything else. EDA with Kafka does not require event sourcing.

Distributed tracing

Propagate a traceparent header through every event so you can follow a request across services. Spring Cloud Sleuth / Micrometer Tracing wire this in automatically. Without tracing, debugging an async system is archaeology.

Common mistakes

Treating Kafka as a queue — deleting messages after read, not partitioning, single-consumer thinking.
Sharing the same DTO between producer and consumer via a JAR — now any schema change is a coordinated deploy.
No idempotency — duplicate charges in production within a week.
Letting one slow consumer block the topic — use separate consumer groups, separate apps.
Using auto-commit — the offset commits before your work succeeds. Disable it.

Production best practices

acks=all, enable.idempotence=true, min.insync.replicas=2 on every important topic.
Number of partitions ≥ peak number of consumer instances per group.
Schema registry with backwards-compatible evolution from day one.
Outbox for every producer that also writes to a database.
Consumer lag alerts in Prometheus — lag is the first symptom of trouble.
Tune max.poll.records and max.poll.interval.ms so a slow batch doesn't trigger a rebalance.

Scaling strategies

Scale consumers up to the number of partitions in the topic. Beyond that, more consumers sit idle.
Need more parallelism? Repartition the topic. (Plan key changes carefully — repartitioning is disruptive.)
For very high throughput, separate hot topics (high volume, narrow schema) from wide topics (rich payload, lower volume).

FAQ

Should every service emit events? Every service that owns data others care about, yes.

RabbitMQ or Kafka? RabbitMQ is a smart broker (queues, routing). Kafka is a dumb log (replay, retention). For event-driven architectures with replay and multiple consumers, Kafka wins. For classic task queues, RabbitMQ is simpler.

Can I do EDA without Kafka? Yes — Redis Streams, AWS SNS+SQS, GCP Pub/Sub all work. Kafka is the most operationally proven at scale.

Key takeaways

Understand the core concepts behind Event-Driven Architecture with Spring Boot and Kafka — Building Reactive Distributed Systems in a production context.
Apply the patterns to real Microservices systems, not just toy examples.
Recognize the trade-offs, failure modes, and operational concerns before adopting them.
Get a clear path to the next step — related tutorials, tools, and reference architectures.

Avoid these

Common mistakes

1. Copy-pasting code without understanding the trade-offs
It's tempting to ship a snippet from a blog post into production, but Microservices patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.
2. Skipping observability from day one
Structured logs, metrics, and traces are not optional. Wire them in before you ship — debugging Microservices systems without them is painful and expensive.
3. Optimizing too early
Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.
4. Ignoring security defaults
Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.

Ship it safely

Production best practices

Apply these before promoting Event-Driven Architecture with Spring Boot and Kafka — Building Reactive Distributed Systems to a real production environment.

Scalability

Design Microservices services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For Microservices systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Questions

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on May 27, 2026. We revisit popular Microservices tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

Code samples are inlined in the tutorial. When a companion repository is published it will be linked at the top of this page.

Go deeper

Event-Driven Architecture with Spring Boot and Kafka — Building Reactive Distributed Systems

Introduction

Key takeaways

Events vs commands vs queries

Kafka in 60 seconds

The reference architecture

Step 1 — Docker Compose for local Kafka

Step 2 — The event schema

Step 3 — The producer

Step 4 — The transactional outbox

Step 5 — The consumer

Step 6 — Retries and dead-letter topics

Idempotency patterns

CQRS, in one paragraph

Event sourcing — when (not) to use it

Distributed tracing

Common mistakes

Production best practices

Scaling strategies

FAQ

Related tutorials