Software Design & Architecture · 19 min read · By Liyabona Saki

The Outbox Pattern — Reliable Event Publishing in Microservices

Solve the dual-write problem with the transactional Outbox pattern. Production guide using Spring Boot, JPA, PostgreSQL, Kafka and Debezium with idempotent consumers and operational best practices.

Introduction

In a microservice that owns both a database and a Kafka topic, every interesting operation looks the same: change a row, publish an event. The obvious implementation — save() then kafkaTemplate.send() — is wrong. It is a textbook dual-write, and dual writes always eventually fail in production. The row is committed, the broker is unreachable, and downstream services never learn that anything happened.

The Outbox pattern is the standard fix. It moves the event into the database as a regular row, written in the same local transaction as the business change. A separate relay reads from the outbox table and republishes the events to Kafka. There is exactly one source of truth (the database) and at-least-once delivery to consumers.

This tutorial walks through a production implementation in Spring Boot with PostgreSQL and Kafka, covering both the polling-relay and Debezium-CDC variants, idempotent consumers, retention, and operational concerns.

The dual-write problem

Consider this naive code:

```java
@Transactional
public void place(Order order) {
  orderRepo.save(order);
  kafka.send("orders.events", new OrderPlaced(order.getId()));
}
```

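What can go wrong? Because of @Transactional, the row is committed only when the method returns, after send() has been called. If the commit then fails, or the application crashes before it, consumers act on an order that does not exist. Reorder the code to publish after the commit and the opposite failure appears: the order exists in the database, the application crashes or the broker is unreachable, and no downstream service ever knows.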

You cannot fix this with retries inside the method. There is no atomic transaction that spans PostgreSQL and Kafka. Any solution must persist the *intent to publish* in a place that *is* transactional with the business write.
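The failure window can be reproduced without any infrastructure. The sketch below is a minimal in-memory illustration, not the tutorial's service: `FakeDatabase` and `FakeBroker` are hypothetical stand-ins, and it models the ordering where the row is already committed when the publish is attempted.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory illustration of the dual-write failure window.
// FakeDatabase and FakeBroker are hypothetical stand-ins, not real clients.
class DualWriteDemo {

    static class FakeDatabase {
        final List<String> orders = new ArrayList<>();

        void save(String orderId) {
            orders.add(orderId); // treated as committed immediately
        }
    }

    static class FakeBroker {
        final List<String> events = new ArrayList<>();
        boolean reachable = true;

        void send(String event) {
            if (!reachable) {
                throw new IllegalStateException("broker unreachable");
            }
            events.add(event);
        }
    }

    // The naive dual write: two independent systems, no shared transaction.
    static void place(FakeDatabase db, FakeBroker broker, String orderId) {
        db.save(orderId);                      // write 1 succeeds
        broker.send("OrderPlaced:" + orderId); // write 2 may fail afterwards
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        FakeBroker broker = new FakeBroker();
        broker.reachable = false;
        try {
            place(db, broker, "order-1");
        } catch (IllegalStateException e) {
            System.out.println("publish failed: " + e.getMessage());
        }
        // The order is stored, but no event was ever published.
        System.out.println("orders=" + db.orders + " events=" + broker.events);
    }
}
```

Running `main` leaves one order in the fake database and zero events in the fake broker, which is exactly the drift that retries inside the method cannot repair once the process is gone.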

Figure: The Dual-Write Problem. The Order Service performs two separate writes: (1) commit row to PostgreSQL (INSERT order), (2) publish OrderCreated to Kafka (may fail). Failure window: a crash between the two steps causes drift.
Writing to the database and publishing to Kafka in two separate steps is not atomic. A crash between them leaves the system inconsistent — either the row exists without an event, or the event fires without the row.

The Outbox solution

Write the event to an outbox table in the same database transaction as the business change. A separate process (the relay) reads new rows and publishes them to Kafka. Once Kafka has acknowledged, mark the row as sent (or delete it).

Figure: Transactional Outbox Pattern. The Order Service (@Transactional) inserts the order into the Orders table (business state) and the event into the Outbox table (pending events) in the same transaction. The Outbox Relay (poller or Debezium CDC) reads the outbox and publishes to the Kafka orders.events topic, which is consumed by the Inventory Worker and the Notification Worker.
The business row and outbox row are written in one local transaction. A relay polls the outbox table and publishes events to Kafka, guaranteeing at-least-once delivery with no dual-write risk.

The key invariant: if the business row is committed, the outbox row is committed. If the relay restarts ten times before publishing, the event is still in the table waiting. The system can only fail by *delivering more than once*, which is solved on the consumer side with idempotency keys.
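The invariant can be checked with a small simulation. This is a sketch under stated assumptions: `Db`, `Broker` and `relayOnce` are hypothetical in-memory stand-ins, and the `crashAfterPublish` flag fakes a relay dying after Kafka has the event but before the row is marked sent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of the outbox invariant; none of these types are real clients.
class OutboxInvariantDemo {

    // Hypothetical "database": both tables change together or not at all.
    static class Db {
        final List<String> orders = new ArrayList<>();
        final Map<String, Boolean> outbox = new LinkedHashMap<>(); // eventId -> sent?

        // Simulates one local transaction covering both inserts.
        synchronized void placeOrder(String orderId, String eventId, boolean failBeforeCommit) {
            if (failBeforeCommit) {
                throw new IllegalStateException("rolled back"); // nothing was written
            }
            orders.add(orderId);
            outbox.put(eventId, false);
        }
    }

    static class Broker {
        final List<String> published = new ArrayList<>();
    }

    // One relay pass over unsent rows: publish first, mark sent second.
    static void relayOnce(Db db, Broker broker, boolean crashAfterPublish) {
        for (Map.Entry<String, Boolean> row : db.outbox.entrySet()) {
            if (row.getValue()) {
                continue; // already sent
            }
            broker.published.add(row.getKey());
            if (crashAfterPublish) {
                return; // "crash": the sent flag is never written
            }
            row.setValue(true);
        }
    }

    public static void main(String[] args) {
        Db db = new Db();
        Broker broker = new Broker();
        db.placeOrder("order-1", "evt-1", false);
        relayOnce(db, broker, true);  // publishes, then crashes before marking sent
        relayOnce(db, broker, false); // restart: publishes again, then marks sent
        System.out.println("published=" + broker.published + " outbox=" + db.outbox);
    }
}
```

The event is never lost, but it reaches the broker twice. That duplicate is the only failure mode left, and it is the one consumers are asked to absorb.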

Real-world use cases

  • Order, payment and inventory services that must publish state changes to other services without losing events.
  • Audit logs where every write must produce a tamper-evident record.
  • CDC pipelines feeding analytics warehouses (BigQuery, Snowflake) from operational databases.
  • CQRS projections (see our CQRS tutorial) where the projection worker must never miss an event.

Step 1 — The outbox table

```sql
CREATE TABLE outbox (
  id           UUID PRIMARY KEY,
  aggregate_id UUID NOT NULL,
  topic        TEXT NOT NULL,
  payload      JSONB NOT NULL,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
  sent_at      TIMESTAMPTZ
);

CREATE INDEX outbox_unsent_idx ON outbox (created_at) WHERE sent_at IS NULL;
```

The partial index keeps polling efficient even as the table grows.

Step 2 — Write business + outbox atomically

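```java
// Imports assume Spring Boot 3 (jakarta.*), Lombok and Hypersistence Utils.
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.hypersistence.utils.hibernate.type.json.JsonBinaryType;
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import java.time.Instant;
import java.util.UUID;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.RequiredArgsConstructor;
import lombok.Setter;
import org.hibernate.annotations.Type;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Entity
@Table(name = "outbox")
@Getter @Setter @NoArgsConstructor
public class OutboxRow {
  @Id private UUID id;
  private UUID aggregateId;
  private String topic;
  @Type(JsonBinaryType.class)
  @Column(columnDefinition = "jsonb")
  private String payload;
  private Instant createdAt;
  private Instant sentAt;
}

@Service
@RequiredArgsConstructor
public class OrderService {
  private final OrderRepository orders;
  private final OutboxRepository outbox;
  private final ObjectMapper json;

  // rollbackFor: Spring does not roll back on checked exceptions by default, so a
  // JsonProcessingException would otherwise commit the order without its outbox row.
  @Transactional(rollbackFor = Exception.class)
  public Order place(CreateOrder cmd) throws JsonProcessingException {
    var order = Order.create(cmd);
    orders.save(order);

    var row = new OutboxRow();
    row.setId(UUID.randomUUID());
    row.setAggregateId(order.getId());
    row.setTopic("orders.events");
    row.setPayload(json.writeValueAsString(new OrderPlaced(order.getId(), order.getTotal())));
    row.setCreatedAt(Instant.now());
    outbox.save(row);

    return order;
  }
}
```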

Both inserts share the same JPA transaction. Postgres commits both or neither.

Step 3 — The polling relay

The simplest relay is a scheduled job that picks unsent rows with SKIP LOCKED so multiple instances can run safely.

```java
@Component
@RequiredArgsConstructor
public class OutboxRelay {

  private final EntityManager em;
  private final KafkaTemplate<String, String> kafka;

  // get() throws checked exceptions, so the method must declare them. Spring does not
  // roll back on checked exceptions by default: rows already marked sent stay committed.
  @Scheduled(fixedDelay = 500)
  @Transactional
  public void flush() throws Exception {
    List<OutboxRow> rows = em.createQuery(
            "SELECT o FROM OutboxRow o WHERE o.sentAt IS NULL ORDER BY o.createdAt",
            OutboxRow.class)
        .setMaxResults(100) // JPQL has no portable LIMIT clause
        .setLockMode(LockModeType.PESSIMISTIC_WRITE)
        .setHint("jakarta.persistence.lock.timeout", -2) // Hibernate: -2 means SKIP LOCKED
        .getResultList();

    for (OutboxRow r : rows) {
      // Block until the broker acknowledges, then mark the row as sent.
      kafka.send(r.getTopic(), r.getAggregateId().toString(), r.getPayload())
          .get(5, TimeUnit.SECONDS);
      r.setSentAt(Instant.now());
    }
  }
}
```

SKIP LOCKED (Postgres) means a second relay instance simply picks a different batch instead of blocking. Because sentAt is set only after Kafka acknowledges the send, a crash mid-loop loses the uncommitted sentAt updates and the relay re-publishes those events on its next pass, which idempotent consumers are built to tolerate.
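The effect of SKIP LOCKED on concurrent relays can be sketched in memory. This is an illustration of the locking behavior only, not of Postgres internals: `Table`, `claimBatch` and `commit` are hypothetical names, and a set of locked IDs stands in for row locks held by open transactions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of how SKIP LOCKED hands disjoint batches to concurrent relays.
class SkipLockedDemo {

    static class Table {
        final List<Integer> unsentIds = new ArrayList<>();
        final Set<Integer> lockedIds = new HashSet<>(); // rows locked by open transactions

        // Mimics: SELECT ... ORDER BY created_at LIMIT n FOR UPDATE SKIP LOCKED
        synchronized List<Integer> claimBatch(int limit) {
            List<Integer> batch = new ArrayList<>();
            for (Integer id : unsentIds) {
                if (batch.size() == limit) {
                    break;
                }
                // add() returns false when the row is already locked: skip it, do not wait.
                if (lockedIds.add(id)) {
                    batch.add(id);
                }
            }
            return batch;
        }

        // Mimics the commit: rows are marked sent and their locks are released.
        synchronized void commit(List<Integer> batch) {
            unsentIds.removeAll(batch);
            lockedIds.removeAll(batch);
        }
    }

    public static void main(String[] args) {
        Table outbox = new Table();
        for (int id = 1; id <= 6; id++) {
            outbox.unsentIds.add(id);
        }
        List<Integer> relayA = outbox.claimBatch(3); // locks the first three rows
        List<Integer> relayB = outbox.claimBatch(3); // skips them and takes the next three
        System.out.println("relayA=" + relayA + " relayB=" + relayB);
        outbox.commit(relayA);
        outbox.commit(relayB);
        System.out.println("remaining=" + outbox.unsentIds);
    }
}
```

Neither relay waits for the other and no row is claimed twice, which is why extra relay replicas add throughput and fault tolerance without a coordinator.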

Figure: Reliable Event Flow with Outbox. Stages: (1) Write: atomic write of the business row and the outbox row; (2) Poll: relay poller, SELECT … FOR UPDATE SKIP LOCKED; (3) Publish: Kafka publish with an idempotent producer; (4) Consume: consumer processes and dedupes; Ack: the outbox row is marked as sent.
Each event moves through atomic write, durable polling, idempotent publish and consumer acknowledgement. A failure at any step is retried safely because state lives in the database, not in memory.

Step 4 — The CDC variant with Debezium

For high throughput, or to replace the hand-written relay, point Debezium at the outbox table. It tails the Postgres WAL and publishes each insert to Kafka in near real time.

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.dbname": "orders",
    "table.include.list": "public.outbox",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "topic",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.route.topic.replacement": "${routedByValue}"
  }
}
```

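The Debezium EventRouter transform reads the column named by transforms.outbox.route.by.field on each row and routes the event to the matching Kafka topic. That setting defaults to aggregatetype, so it must be set to topic for the table defined in Step 1. Connection settings such as database.port, database.user, database.password and topic.prefix (Debezium 2.x) are omitted here for brevity. No application code runs between Postgres and Kafka: latency drops to milliseconds and there is no scheduler to operate.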
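To make the routing rule concrete, here is a minimal sketch of the substitution idea. It is an illustration only and not Debezium's implementation: `route` is a hypothetical helper, and the row is modeled as a plain map of column names to values.

```java
import java.util.Map;

// Illustrative sketch of topic routing by column value; not Debezium code.
class TopicRoutingSketch {

    // Looks up the routing column on the row and fills the ${routedByValue} placeholder.
    static String route(Map<String, String> row, String routeByField, String replacement) {
        String value = row.get(routeByField);
        if (value == null) {
            throw new IllegalArgumentException("row has no column " + routeByField);
        }
        return replacement.replace("${routedByValue}", value);
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of(
                "aggregate_id", "7f1c",
                "topic", "orders.events");
        // With the replacement used in the connector config, the column value is the topic.
        System.out.println(route(row, "topic", "${routedByValue}"));
        // A prefixed replacement namespaces every routed topic.
        System.out.println(route(row, "topic", "outbox.event.${routedByValue}"));
    }
}
```

The first call yields `orders.events` and the second `outbox.event.orders.events`, so the choice of replacement string decides whether the topic column holds full topic names or only a suffix.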

Step 5 — Idempotent consumers

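Because delivery is at-least-once, consumers must dedupe. The simplest pattern is to persist the event ID on the consumer side and skip duplicates. The ID has to travel with the message, for example the outbox row's id carried in the payload or in a Kafka header, so that every redelivery of the same event presents the same ID.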

```java
@KafkaListener(topics = "orders.events", groupId = "inventory")
@Transactional
public void on(ConsumerRecord<String, String> r) throws Exception {
  var event = json.readValue(r.value(), OrderPlaced.class);
  // Skip events that were already handled; redelivery is expected.
  if (processed.existsById(event.eventId())) return;
  inventory.reserve(event.orderId(), event.lines());
  // Record the event ID in the same transaction as the business write.
  processed.save(new ProcessedEvent(event.eventId(), Instant.now()));
}
```

The dedupe table and the business write must be in the same transaction — otherwise you reintroduce a dual-write on the consumer side.
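The dedupe rule is easy to verify in isolation. The sketch below is an in-memory illustration, assuming a hypothetical `InventoryConsumer` whose `synchronized` method stands in for the single database transaction that covers both the business write and the dedupe record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory sketch of an idempotent consumer; not tied to Kafka or JPA.
class IdempotentConsumerDemo {

    static class InventoryConsumer {
        private final Set<String> processedEventIds = new HashSet<>();
        private final Map<String, Integer> reserved = new HashMap<>();

        // Returns true if the event was applied, false if it was skipped as a duplicate.
        // synchronized stands in for one transaction covering both writes.
        synchronized boolean on(String eventId, String sku, int quantity) {
            if (processedEventIds.contains(eventId)) {
                return false; // duplicate delivery: no side effects
            }
            reserved.merge(sku, quantity, Integer::sum); // business write
            processedEventIds.add(eventId);              // dedupe record
            return true;
        }

        synchronized int reservedFor(String sku) {
            return reserved.getOrDefault(sku, 0);
        }
    }

    public static void main(String[] args) {
        InventoryConsumer consumer = new InventoryConsumer();
        System.out.println(consumer.on("evt-1", "sku-42", 2)); // first delivery is applied
        System.out.println(consumer.on("evt-1", "sku-42", 2)); // redelivery is skipped
        System.out.println(consumer.reservedFor("sku-42"));    // stock reserved only once
    }
}
```

Delivering the same event twice reserves stock once. If the dedupe record were written in a separate transaction, a crash between the two writes would let the redelivery reserve the stock a second time.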

Production best practices

  • Partition by aggregate ID. Set the Kafka key to aggregate_id so events for one aggregate land on the same partition in order.
  • Retain or purge. Keep sent rows for 7–30 days for audit and replay, then archive or delete. A nightly DELETE job is fine.
  • Bound the outbox. Alert when unsent rows exceed a threshold; that means the relay or broker is down.
  • Use idempotent Kafka producers. Set enable.idempotence=true and acks=all so retries on the producer don't duplicate.
  • Run the relay on multiple replicas. With SKIP LOCKED, two or three instances give fault tolerance without coordination.
  • Don't put huge blobs in the outbox. Reference object storage URLs instead. Outbox rows should be small and fast to publish.
  • Embed schema version. { "schemaVersion": 1, "data": {...} }. Evolving event schemas is inevitable.
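The first practice above, partitioning by aggregate ID, relies on key-based partitioning being deterministic. The sketch below shows the idea with a simplified stand-in: Kafka's default partitioner hashes the serialized key with murmur2, while this hypothetical `partitionFor` uses `String.hashCode` purely for illustration.

```java
// Simplified illustration of key-based partitioning; not Kafka's partitioner.
class PartitionByKeyDemo {

    // Same key and same partition count always give the same partition.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String aggregateId = "order-42";
        int first = partitionFor(aggregateId, 6);
        int second = partitionFor(aggregateId, 6);
        // Every event for this aggregate lands on one partition, so order is preserved.
        System.out.println("same partition: " + (first == second));
    }
}
```

The guarantee holds only while the partition count stays fixed. Adding partitions to the topic changes the mapping, so events for one aggregate can straddle two partitions around the change.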

Security considerations

  • The outbox table contains the same PII as the business tables — encrypt at rest and apply the same access controls.
  • Sign events at the producer if downstream services are cross-tenant.
  • Restrict who can write to the outbox table directly. Only the application user should be able to insert; only the relay role should be able to update sent_at.
  • Audit the relay's outgoing traffic; the relay is the only path events take to Kafka.

Common mistakes

1. Publishing from a @TransactionalEventListener after commit. This is still a dual-write: the post-commit publish can fail and is not retried.

2. No SKIP LOCKED. Multiple relay instances will fight for the same rows and serialize the throughput.

3. Deleting outbox rows immediately on send. You lose the ability to replay. Mark sent and purge later.

4. Forgetting consumer idempotency. At-least-once delivery means consumers *will* see duplicates eventually.

5. Mixing many event types in one outbox without keys. Ordering guarantees are lost; partition by aggregate ID.

6. Letting the outbox grow unbounded. Polling slows, indexes bloat, vacuums struggle.

Troubleshooting guide

  • Events stuck in outbox. Check relay logs. Common causes: Kafka unreachable, topic does not exist, ACL missing. Unblock by fixing the broker, then resume.
  • Duplicate processing downstream. Verify the consumer dedupe table is in the same transaction as the business write.
  • Out-of-order events. Confirm the Kafka producer key is aggregate_id. Single-partition order is only guaranteed per key.
  • Relay falling behind. Increase batch size, confirm with EXPLAIN that the poll query uses the partial index on unsent rows, or switch to Debezium.
  • Database CPU spike. Vacuum the outbox more aggressively; the partial index needs maintenance once you purge rows.
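Stuck events are easier to catch with an explicit backlog check than by reading relay logs after the fact. The sketch below is one possible shape for such a check, under stated assumptions: `shouldAlert` is a hypothetical helper, the input is the list of created_at values for rows where sent_at IS NULL, and both thresholds are placeholders to tune.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Sketch of an outbox backlog check; thresholds and names are illustrative.
class OutboxBacklogCheck {

    // Alerts when too many rows are unsent or when any unsent row is too old.
    static boolean shouldAlert(List<Instant> createdAtOfUnsent, Instant now,
                               int maxRows, Duration maxAge) {
        if (createdAtOfUnsent.size() > maxRows) {
            return true; // backlog is growing: relay or broker is likely down
        }
        for (Instant createdAt : createdAtOfUnsent) {
            if (Duration.between(createdAt, now).compareTo(maxAge) > 0) {
                return true; // a single stuck event is enough to alert
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Instant> unsent = List.of(now.minusSeconds(2), now.minusSeconds(900));
        // One row has waited 15 minutes, well past the 1 minute budget.
        System.out.println(shouldAlert(unsent, now, 1000, Duration.ofMinutes(1)));
    }
}
```

Checking the age of the oldest unsent row catches a stalled relay even at low traffic, when the row count alone would stay under any reasonable threshold.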

FAQ

1. Why not use Kafka Transactions to write directly? You still need an atomic write across two systems (DB + Kafka). Kafka Transactions only help when both sides are Kafka.

2. Polling relay or Debezium — which should I pick? Polling is fine up to a few thousand events per second and is dead simple. Debezium scales further and removes the scheduler, at the cost of operating a Connect cluster.

3. Does the Outbox pattern guarantee exactly-once? No. It guarantees at-least-once. Combine it with consumer-side dedupe for effectively-once semantics.

4. Can I use the Outbox pattern without Kafka? Yes — the relay can publish to RabbitMQ, SNS, NATS, or an HTTP webhook. The pattern is broker-agnostic.

5. How big should the outbox table get? A few hundred thousand rows is fine. Beyond that, purge or partition aggressively.

6. What about ordering across aggregates? There is no global order in Kafka. If you need cross-aggregate order, you need a different design (single-partition topic, or an event-sourced log).

7. Can multiple services share one outbox? No. Each service owns its own database, schema and outbox. That is the whole point of the bounded context.

8. Does the Outbox pattern work with NoSQL? Yes, if your database supports multi-document transactions (MongoDB ≥4.0, DynamoDB transactions). Otherwise you need a single-document outbox embedded in the aggregate.

9. How does Outbox interact with CQRS? They are complementary. CQRS needs reliable events to project; the Outbox guarantees they are emitted. See our CQRS tutorial.

10. What latency should I expect? Polling relay: 100–500ms. Debezium: 10–50ms. Both are fine for most user-facing systems.

Key takeaways

  • The dual-write problem is unavoidable when an application writes to both a database and a broker; the Outbox pattern solves it.
  • Write the business row and the outbox row in the same local transaction; let a relay (polling or Debezium) republish.
  • Partition by aggregate ID, make consumers idempotent, and purge old rows on a schedule.
  • The Outbox is the foundation for CQRS projections, audit pipelines and reliable inter-service communication.

Related tutorials

Figure: Event-Driven Architecture. Producers (Order Service, Payment Service) publish OrderCreated and PaymentSettled to Kafka (topics, partitions). Consumers (Inventory Worker, Shipping Worker, Notification Worker) update the Inventory DB, the Shipping DB and the Email API.
Producers publish domain events to a durable broker. Independent consumers subscribe, react, and update their own stores asynchronously.

TL;DR

Key takeaways

  • Understand the core concepts behind the Outbox pattern in a production context.
  • Apply the pattern to real systems, not just toy examples.
  • Recognize the trade-offs, failure modes, and operational concerns before adopting them.
  • Get a clear path to the next step — related tutorials, tools, and reference architectures.

Avoid these

Common mistakes

  • 1. Copy-pasting code without understanding the trade-offs

    It's tempting to ship a snippet from a blog post into production, but architecture patterns only work when the failure modes are understood. Always reason about timeouts, retries, and consistency.

  • 2. Skipping observability from day one

    Structured logs, metrics, and traces are not optional. Wire them in before you ship: debugging distributed systems without them is painful and expensive.

  • 3. Optimizing too early

    Premature caching, sharding, or microservice extraction adds operational cost. Validate the bottleneck with real measurements first.

  • 4. Ignoring security defaults

    Secrets in env files, open management ports, missing RBAC — these are the most common production incidents. Treat security as part of the definition of done.

Ship it safely

Production best practices

Apply these before promoting an Outbox implementation to a real production environment.

Scalability

Design services to scale horizontally. Keep request handlers stateless, push session and cache state to external stores (Redis, the database), and benchmark p95/p99 latency under realistic load before tuning.

Monitoring & Observability

Emit metrics (RED/USE), structured JSON logs, and distributed traces from day one. Wire dashboards and alerts to SLOs you actually care about — error rate, latency, saturation — not vanity metrics.

Logging

Log with correlation IDs, never log secrets or PII, and centralize logs (ELK, Loki, CloudWatch). Use levels deliberately: INFO for state changes, WARN for recoverable issues, ERROR for incidents.

Security

Apply least-privilege IAM, rotate secrets through a vault, validate every input, and patch dependencies on a schedule. For HTTP services, enable TLS everywhere and set sensible security headers.

Testing

Layer unit, integration, and contract tests. Run them in CI on every PR, and add smoke tests post-deploy. For distributed systems, also run chaos and load tests before a major release.

Reliability & Rollouts

Ship with health checks, readiness probes, graceful shutdown, and a rollback strategy. Prefer canary or blue/green deploys over big-bang releases.

Questions

Frequently asked questions

Is this tutorial up to date?

Yes. This tutorial was last reviewed and updated on June 3, 2026. We revisit popular Software Design & Architecture tutorials regularly to keep them aligned with current best practices.

What level is this tutorial aimed at?

It is written for working developers with some backend experience. Beginners can still follow along, and senior engineers will find production-grade patterns and trade-off discussions.

Do I need to follow every step in order?

The walkthrough is sequential because each step depends on the previous one. If you only need a specific concept, the table of contents at the top of the article lets you jump straight to that section.

Where can I find the source code?

The full source code is available on GitHub: https://github.com/masterlabsystems/outbox-pattern-demo. Fork it, run it locally, and adapt it to your own project.
