IBM paid $11B for Confluent. 90% of enterprises adopt EDA. Kafka 4.0, Flink 2.0, and the Streamhouse vision are reshaping data infrastructure.
IBM paid $11 billion for Confluent. The deal closed on March 17, 2026. Not a database company. Not an AI startup. A company whose core product is Apache Kafka — a message broker. When a $180 billion enterprise pays that kind of money for streaming infrastructure, the question "should we use event-driven architecture?" is settled.
Over 90% of global enterprise organizations will have adopted at least some form of event-driven architecture by the end of 2026, according to Gartner. 72% already have. The remaining holdouts aren't debating whether to stream. They're figuring out how to unify streaming and batch into a single architecture — and that's a fundamentally different problem.
The numbers tell a clear story. 63% of organizations report improved scalability after adopting EDA. 52% see fewer production incidents with decoupled services. EDA response times are 19.18% faster than API-driven architectures, with 34.40% lower error rates.
But the real signal isn't in survey data. It's in what companies are actually building.
These aren't experimental deployments. This is the production backbone of the internet's biggest companies.
Citibank scaled its EDA platform from processing thousands of records to over 8 million in 18 months. Wix runs 1,500 microservices on event streaming, having migrated gradually from request-reply to EDA over several years. And XPENG Motors shifted to event-driven data pipelines, cutting streaming costs by over 50%.
The pattern spans fintech, e-commerce, automotive, and entertainment. EDA isn't a startup trend. It's enterprise infrastructure.
Apache Kafka 4.0, released March 2025, finally removed ZooKeeper entirely. KRaft is the default consensus mechanism. This matters more than it sounds.
ZooKeeper was Kafka's single biggest operational headache. It required a separate cluster, separate monitoring, separate expertise. It struggled past 100,000 partitions. KRaft scales to 1.5 million partitions without performance issues.
Kafka 4.0 also introduced early access to Queues for Kafka (KIP-932) — traditional queue semantics on top of the Kafka protocol. This is Kafka saying: we're not just a pub-sub system anymore. We're the universal messaging layer.
The ZooKeeper removal also neutralized Redpanda's biggest selling point. For years, Redpanda's pitch was "Kafka without ZooKeeper, rewritten in C++." Now Kafka itself has no ZooKeeper. The competitive landscape shifted overnight.
Despite Kafka's dominance, 2026 has three serious contenders — each optimizing for different trade-offs.
WarpStream (acquired by Confluent in 2024) took a radical approach: stateless agents backed by object storage (S3). No local disks. No inter-AZ replication. No broker state to manage.
The cost savings are dramatic. WarpStream claims 80-85% lower total cost of ownership than self-hosted Kafka on equivalent workloads, running on roughly a quarter of the instances (6 vs. ~24 for Kafka).
The trade-off? Latency. WarpStream's p99 write latency is ~400-600ms on S3 Standard, dropping to ~100-150ms with S3 Express One Zone. That's fine for logging and analytics. Not fine for real-time fraud detection.
Robinhood's migration tells the story perfectly. With 14 million monthly active users and 10+ TB of data processed daily, they moved their logging pipeline from Kafka to WarpStream. Results: 45% total cost savings, 99% network cost reduction, and auto-scaling that matches their cyclical stock-market-hours workloads.
Redpanda rewrote Kafka in C++ with no JVM and no ZooKeeper. Their performance claims are aggressive: up to 10x lower tail latencies and 1 GB/s throughput on 3 instances where Kafka needed 9.
The caveats: benchmarks by Confluent's Jack Vanlightly found that Kafka surpassed Redpanda in some tests, and results vary by workload. Redpanda's edge is real for latency-sensitive use cases, but it's not the across-the-board 10x blowout the marketing suggests.
Post-Kafka 4.0, Redpanda's strongest arguments are raw latency, ARM efficiency, single-binary simplicity, and developer experience — not ZooKeeper avoidance.
AutoMQ is the emerging dark horse. It's a fork of Apache Kafka with a new storage engine on object storage — 100% Kafka protocol compatible, but with claims of up to 17x lower cost and 100x faster elasticity with second-level partition migration. XPENG Motors reduced Kafka costs by over 50% after switching to AutoMQ.
| Feature | Kafka 4.0 | WarpStream | Redpanda | AutoMQ |
|---|---|---|---|---|
| Latency | Low ms | 400-600ms (S3) | Sub-ms tail | Low ms |
| Cost vs Kafka | Baseline | 80-85% less | 3-6x less | Up to 17x less |
| Storage | Local disk | Object storage | Local disk | Object storage |
| Compatibility | Native | Kafka protocol | Kafka API | 100% Kafka fork |
| Scaling | KRaft (1.5M partitions) | Stateless auto-scale | Manual | Auto-scale (seconds) |
| Best For | General purpose | Cost-sensitive, bursty | Low-latency critical | Cost + compatibility |
The pattern is clear: the Kafka ecosystem is fragmenting along the cost-latency spectrum. WarpStream and AutoMQ trade latency for cost efficiency. Redpanda trades compatibility for raw speed. Kafka 4.0 remains the safe default.
If Kafka is the nervous system of EDA, Apache Flink is the brain. And Flink 2.0, released in March 2025 and described as "the biggest leap since Flink 1.0" (Flink 2.2 followed in December), fundamentally changed what's possible.
The headline feature: disaggregated state management. Flink's new ForSt state backend (an LSM-tree key-value store based on RocksDB) stores SST files on remote file systems like S3 or HDFS. State size is now limited only by external storage, not local disk.
The performance numbers are surprising. Disaggregated state with just 1GB of cache achieves 75-120% throughput compared to traditional local state — even under constrained caching conditions. Recovery time is now independent of state size because there's no need to download state during recovery.
Confluent's managed Flink offering reached low eight-figure ARR in ~18 months since GA, with 1,000+ customers. Alibaba processes 40 billion events per day on Flink. The question isn't whether Flink is production-ready. It's whether anything else can compete.
Flink 2.0 also removed the entire DataSet API, added native AI/ML inference in SQL, and introduced Process Table Functions bridging SQL and DataStream. This is Flink doubling down on being the unified compute engine for streaming and batch.
The competitor landscape is thinner than you'd think. RisingWave claims to outperform Flink in 22 out of 27 Nexmark queries — but RisingWave is a PostgreSQL-compatible streaming database, not a general-purpose stream processor. It's a different tool for a different job. For stateful stream processing at scale, Flink is the default, and the VLDB paper "Disaggregated State Management in Apache Flink 2.0" gives that position academic grounding.
Here's the architectural shift that matters most. The industry is moving from Lambda Architecture (separate batch + stream pipelines) to Kappa Architecture (stream-only) to something new: the Streamhouse.
The concept, introduced by Ververica in 2023, combines real-time streaming capabilities with lakehouse flexibility. Think of it as "Lakehouse 2.0" — seamless integration of streaming and batch within a single unified architecture.
The technology stack making this real:
Apache Paimon — a streaming-first lakehouse table format (formerly "Flink Table Store"). It uses LSM-tree file organization to unify batch and stream processing with native CDC support, incremental queries, and deep Flink integration.
Apache Fluss (incubating) — a columnar streaming storage engine with sub-second query latency. Fluss holds the most recent writes (sub-second freshness), while Paimon serves longer-term data (minute-level latency). A tiering service continuously moves data from Fluss into Paimon tables, creating a "tiered streaming lakehouse."
Confluent's Tableflow — bridges Kafka topics to Apache Iceberg tables, enabling batch analytics tools to query streaming data natively.
The vision: your events flow through Kafka, get processed by Flink, land in Paimon/Iceberg tables, and are queryable in real-time and batch — all without maintaining separate pipelines. Ververica Platform 3.0, released for Azure customers in 2026, calls this "the turning point for unified streaming data."
This is where "How do we unify?" becomes the central question. Not whether to stream, but how to make streaming and batch the same thing.
I need to say something unpopular: most teams shouldn't use event sourcing.
CQRS (Command Query Responsibility Segregation) separates read and write models. Event sourcing stores every state change as an immutable event. Together, they're powerful — Netflix uses both for 260+ million subscribers. But they're also a complexity trap.
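To make the pattern concrete before discussing the trap, here is a minimal, illustrative event-sourcing sketch in Python (the account model and event names are hypothetical, not anyone's production schema): current state is never stored directly; it is rebuilt by replaying an append-only log.

```python
from dataclasses import dataclass, field

# An append-only event log: every state change is recorded, never overwritten.
@dataclass
class EventStore:
    events: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)

def rebuild_balance(events: list) -> int:
    """Reconstruct current state by replaying every event from the beginning."""
    balance = 0
    for e in events:
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance

store = EventStore()
store.append({"type": "Deposited", "amount": 100})
store.append({"type": "Withdrawn", "amount": 30})
store.append({"type": "Deposited", "amount": 5})

print(rebuild_balance(store.events))  # → 75
```

Note the cost hiding in those ten lines: every read means a full replay (or a separately maintained projection), and those event shapes become long-term contracts.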
Microsoft's own documentation warns: "CQRS can introduce significant complexity into the application design, specifically when combined with the Event Sourcing pattern." The Wix engineering team, running 1,500 microservices, explicitly recommends CDC (Change Data Capture) over full event sourcing for most use cases.
Here's when event sourcing makes sense:

- You need a complete, immutable audit trail (finance, healthcare, regulated domains).
- You need temporal queries: "what did this account look like on March 3rd?"
- Replaying history to rebuild state, debug, or backfill new read models is a genuine requirement.

Here's when it doesn't:

- Your domain is mostly CRUD and current state is all anyone ever asks for.
- Your team hasn't operated an event-driven system in production before.
- CDC can give you the event stream you need without reconstructing state from events.
The Wix team learned this the hard way. Their recommendation: use CDC to capture database changes as events. You get the event stream without the complexity of reconstructing state from events. It's pragmatic. It works.
Wix's engineering team published five hard-won lessons from scaling EDA across 1,500 microservices. Every team adopting EDA should read these:
1. The Atomicity Problem. Writing to your database and publishing to Kafka is not atomic. If the DB write succeeds but the Kafka publish fails (or vice versa), your system is inconsistent. The fix: use the Outbox pattern or CDC. Write events to a database table, then use a CDC connector to stream them to Kafka.
2. Event Sourcing Is Probably Too Complex. Wix tried full event sourcing and pulled back. Reconstructing state from thousands of events is expensive. Schema evolution on events is painful because events are long-term contracts. CDC gives you 80% of the benefit with 20% of the complexity.
3. Context Propagation Is Hard. In request-reply systems, context flows naturally through the call chain. In async event-driven systems, context (user ID, trace ID, request metadata) gets lost between services. You need to propagate context explicitly in every event envelope.
4. Large Payloads Kill Your Bus. Streaming large payloads (images, documents, large JSON blobs) through your event bus creates bottlenecks. Put large data in object storage. Put a reference (URL, ID) in the event.
5. Idempotency Isn't Optional. Events may be delivered more than once. Every consumer must be idempotent. If processing the same event twice produces a different result, you have a bug. Use idempotency keys, deduplication tables, or idempotent operations (SET vs INCREMENT).
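Two of the lessons above can be sketched together in Python, with sqlite3 standing in for the service database (table names and the event ID scheme are illustrative). The producer side addresses the atomicity problem by writing the business row and the outbox row in one transaction (a CDC connector would then stream the outbox table to Kafka); the consumer side stays idempotent with a dedup table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER);
    CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT);
    CREATE TABLE processed (event_id TEXT PRIMARY KEY);  -- consumer-side dedup
""")

def create_order(order_id: str, total: int) -> None:
    # Lesson 1: one transaction covers both the state change and the event.
    # A CDC connector (e.g. Debezium) streams the outbox table to Kafka later.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        db.execute("INSERT INTO outbox VALUES (?, ?)",
                   (f"evt-{order_id}", f'{{"order_id": "{order_id}"}}'))

shipped = 0  # stands in for a real side effect (sending an email, charging a card)

def handle_event(event_id: str) -> None:
    # Lesson 5: at-least-once delivery means this may run twice per event.
    global shipped
    with db:
        try:
            db.execute("INSERT INTO processed VALUES (?)", (event_id,))
        except sqlite3.IntegrityError:
            return  # duplicate delivery: already processed, do nothing
        shipped += 1

create_order("o1", 4200)
handle_event("evt-o1")
handle_event("evt-o1")  # redelivery is a no-op
print(shipped)  # → 1
```

The dedup insert and the side effect share a transaction, so "mark as processed" and "do the work" can't diverge either.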
Here's a problem nobody warns you about until you're in production: schema evolution.
Events are contracts. When Service A publishes an OrderCreated event, Services B, C, and D all depend on its structure. Changing that structure — adding fields, removing fields, renaming fields — requires coordinating across every consumer.
ING Bank published a case study on enforcing backward compatibility across thousands of event types in their payments platform. Their approach: strong schema registries, backward-compatible-only changes, and explicit versioning strategies.
The practical advice:

- Use a schema registry and enforce compatibility checks before producers can publish a breaking change.
- Make only backward-compatible changes: add optional fields; never remove or rename fields consumers depend on.
- Version events explicitly (OrderCreated.v1, OrderCreated.v2) and let consumers choose which version they understand.

Use this framework. Be honest with yourself.
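Here is a sketch of what explicit versioning looks like on the consumer side (the event names and fields are hypothetical): route on a version field and normalize every known version into one internal shape, so producers can evolve without breaking you.

```python
def handle_order_created(event: dict) -> dict:
    """Normalize any known version of OrderCreated into the shape this
    consumer works with internally."""
    version = event.get("version", 1)
    if version == 1:
        # v1 carried a single `name` field.
        return {"order_id": event["order_id"], "customer": event["name"]}
    if version == 2:
        # v2 split the name; the change is additive, so v1 consumers still work.
        return {"order_id": event["order_id"],
                "customer": f'{event["first_name"]} {event["last_name"]}'}
    raise ValueError(f"unknown OrderCreated version: {version}")

v1 = {"order_id": "o1", "name": "Ada Lovelace"}
v2 = {"version": 2, "order_id": "o2",
      "first_name": "Ada", "last_name": "Lovelace"}
print(handle_order_created(v1)["customer"])  # → Ada Lovelace
print(handle_order_created(v2)["customer"])  # → Ada Lovelace
```

Failing loudly on unknown versions is deliberate: silently dropping events you don't understand is how data quietly goes missing.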
Start with the basics:
```yaml
# docker-compose.yml — Minimal EDA setup for local development
services:
  kafka:
    image: apache/kafka:4.0.0
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      # Advertise a name other containers can resolve; host clients need
      # an extra listener mapped to localhost.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1  # single-broker dev cluster
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
  connect:
    image: debezium/connect:2.7
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      # Debezium's Connect image requires these three topics to start.
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
```
This gives you Kafka 4.0 (KRaft mode, no ZooKeeper) and Debezium for CDC — the two components that cover 80% of EDA use cases.
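From there, turning on CDC is one REST call to the Connect instance. This is a sketch, not a copy-paste recipe: the connector name, credentials, and topic prefix are placeholders, and you'd also need a PostgreSQL service (with wal_level=logical) added to the compose file.

```shell
# Register a PostgreSQL CDC connector with the Connect instance above.
# Hostname, credentials, and topic prefix are placeholders for your setup.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "orders-cdc",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "app",
      "database.password": "secret",
      "database.dbname": "appdb",
      "topic.prefix": "app"
    }
  }'
```

Once registered, every insert, update, and delete in that database shows up as an event on a Kafka topic, with no changes to your application code.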
Go event-driven if:

- Multiple services need to react to the same state changes.
- You need audit trails, replayability, or real-time analytics over a stream of changes.
- Traffic is spiky and you need a buffer between producers and consumers.
- Teams need to ship independently without coordinating synchronous API changes.

Stay with request-reply if:

- Your system is a handful of CRUD services against a single database.
- Every operation needs strong consistency and an immediate answer.
- Your team is small and nobody wants to own Kafka operations.
The cost reality check: Running a production Kafka cluster isn't free. You need brokers, monitoring (Prometheus + Grafana or Confluent Control Center), a schema registry, and someone who understands consumer group rebalancing at 3am. Budget $2,000-$5,000/month minimum for a small production cluster on AWS, or use managed services like Confluent Cloud (usage-based pricing) or Amazon MSK ($0.21/hour per broker). WarpStream can cut this by 80% for latency-tolerant workloads.
The hybrid approach (what most successful teams do):

- Keep request-reply (REST/gRPC) for synchronous queries and user-facing reads.
- Use events for cross-service state propagation, notifications, and analytics.
- Start with CDC on the existing database instead of rewriting services to publish events directly.
About 40% of businesses say educating non-technical stakeholders on EDA benefits is a major adoption hurdle. Start small. Prove value. Expand.
The 2026 EDA story isn't about adoption — that's settled. It's about unification.
For five years, teams maintained separate batch and streaming pipelines. Spark for batch. Flink for streaming. Different code, different infrastructure, different operational models. Lambda Architecture was the pattern name, but "paying twice for everything" was the reality.
The Streamhouse vision — Flink + Paimon + Iceberg as a unified compute-and-storage layer — is the first architecture that credibly promises to end this duplication. It's early. Paimon and Fluss are still maturing. But the direction is obvious.
IBM paying $11 billion for Confluent confirms that streaming is now infrastructure, not a feature. Kafka is to event-driven systems what PostgreSQL is to relational data — the default that everything else is measured against. The alternatives (WarpStream, Redpanda, AutoMQ) aren't trying to kill Kafka. They're trying to be better Kafka for specific workloads.
The mistake I see teams make most often: adopting EDA everywhere because it's "modern." If your system is 15 CRUD endpoints and a dashboard, you don't need Kafka. You don't need event sourcing. You don't need CQRS. You need a PostgreSQL database and some REST APIs. EDA solves real problems at scale; apply it to systems that don't have those problems and you pay all the complexity cost for none of the benefit.
Start with CDC on your existing database. That's it. Debezium streaming your PostgreSQL changes to a Kafka topic gives you 80% of EDA benefits with 10% of the complexity. You can always add Flink processing, event sourcing, and CQRS later — when the domain complexity justifies it.
The teams that win in 2026 aren't the ones with the most sophisticated streaming architecture. They're the ones who matched their architecture to their actual complexity. Sometimes that's Kafka + Flink + Paimon processing 40 billion events a day. Sometimes it's a PostgreSQL trigger and a cron job. The hard part isn't building the streaming pipeline. It's knowing when you actually need one.
And for the record: if your "event-driven architecture" is just HTTP webhooks with a retry queue, that's fine. That counts. Not everything needs Kafka. Ship the product.