At Materialize, we’ve built a data warehouse that runs on real-time data. Our customers use that data to power critical business use cases, from fraud detection to dynamic pricing to loan underwriting.
To provide our customers with streaming data, we have first-class support for loading and unloading data via Apache Kafka, the de facto standard for real-time data transit. Because of the sensitivity of their data, our customers require strong encryption and authentication schemes at a minimum. Many of our customers go one step further and require that no data is loaded or unloaded over the public internet.
But unfortunately, Kafka and private networking do not play well together. Traditional private networking technologies like VPNs and VPC peering don’t work with Materialize’s multi-tenant architecture, and newer cloud-native technologies like AWS PrivateLink require delicate and complex reconfiguration of both the network and every Kafka broker.
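To give a sense of what that reconfiguration involves: connecting through a PrivateLink endpoint has traditionally meant editing every broker’s configuration so it advertises an address reachable through the endpoint, usually with a unique port per broker so the endpoint can route each connection to the right machine. Here’s a minimal sketch of the kind of per-broker change required; the endpoint DNS name, listener labels, and port numbers are illustrative, not from a real deployment:

```properties
# Hypothetical broker 0 of a 3-broker cluster. Each broker must
# advertise a unique port on the shared PrivateLink endpoint so the
# load balancer behind the endpoint can route traffic back to it.
listeners=INTERNAL://0.0.0.0:9092,PRIVATELINK://0.0.0.0:9001
advertised.listeners=INTERNAL://broker-0.internal:9092,PRIVATELINK://vpce-0abc123-example.us-east-1.vpce.amazonaws.com:9001
listener.security.protocol.map=INTERNAL:SASL_SSL,PRIVATELINK:SASL_SSL
inter.broker.listener.name=INTERNAL

# Broker 1 advertises port 9002, broker 2 port 9003, and so on. The
# listener map, the advertised ports, and the load balancer rules all
# have to stay in lockstep: one mismatch and clients can no longer
# reach part of the cluster.
```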
As a result, the Materialize team built the first managed service that can securely connect to any Kafka cluster over AWS PrivateLink without requiring any broker configuration changes. We’ve already contributed the required changes back to the open source community. But in this blog post, we’ll take a deeper look at how we reconciled Kafka with private networking.
The post will examine why teams historically needed delicate network and broker configurations to connect to Kafka clusters, and how that fragility undermined the stability of their networks. Then we’ll explain how we developed frictionless private networking for Kafka by extending librdkafka, the open source Kafka client library.