Hi there 👋

I’m Steffen. I’m passionate about streaming and database technologies. With a background spanning startups, AWS, and academic research, I’m dedicated to empowering teams to turn real-time data into actionable insights. Explore my collection of blogs, talks, and open-source contributions to dive deeper into the world of streaming and database technologies.

Powering real-time loan underwriting at Vontive with Materialize

In the fast-paced world of mortgage lending, speed and accuracy are crucial. To support their underwriters, Vontive transformed written rules for loan eligibility from a Google Doc into SQL queries for evaluation in a Postgres database. However, while functional, this setup struggled to scale with business growth, resulting in slow, cumbersome processing times. Executing just a handful of loan eligibility rules could take up to 27 seconds–far too long for user-friendly interactions....

October 8, 2024 · Steffen Hausmann

How Materialize Unlocks Private Kafka Connectivity via PrivateLink and SSH

At Materialize, we’ve built a data warehouse that runs on real-time data. Our customers use this real-time data to power critical business use cases, from fraud detection, to dynamic pricing, to loan underwriting. To provide our customers with streaming data, we have first-class support for loading and unloading data via Apache Kafka, the de facto standard for transit for real-time data. Because of the sensitivity of their data, our customers require strong encryption and authentication schemes at a minimum....

June 10, 2024 · Steffen Hausmann

Navigating Private Network Connectivity Options for Kafka Clusters

There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options … might not be an option! In this session, we’ll discuss how you can use SSH bastions or a self managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet....

March 20, 2024 · Steffen Hausmann

A Beginner’s Guide to Kafka Performance in Cloud Environments

Over time, deploying and running Kafka became easier and easier. Today you can choose amongst a large ecosystem of different managed offerings or just deploy to Kubernetes directly. But, although you have plenty of options to optimize your Kafka configuration and choose infrastructure that matches your use case and budget, it’s not always easy to tell how these choices affect overall cluster performance. In this session, we’ll take a look at Kafka performance from an infrastructure perspective....

May 16, 2023 · Steffen Hausmann

Everything you need to know to be a Materialize power-user

This post is also available on the Materialize blog. Materialize is a distributed SQL database built on streaming internals. With it, you can use the SQL you are already familiar with to build powerful stream processing capabilities. But as with any abstraction, sometimes the underlying implementation details leak through the abstraction. Queries that look simple and innocent when you are formulating them in SQL can sometimes require more resources than expected when evaluated incrementally against a continuous stream of arriving updates....

April 20, 2023 · Steffen Hausmann

Leaving Amazon

After more than 7.5 years my time at AWS came to a close at the end of 2022. It’s been an incredible journey to learn and grow professionally. I’m still surprised how much trust and support I’ve received over the years to focus on things I found important and impactful. Just last year the work I’ve started to improve the Apache Flink connectors system was contributed back to the open source project, not only resulting in several blog posts and a session at Flink Forward, but also getting early adoption that lead to support of new destinations that now integrate with Apache Flink....

March 1, 2023 · Steffen Hausmann

Making it Easier to Build Connectors with Apache Flink: Introducing the Async Sink

Apache Flink is a popular open source framework for stateful computations over data streams. It allows you to formulate queries that are continuously evaluated in near real time against an incoming stream of events. To persist derived insights from these queries in downstream systems, Apache Flink comes with a rich connector ecosystem that supports a wide range of sources and destinations. However, the existing connectors may not always be enough to support all conceivable use cases....

November 23, 2022 · Steffen Hausmann

One sink to rule them all: Introducing the new Async Sink

Next time you want to integrate with a new destination for a demo, concept or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171), with the goal to encapsulate common logic and allow developers to focus on the key integration code. The new framework handles things like request batching, buffering records, applying backpressure, retry strategies, and at least once semantics....

August 3, 2022 · Steffen Hausmann

Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost

Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how to size your clusters to meet your throughput, availability, and latency requirements. Along the way, we answer questions like “when does it make sense to scale up vs....

March 14, 2022 · Steffen Hausmann

Performance Testing Framework for Apache Kafka

The tool is designed to evaluate the maximum throughput of a cluster and compare the put latency of different broker, producer, and consumer configurations. To run a test, you basically specify the different parameters that should be tested and the tool will iterate through all different combinations of the parameters, producing a graph similar to the one below. https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/

March 7, 2022 · Steffen Hausmann