Powering real-time loan underwriting at Vontive with Materialize
In the fast-paced world of mortgage lending, speed and accuracy are crucial. To support their underwriters, Vontive transformed written rules for loan eligibility from a Google Doc into SQL queries for evaluation in a Postgres database. However, while functional, this setup struggled to scale with business growth, resulting in slow, cumbersome processing times. Executing just a handful of loan eligibility rules could take up to 27 seconds, far too long for user-friendly interactions....
How Materialize Unlocks Private Kafka Connectivity via PrivateLink and SSH
At Materialize, we’ve built a data warehouse that runs on real-time data. Our customers use this real-time data to power critical business use cases, from fraud detection to dynamic pricing to loan underwriting. To provide our customers with streaming data, we have first-class support for loading and unloading data via Apache Kafka, the de facto standard for moving real-time data. Because of the sensitivity of their data, our customers require strong encryption and authentication schemes at a minimum....
Navigating Private Network Connectivity Options for Kafka Clusters
There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options … might not be an option! In this session, we’ll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet....
A Beginner’s Guide to Kafka Performance in Cloud Environments
Over time, deploying and running Kafka became easier and easier. Today you can choose amongst a large ecosystem of different managed offerings or just deploy to Kubernetes directly. But, although you have plenty of options to optimize your Kafka configuration and choose infrastructure that matches your use case and budget, it’s not always easy to tell how these choices affect overall cluster performance. In this session, we’ll take a look at Kafka performance from an infrastructure perspective....
Everything you need to know to be a Materialize power-user
This post is also available on the Materialize blog. Materialize is a distributed SQL database built on streaming internals. With it, you can use the SQL you are already familiar with to build powerful stream processing capabilities. But as with any abstraction, sometimes the underlying implementation details leak through the abstraction. Queries that look simple and innocent when you are formulating them in SQL can sometimes require more resources than expected when evaluated incrementally against a continuous stream of arriving updates....
Leaving Amazon
After more than 7.5 years, my time at AWS came to a close at the end of 2022. It’s been an incredible journey to learn and grow professionally. I’m still surprised how much trust and support I’ve received over the years to focus on things I found important and impactful. Just last year, the work I started to improve the Apache Flink connectors system was contributed back to the open source project, not only resulting in several blog posts and a session at Flink Forward, but also getting early adoption that led to support of new destinations that now integrate with Apache Flink....
Making it Easier to Build Connectors with Apache Flink: Introducing the Async Sink
Apache Flink is a popular open source framework for stateful computations over data streams. It allows you to formulate queries that are continuously evaluated in near real time against an incoming stream of events. To persist derived insights from these queries in downstream systems, Apache Flink comes with a rich connector ecosystem that supports a wide range of sources and destinations. However, the existing connectors may not always be enough to support all conceivable use cases....
One sink to rule them all: Introducing the new Async Sink
Next time you want to integrate with a new destination for a demo, concept, or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171), with the goal of encapsulating common logic and allowing developers to focus on the key integration code. The new framework handles things like request batching, buffering records, applying backpressure, retry strategies, and at-least-once semantics....
Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost
Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how to size your clusters to meet your throughput, availability, and latency requirements. Along the way, we answer questions like “when does it make sense to scale up vs....
Performance Testing Framework for Apache Kafka
The tool is designed to evaluate the maximum throughput of a cluster and compare the put latency of different broker, producer, and consumer configurations. To run a test, you specify the parameters that should be tested, and the tool iterates through all combinations of those parameters, producing a graph similar to the one below. https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/
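As a rough illustration of what iterating through all combinations of test parameters means, here is a minimal Python sketch. It is not taken from the framework itself: the function name `parameter_combinations` and the parameter names and values in `test_grid` are hypothetical, and the real tool may well describe its tests in a different format.

```python
from itertools import product


def parameter_combinations(grid):
    """Yield one dict per combination of the values listed in `grid`."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical parameter names and values, chosen only for illustration.
test_grid = {
    "num_producers": [6, 12],
    "throughput_mb_per_sec": [16, 32, 64],
    "acks": ["1", "all"],
}

combinations = list(parameter_combinations(test_grid))

# Each combination would correspond to one individual test run.
for combo in combinations:
    print(combo)
```

With two, three, and two values per parameter, the sweep covers 2 × 3 × 2 = 12 test runs, which shows how quickly the number of runs grows as more parameters are added.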