Steffen's Blog

Field Engineer @ Materialize

Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost

Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how to size your clusters to meet your throughput, availability, and latency requirements. Along the way, we answer questions like “when does it make sense to scale up vs.

Performance Testing Framework for Apache Kafka

The tool is designed to evaluate the maximum throughput of a cluster and compare the put latency of different broker, producer, and consumer configurations. To run a test, you specify the parameters to be tested, and the tool iterates through all combinations of those parameters, producing a graph similar to the one below. https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/
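To make the idea of iterating through all parameter combinations concrete, here is a minimal sketch in plain Python. It is not the framework's actual input format or code: the parameter names and values (`num_producers`, `batch_size_bytes`, `replication_factor`) and the helper `expand_combinations` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical parameter names and values, for illustration only.
# They are not the framework's real test specification.
test_parameters = {
    "num_producers": [6, 12],
    "batch_size_bytes": [16384, 262144],
    "replication_factor": [3],
}


def expand_combinations(parameters):
    """Return one dict per combination of the listed parameter values."""
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_combinations(test_parameters)

# Each combination would correspond to one individual test run.
for combination in combinations:
    print(combination)
```

With two values for each of the first two parameters and one for the third, the sketch yields 2 x 2 x 1 = 4 test runs, which shows how quickly the number of runs grows as more values are added.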

Flink Improvement Proposal 171: Async Sink

Apache Flink has a rich connector ecosystem that can persist data in various destinations. Flink natively supports Apache Kafka, Amazon Kinesis Data Streams, Elasticsearch, HBase, and many more destinations. Additional connectors are maintained in Apache Bahir or directly on GitHub. The basic functionality of these sinks is quite similar. They batch events according to user-defined buffering hints, sign requests and send them to the respective endpoint, retry unsuccessful or throttled requests, and participate in checkpointing.
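The shared behavior described above can be sketched in a few lines of plain Python. This is only an illustration of the batch-and-retry pattern, not Flink's API or the design proposed in the FLIP: `BufferingSink`, `ThrottledError`, `flaky_endpoint`, and all parameter names are hypothetical, request signing is omitted, and flushing the buffer before a checkpoint is an assumption made to keep the example small.

```python
import time


class ThrottledError(Exception):
    """Hypothetical error raised when the endpoint throttles a request."""


class BufferingSink:
    """Illustrative sink that batches events and retries throttled requests."""

    def __init__(self, send_batch, max_batch_size=3, max_retries=3, backoff_seconds=0.0):
        self._send_batch = send_batch          # callable that delivers one batch
        self._max_batch_size = max_batch_size  # buffering hint: events per request
        self._max_retries = max_retries
        self._backoff_seconds = backoff_seconds
        self._buffer = []

    def write(self, event):
        """Buffer an event and send a batch once the buffer is full."""
        self._buffer.append(event)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self):
        """Send all buffered events, retrying with exponential backoff."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for attempt in range(self._max_retries + 1):
            try:
                self._send_batch(batch)
                return
            except ThrottledError:
                if attempt == self._max_retries:
                    # Put the events back so they are not lost, then give up.
                    self._buffer = batch + self._buffer
                    raise
                time.sleep(self._backoff_seconds * (2 ** attempt))


# Usage: a fake endpoint that throttles the first request only.
delivered = []
calls = {"count": 0}


def flaky_endpoint(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ThrottledError("simulated throttling")
    delivered.append(list(batch))


sink = BufferingSink(flaky_endpoint, max_batch_size=2)
for event in ["a", "b", "c"]:
    sink.write(event)
sink.flush()  # assumption: drain the remaining events, e.g. before a checkpoint
```

In this sketch the first request is throttled and retried, so the endpoint is called three times in total to deliver two batches. Because every destination needs the same buffering and retry logic, only the `send_batch` callable differs between sinks.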

Building real-time applications using Apache Flink

Build real-time applications using Apache Flink with Apache Kafka and Amazon Kinesis Data Streams. Apache Flink is a framework and engine for building streaming applications for use cases such as real-time analytics and complex event processing. This session covers best practices for building low-latency applications with Apache Flink when reading data from either Amazon MSK or Amazon Kinesis Data Streams. It also covers best practices for running low-latency Apache Flink applications using Amazon Kinesis Data Analytics and discusses AWS’s open-source contributions to this use case.

Build a Unified Batch and Stream Processing Pipeline with Apache Beam on AWS

In this workshop, we explore an end-to-end example that combines batch and streaming aspects in one uniform Beam pipeline. We start by analyzing incoming taxi trip events in near real time with an Apache Beam pipeline. We then show how to archive the trip data to Amazon S3 for long-term storage. We subsequently explain how to read the historic data from S3 and backfill new metrics by executing the same Beam pipeline in a batch fashion.