<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Steffen Hausmann</title>
    <link>https://steffen.hausmann.info/</link>
    <description>Recent content on Steffen Hausmann</description>
    <generator>Hugo -- 0.146.0</generator>
    <language>en</language>
    <lastBuildDate>Tue, 14 Jan 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://steffen.hausmann.info/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>How to Simplify Microservices with a Shared Database and Materialized Views</title>
      <link>https://steffen.hausmann.info/posts/how-to-simplify-microservices-with-a-shared-database-and-materialized-views/</link>
      <pubDate>Tue, 14 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/how-to-simplify-microservices-with-a-shared-database-and-materialized-views/</guid>
      <description>&lt;p&gt;This &lt;a href=&#34;https://materialize.com/blog/simplify-microservices-shared-database-materialized-views/&#34;&gt;blog post&lt;/a&gt; was both enjoyable and quick to write, at least according to my standards. It explores a slightly provocative idea: challenging the fundamental assumption that microservices must not share a database to expose data to external services, and examining what breaks when this convention is ignored. As it turns out, quite a lot breaks, but there are also significant benefits, especially when business logic requires consistent data from multiple services.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Powering real-time loan underwriting at Vontive with Materialize</title>
      <link>https://steffen.hausmann.info/posts/powering-real-time-loan-underwriting-at-vontive-with-materialize/</link>
      <pubDate>Tue, 08 Oct 2024 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/powering-real-time-loan-underwriting-at-vontive-with-materialize/</guid>
      <description>&lt;p&gt;In the fast-paced world of mortgage lending, speed and accuracy are crucial. To support their underwriters, Vontive transformed written rules for loan eligibility from a Google Doc into SQL queries for evaluation in a Postgres database. However, while functional, this setup struggled to scale with business growth, resulting in slow, cumbersome processing times. Executing just a handful of loan eligibility rules could take up to 27 seconds, far too long for user-friendly interactions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How Materialize Unlocks Private Kafka Connectivity via PrivateLink and SSH</title>
      <link>https://steffen.hausmann.info/posts/how-materialize-unlocks-private-kafka-connectivity-via-privatelink-and-ssh/</link>
      <pubDate>Mon, 10 Jun 2024 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/how-materialize-unlocks-private-kafka-connectivity-via-privatelink-and-ssh/</guid>
      <description>&lt;p&gt;At Materialize, we’ve built a data warehouse that runs on real-time data. Our customers use this real-time data to power critical business use cases, from fraud detection, to dynamic pricing, to loan underwriting.&lt;/p&gt;
&lt;p&gt;To provide our customers with streaming data, we have first-class support for loading and unloading data via Apache Kafka, the de facto standard for real-time data in transit. Because of the sensitivity of their data, our customers require strong encryption and authentication schemes at a minimum. Many of our customers go one step further and require that no data is loaded or unloaded over the public internet.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Navigating Private Network Connectivity Options for Kafka Clusters</title>
      <link>https://steffen.hausmann.info/posts/navigating-private-network-connectivity-options-for-kafka-clusters/</link>
      <pubDate>Wed, 20 Mar 2024 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/navigating-private-network-connectivity-options-for-kafka-clusters/</guid>
      <description>&lt;p&gt;There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options &amp;hellip; might not be an option!&lt;/p&gt;
&lt;p&gt;In this session, we’ll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Beginner’s Guide to Kafka Performance in Cloud Environments</title>
      <link>https://steffen.hausmann.info/posts/a-beginners-guide-to-kafka-performance-in-cloud-environments/</link>
      <pubDate>Tue, 16 May 2023 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/a-beginners-guide-to-kafka-performance-in-cloud-environments/</guid>
      <description>&lt;p&gt;Over time, deploying and running Kafka became easier and easier. Today you can choose amongst a large ecosystem of different managed offerings or just deploy to Kubernetes directly. But, although you have plenty of options to optimize your Kafka configuration and choose infrastructure that matches your use case and budget, it’s not always easy to tell how these choices affect overall cluster performance.&lt;/p&gt;
&lt;p&gt;In this session, we’ll take a look at Kafka performance from an infrastructure perspective. How does your choice of storage, compute, and networking affect cluster throughput? How can you optimize for low cost or fast recovery? When is it better to scale up rather than to scale out brokers?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Everything you need to know to be a Materialize power-user</title>
      <link>https://steffen.hausmann.info/posts/everything-you-need-to-know-to-be-a-materialize-power-user/</link>
      <pubDate>Thu, 20 Apr 2023 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/everything-you-need-to-know-to-be-a-materialize-power-user/</guid>
      <description>&lt;p&gt;This post is also available on the &lt;a href=&#34;https://materialize.com/blog/power-user/&#34;&gt;Materialize blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Materialize is a distributed SQL database built on streaming internals. With it, you can use the SQL you are already familiar with to build powerful stream processing capabilities. But as with any abstraction, sometimes the underlying implementation details leak through the abstraction. Queries that look simple and innocent when you are formulating them in SQL can sometimes require more resources than expected when evaluated incrementally against a continuous stream of arriving updates.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Leaving Amazon</title>
      <link>https://steffen.hausmann.info/posts/leaving-amazon/</link>
      <pubDate>Wed, 01 Mar 2023 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/leaving-amazon/</guid>
      <description>&lt;p&gt;After more than 7.5 years, my time at AWS came to a close at the end of 2022. It&amp;rsquo;s been an incredible journey to learn and grow professionally.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m still surprised how much trust and support I&amp;rsquo;ve received over the years to focus on things I found important and impactful. Just last year the work I&amp;rsquo;ve started to improve the Apache Flink connectors system was contributed back to the open source project, not only resulting in several blog posts and a session at Flink Forward, but also getting early adoption that led to support of new destinations that now integrate with Apache Flink. I&amp;rsquo;ve also spent a ridiculous amount of energy and time on understanding Apache Kafka performance in cloud environments, which not only surfaced several opportunities for internal improvements, but also led to one of the most popular blog posts on the AWS big data blog in 2022. Throughout 2022 I’ve also started building my own team within the messaging and streaming organization with the goal of enabling and supporting customers adopting streaming technologies on AWS.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Making it Easier to Build Connectors with Apache Flink: Introducing the Async Sink</title>
      <link>https://steffen.hausmann.info/posts/making-it-easier-to-build-connectors-with-apache-flink-introducing-the-async-sink/</link>
      <pubDate>Wed, 23 Nov 2022 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/making-it-easier-to-build-connectors-with-apache-flink-introducing-the-async-sink/</guid>
      <description>&lt;p&gt;Apache Flink is a popular open source framework for stateful computations over data streams. It allows you to formulate queries that are continuously evaluated in near real time against an incoming stream of events. To persist derived insights from these queries in downstream systems, Apache Flink comes with a rich connector ecosystem that supports a wide range of sources and destinations. However, the existing connectors may not always be enough to support all conceivable use cases. Our customers and the community kept asking for more connectors and better integrations with various open source tools and services.&lt;/p&gt;</description>
    </item>
    <item>
      <title>One Sink to Rule Them All: Introducing the New Async Sink</title>
      <link>https://steffen.hausmann.info/posts/one-sink-to-rule-them-all-introducing-the-new-async-sink/</link>
      <pubDate>Wed, 03 Aug 2022 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/one-sink-to-rule-them-all-introducing-the-new-async-sink/</guid>
      <description>&lt;p&gt;Next time you want to integrate with a new destination for a demo, concept, or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171), with the goal to encapsulate common logic and allow developers to focus on the key integration code. The new framework handles things like request batching, buffering records, applying backpressure, retry strategies, and at-least-once semantics. It allows you to focus on your business logic, rather than spending time integrating with your downstream consumers. During the session we will dive deep into the internals to uncover how it works, why it was designed this way, and how to use it. We will code up a new sink from scratch and demonstrate how to quickly push data to a destination. At the end of this talk you will be ready to start implementing your own Flink sink using the new Async Sink framework.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost</title>
      <link>https://steffen.hausmann.info/posts/best-practices-for-right-sizing-your-apache-kafka-clusters-to-optimize-performance-and-cost/</link>
      <pubDate>Mon, 14 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/best-practices-for-right-sizing-your-apache-kafka-clusters-to-optimize-performance-and-cost/</guid>
      <description>&lt;p&gt;Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost.&lt;/p&gt;
&lt;p&gt;This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how to size your clusters to meet your throughput, availability, and latency requirements. Along the way, we answer questions like “when does it make sense to scale up vs. scale out?” We end with guidance on how to continuously verify the size of your production clusters.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Performance Testing Framework for Apache Kafka</title>
      <link>https://steffen.hausmann.info/posts/performance-testing-framework-for-apache-kafka/</link>
      <pubDate>Mon, 07 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/performance-testing-framework-for-apache-kafka/</guid>
      <description>&lt;p&gt;The tool is designed to evaluate the maximum throughput of a cluster and compare the put latency of different broker, producer, and consumer configurations. To run a test, you basically specify the different parameters that should be tested and the tool will iterate through all different combinations of the parameters, producing a graph similar to the one below.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/&#34;&gt;https://github.com/aws-samples/performance-testing-framework-for-apache-kafka/&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Flink Improvement Proposal 171: Async Sink</title>
      <link>https://steffen.hausmann.info/posts/flink-improvement-proposal-171-async-sink/</link>
      <pubDate>Wed, 09 Jun 2021 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/flink-improvement-proposal-171-async-sink/</guid>
      <description>&lt;p&gt;Apache Flink has a rich connector ecosystem that can persist data in various destinations. Flink natively supports Apache Kafka, Amazon Kinesis Data Streams, Elasticsearch, HBase, and many more destinations. Additional connectors are maintained in Apache Bahir or directly on GitHub. The basic functionality of these sinks is quite similar. They batch events according to user-defined buffering hints, sign requests and send them to the respective endpoint, retry unsuccessful or throttled requests, and participate in checkpointing. They primarily just differ in the way they interface with the destination. Yet, all the above-mentioned sinks are developed and maintained independently.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building real-time applications using Apache Flink</title>
      <link>https://steffen.hausmann.info/posts/building-real-time-applications-using-apache-flink/</link>
      <pubDate>Thu, 10 Dec 2020 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/building-real-time-applications-using-apache-flink/</guid>
      <description>&lt;p&gt;Build real-time applications using Apache Flink with Apache Kafka and Amazon Kinesis Data Streams. Apache Flink is a framework and engine for building streaming applications for use cases such as real-time analytics and complex event processing. This session covers best practices for building low-latency applications with Apache Flink when reading data from either Amazon MSK or Amazon Kinesis Data Streams. It also covers best practices for running low-latency Apache Flink applications using Amazon Kinesis Data Analytics and discusses AWS’s open-source contributions to this use case.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build a Unified Batch and Stream Processing Pipeline with Apache Beam on AWS</title>
      <link>https://steffen.hausmann.info/posts/build-a-unified-batch-and-stream-processing-pipeline-with-apache-beam-on-aws/</link>
      <pubDate>Wed, 26 Aug 2020 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-a-unified-batch-and-stream-processing-pipeline-with-apache-beam-on-aws/</guid>
      <description>&lt;p&gt;In this workshop, we explore an end-to-end example that combines batch and streaming aspects in one uniform Beam pipeline. We start to analyze incoming taxi trip events in near real time with an Apache Beam pipeline. We then show how to archive the trip data to Amazon S3 for long-term storage. We subsequently explain how to read the historic data from S3 and backfill new metrics by executing the same Beam pipeline in a batch fashion. Along the way, you also learn how you can deploy and execute the Beam pipeline with Amazon Kinesis Data Analytics in a fully managed environment.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics</title>
      <link>https://steffen.hausmann.info/posts/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/</link>
      <pubDate>Fri, 21 Feb 2020 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/</guid>
      <description>&lt;p&gt;This post looks at how to use Apache Flink as a basis for sophisticated streaming extract-transform-load (ETL) pipelines. Apache Flink is a framework and distributed processing engine for processing data streams. AWS provides a fully managed service for Apache Flink through Amazon Kinesis Data Analytics, which enables you to build and run sophisticated streaming applications quickly, easily, and with low operational overhead.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/&#34;&gt;https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Choosing the right service for your data streaming needs (ANT316)</title>
      <link>https://steffen.hausmann.info/posts/choosing-the-right-service-for-your-data-streaming-needs-ant316/</link>
      <pubDate>Wed, 04 Dec 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/choosing-the-right-service-for-your-data-streaming-needs-ant316/</guid>
      <description>&lt;p&gt;In this chalk talk, we discuss the benefits of different AWS streaming services and walk through some use cases for each. We share best practices based on real customer examples and discuss a framework that you can use to determine which set of services best suit your specific use case. Finally, we show some interactive examples, so come ready with your real-life scenarios that we can discuss live.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://d1.awsstatic.com/events/reinvent/2019/Choosing_the_right_service_for_your_data_streaming_needs_ANT316.pdf&#34;&gt;https://d1.awsstatic.com/events/reinvent/2019/Choosing_the_right_service_for_your_data_streaming_needs_ANT316.pdf&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build real-time analytics for a ride-sharing app (ANT401)</title>
      <link>https://steffen.hausmann.info/posts/build-real-time-analytics-for-a-ride-sharing-app-ant401/</link>
      <pubDate>Mon, 02 Dec 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-real-time-analytics-for-a-ride-sharing-app-ant401/</guid>
      <description>&lt;p&gt;In this session, we walk through how to perform real-time analytics on ride-sharing and taxi data, and we explore how to build a reliable, scalable, and highly available streaming architecture based on managed services. You learn how to deploy, operate, and scale an Apache Flink application with Amazon Kinesis Data Analytics for Java applications. Leave this workshop knowing how to build an end-to-end streaming analytics pipeline, starting with ingesting data into a Kinesis data stream, writing and deploying a Flink application to perform basic stream transformations and aggregations, and persisting the results to Amazon Elasticsearch Service to be visualized from Kibana.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics (FF)</title>
      <link>https://steffen.hausmann.info/posts/build-and-run-streaming-applications-with-apache-flink-and-amazon-kinesis-data-analytics-ff/</link>
      <pubDate>Tue, 08 Oct 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-and-run-streaming-applications-with-apache-flink-and-amazon-kinesis-data-analytics-ff/</guid>
      <description>&lt;p&gt;Stream processing facilitates the collection, processing, and analysis of real-time data and enables the continuous generation of insights and quick reactions to emerging situations. Yet, despite these advantages compared to traditional batch-oriented analytics applications, streaming applications are much more challenging to operate. Some of these challenges include the ability to provide and maintain low end-to-end latency, to seamlessly recover from failure, and to deal with a varying amount of throughput.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Amazon Kinesis Analytics Beam Taxi Consumer</title>
      <link>https://steffen.hausmann.info/posts/amazon-kinesis-analytics-beam-taxi-consumer/</link>
      <pubDate>Thu, 20 Jun 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/amazon-kinesis-analytics-beam-taxi-consumer/</guid>
      <description>&lt;p&gt;Sample Apache Beam pipeline that can be deployed to Kinesis Data Analytics for Java Applications. It reads taxi events from a Kinesis data stream, processes and aggregates them, and ingests the result to Amazon CloudWatch for visualization.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/aws-samples/amazon-kinesis-analytics-beam-taxi-consumer&#34;&gt;https://github.com/aws-samples/amazon-kinesis-analytics-beam-taxi-consumer&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Streaming Analytics Workshop</title>
      <link>https://steffen.hausmann.info/posts/streaming-analytics-workshop/</link>
      <pubDate>Thu, 20 Jun 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/streaming-analytics-workshop/</guid>
      <description>&lt;p&gt;In this workshop, you will build an end-to-end streaming architecture to ingest, analyze, and visualize streaming data in near real time. You set out to improve the operations of a taxi company in New York City by analyzing the telemetry data of its taxi fleet in near real time to optimize fleet operations.&lt;/p&gt;
&lt;p&gt;You will not only learn how to deploy, operate, and scale an Apache Flink application with Kinesis Data Analytics for Java Applications, but also explore the basic concepts of Apache Flink and running Flink applications in a fully managed environment on AWS.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Unify Batch and Stream Processing with Apache Beam on AWS</title>
      <link>https://steffen.hausmann.info/posts/unify-batch-and-stream-processing-with-apache-beam-on-aws/</link>
      <pubDate>Thu, 20 Jun 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/unify-batch-and-stream-processing-with-apache-beam-on-aws/</guid>
      <description>&lt;p&gt;One of the big visions of Apache Beam is to provide a single programming model for both batch and streaming that runs on multiple execution engines.&lt;/p&gt;
&lt;p&gt;In this session, we explore an end-to-end example that shows how you can combine batch and streaming aspects in one uniform Beam pipeline: We start with ingesting taxi trip events into an Amazon Kinesis data stream and use a Beam pipeline to analyze the streaming data in near real time. We then show how to archive the trip data to Amazon S3 and how we can extend and update the Beam pipeline to generate additional metrics from the streaming data moving forward. We subsequently explain how to backfill the added metrics by executing the same Beam pipeline in a batch fashion against the archived data in S3. Along the way we furthermore discuss how to leverage different execution engines, such as Amazon Kinesis Data Analytics for Java and Amazon Elastic MapReduce, to run Beam pipelines in a fully managed environment.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Amazon Kinesis Replay</title>
      <link>https://steffen.hausmann.info/posts/amazon-kinesis-replay/</link>
      <pubDate>Tue, 07 May 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/amazon-kinesis-replay/</guid>
      <description>&lt;p&gt;A simple Java application that replays JSON events stored in objects in Amazon S3 into an Amazon Kinesis stream. The application reads the timestamp attribute of the stored events and replays them as if they occurred in real time.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/aws-samples/amazon-kinesis-replay&#34;&gt;https://github.com/aws-samples/amazon-kinesis-replay&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics</title>
      <link>https://steffen.hausmann.info/posts/build-and-run-streaming-applications-with-apache-flink-and-amazon-kinesis-data-analytics/</link>
      <pubDate>Tue, 16 Apr 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-and-run-streaming-applications-with-apache-flink-and-amazon-kinesis-data-analytics/</guid>
      <description>&lt;p&gt;Stream processing facilitates the collection, processing, and analysis of real-time data and enables the continuous generation of insights and quick reactions to emerging situations. This capability is useful when the value of derived insights diminishes over time. Hence, the faster you can react to a detected situation, the more valuable the reaction is going to be. Consider, for instance, a streaming application that analyzes and blocks fraudulent credit card transactions while they occur. Compare that application to a traditional batch-oriented approach that identifies fraudulent transactions at the end of every business day and generates a nice report for you to read the next morning.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Amazon Kinesis Analytics Taxi Consumer</title>
      <link>https://steffen.hausmann.info/posts/amazon-kinesis-analytics-taxi-consumer/</link>
      <pubDate>Fri, 15 Mar 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/amazon-kinesis-analytics-taxi-consumer/</guid>
      <description>&lt;p&gt;Sample Apache Flink application that can be deployed to Kinesis Analytics for Java. It reads taxi events from a Kinesis data stream, processes and aggregates them, and ingests the result to an Amazon Elasticsearch Service cluster for visualization with Kibana.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/aws-samples/amazon-kinesis-analytics-taxi-consumer&#34;&gt;https://github.com/aws-samples/amazon-kinesis-analytics-taxi-consumer&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build Your First Big Data Application on AWS</title>
      <link>https://steffen.hausmann.info/posts/build-your-first-big-data-application-on-aws/</link>
      <pubDate>Tue, 26 Feb 2019 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-your-first-big-data-application-on-aws/</guid>
      <description>&lt;p&gt;AWS makes it easy to build and operate a highly scalable and flexible data platform to collect, process, and analyze data so you can get timely insights and react quickly to new information. In this session, we will demonstrate how you can quickly build a fully managed data platform that transforms, cleans, and analyzes incoming data in real time and persists the cleaned data for subsequent visualization and exploration by means of SQL. To this end, we will build an end-to-end streaming data solution using Kinesis Data Streams for data ingestion and Kinesis Data Analytics for real-time outlier and hotspot detection, and show how the incoming data can be persisted by means of Kinesis Data Firehose to make it available to Amazon Athena and Amazon QuickSight for data exploration and visualization.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build a Real-time Stream Processing Pipeline with Apache Flink on AWS (FF)</title>
      <link>https://steffen.hausmann.info/posts/build-a-real-time-stream-processing-pipeline-with-apache-flink-on-aws-ff/</link>
      <pubDate>Wed, 13 Sep 2017 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-a-real-time-stream-processing-pipeline-with-apache-flink-on-aws-ff/</guid>
      <description>&lt;p&gt;The increasing number of available data sources in today&amp;rsquo;s application stacks created a demand to continuously capture and process data from various sources to quickly turn high volume streams of raw data into actionable insights. Apache Flink addresses many of the challenges faced in this domain as it&amp;rsquo;s specifically tailored to distributed computations over streams. While Flink provides all the necessary capabilities to process streaming data, provisioning and maintaining a Flink cluster still requires considerable effort and expertise. We will discuss how cloud services can remove most of the burden of running the clusters underlying your Flink jobs and explain how to build a real-time processing pipeline on top of AWS by integrating Flink with Amazon Kinesis and Amazon EMR. We will furthermore illustrate how to leverage the reliable, scalable, and elastic nature of the AWS cloud to effectively create and operate your real-time processing pipeline with little operational overhead.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Build a Real-time Stream Processing Pipeline with Apache Flink on AWS</title>
      <link>https://steffen.hausmann.info/posts/build-a-real-time-stream-processing-pipeline-with-apache-flink-on-aws/</link>
      <pubDate>Fri, 21 Apr 2017 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/posts/build-a-real-time-stream-processing-pipeline-with-apache-flink-on-aws/</guid>
      <description>&lt;p&gt;In today’s business environments, data is generated in a continuous fashion by a steadily increasing number of diverse data sources. Therefore, the ability to continuously capture, store, and process this data to quickly turn high-volume streams of raw data into actionable insights has become a substantial competitive advantage for organizations.&lt;/p&gt;
&lt;p&gt;Apache Flink is an open source project that is well-suited to form the basis of such a stream processing pipeline. It offers unique capabilities that are tailored to the continuous analysis of streaming data. However, building and maintaining a pipeline based on Flink often requires considerable expertise, in addition to physical resources and operational efforts.&lt;/p&gt;</description>
    </item>
    <item>
      <title>About me</title>
      <link>https://steffen.hausmann.info/about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://steffen.hausmann.info/about/</guid>
      <description>&lt;p&gt;👋 I’m Steffen. I’m a Deployed Engineer at LangChain, helping customers ship reliable AI applications by turning prototypes into production systems.&lt;/p&gt;
&lt;p&gt;Before LangChain, I spent years helping teams build and operate high-scale data systems, most recently as a Field Engineer at Materialize and before that as a Principal Streaming Architect at Amazon Web Services working with Apache Flink and Apache Kafka. My technical roots go back to a PhD on Complex Event Processing at the University of Munich, and I’ve shared what I’ve learned as a speaker at Flink Forward, Kafka Summit, and AWS re:Invent. I also enjoy creating practical technical workshops, blog posts, and documentation. To stay hands-on, I contribute to open source (including LangChain, Materialize, and Apache Flink) and explore topics like deep learning and query optimization. When I’m not tinkering with code, I’m trying to lure my daughters into tech with cute stickers I’m collecting along the way.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
