Unify Batch and Stream Processing with Apache Beam on AWS

One of the big visions of Apache Beam is to provide a single programming model for both batch and streaming that runs on multiple execution engines. In this session, we explore an end-to-end example that shows how you can combine batch and streaming aspects in one uniform Beam pipeline: we start by ingesting taxi trip events into an Amazon Kinesis data stream and use a Beam pipeline to analyze the streaming data in near real time....

June 20, 2019 · Steffen Hausmann

Amazon Kinesis Replay

A simple Java application that replays JSON events stored in Amazon S3 objects into an Amazon Kinesis stream. The application reads the timestamp attribute of the stored events and replays them as if they occurred in real time. https://github.com/aws-samples/amazon-kinesis-replay

May 7, 2019 · Steffen Hausmann
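
The replay idea described for Amazon Kinesis Replay (emit stored events with the same gaps their timestamps had) can be sketched roughly as follows. This is a minimal illustration, not code from the amazon-kinesis-replay repository: the class name `ReplaySchedule`, the method `delayMillis`, and the `speedup` parameter are all assumptions made for this example, and the actual reading from S3 and writing to Kinesis is left out.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not taken from amazon-kinesis-replay. */
public class ReplaySchedule {

    /**
     * Returns how many milliseconds to wait before emitting an event so that the
     * gaps between stored event timestamps are preserved during the replay.
     * The speedup factor is an assumption of this sketch (1.0 = original pace).
     */
    public static long delayMillis(Instant firstEventTime, Instant eventTime,
                                   Instant replayStart, Instant now, double speedup) {
        // Offset of this event from the first event in the stored data.
        long eventOffset = Duration.between(firstEventTime, eventTime).toMillis();
        // Wall-clock time that has already passed since the replay began.
        long elapsed = Duration.between(replayStart, now).toMillis();
        // Scale the offset by the speedup and subtract the time already elapsed.
        long remaining = (long) (eventOffset / speedup) - elapsed;
        // Events that are already due are emitted immediately.
        return Math.max(0L, remaining);
    }

    public static void main(String[] args) {
        Instant first = Instant.parse("2019-05-07T10:00:00Z");
        Instant event = Instant.parse("2019-05-07T10:00:03Z");
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        Instant now = start.plusMillis(1000);
        // The event occurred 3 s after the first one and 1 s of the replay has passed.
        System.out.println(delayMillis(first, event, start, now, 1.0));
    }
}
```

A caller would sleep for the returned number of milliseconds before sending the event to the stream; a returned value of 0 means the event is already due.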

Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics

Stream processing facilitates the collection, processing, and analysis of real-time data and enables the continuous generation of insights and quick reactions to emerging situations. This capability is useful when the value of derived insights diminishes over time. Hence, the faster you can react to a detected situation, the more valuable the reaction is going to be. Consider, for instance, a streaming application that analyzes and blocks fraudulent credit card transactions while they occur....

April 16, 2019 · Steffen Hausmann

Amazon Kinesis Analytics Taxi Consumer

A sample Apache Flink application that can be deployed to Kinesis Analytics for Java. It reads taxi events from a Kinesis data stream, processes and aggregates them, and ingests the results into an Amazon Elasticsearch Service cluster for visualization with Kibana. https://github.com/aws-samples/amazon-kinesis-analytics-taxi-consumer

March 15, 2019 · Steffen Hausmann

Build Your First Big Data Application on AWS

AWS makes it easy to build and operate highly scalable and flexible data platforms to collect, process, and analyze data so you can get timely insights and react quickly to new information. In this session, we will demonstrate how you can quickly build a fully managed data platform that transforms, cleans, and analyzes incoming data in real time and persists the cleaned data for subsequent visualizations and thorough exploration by means of SQL....

February 26, 2019 · Steffen Hausmann

Build a Real-time Stream Processing Pipeline with Apache Flink on AWS (FF)

The increasing number of available data sources in today’s application stacks has created a demand to continuously capture and process data from various sources to quickly turn high-volume streams of raw data into actionable insights. Apache Flink addresses many of the challenges faced in this domain as it’s specifically tailored to distributed computations over streams. While Flink provides all the necessary capabilities to process streaming data, provisioning and maintaining a Flink cluster still requires considerable effort and expertise....

September 13, 2017 · Steffen Hausmann

Build a Real-time Stream Processing Pipeline with Apache Flink on AWS

In today’s business environments, data is generated in a continuous fashion by a steadily increasing number of diverse data sources. Therefore, the ability to continuously capture, store, and process this data to quickly turn high-volume streams of raw data into actionable insights has become a substantial competitive advantage for organizations. Apache Flink is an open source project that is well-suited to form the basis of such a stream processing pipeline. It offers unique capabilities that are tailored to the continuous analysis of streaming data....

April 21, 2017 · Steffen Hausmann