Steffen's Blog
Field Engineer @ Materialize
Stream processing facilitates the collection, processing, and analysis of real-time data and enables the continuous generation of insights and quick reactions to emerging situations. Yet, despite these advantages over traditional batch-oriented analytics applications, streaming applications are much more challenging to operate. Some of these challenges include maintaining low end-to-end latency, recovering seamlessly from failure, and handling varying throughput.
Sample Apache Beam pipeline that can be deployed to Kinesis Data Analytics for Java Applications. It reads taxi events from a Kinesis data stream, processes and aggregates them, and publishes the results to Amazon CloudWatch for visualization.
https://github.com/aws-samples/amazon-kinesis-analytics-beam-taxi-consumer
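For orientation, the skeleton of such a pipeline might look like the following minimal sketch (not the actual code from the repository): it reads records from a Kinesis data stream, counts them per one-minute window, and publishes the count as a custom CloudWatch metric. The stream name, metric namespace, window size, and placeholder credentials are assumptions.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kinesis.KinesisIO;
import org.apache.beam.sdk.io.kinesis.KinesisRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.MetricDatum;
import com.amazonaws.services.cloudwatch.model.PutMetricDataRequest;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;

public class TaxiMetricsPipeline {

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p
        // Read raw taxi trip events from the Kinesis data stream.
        .apply(KinesisIO.read()
            .withStreamName("taxi-trip-events")                         // assumed stream name
            .withAWSClientsProvider("accessKey", "secretKey", Regions.US_EAST_1) // placeholder credentials
            .withInitialPositionInStream(InitialPositionInStream.LATEST))
        // Aggregate: count the events that arrive in each one-minute window.
        .apply(Window.<KinesisRecord>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(Combine.globally(Count.<KinesisRecord>combineFn()).withoutDefaults())
        // Publish the per-window count as a custom CloudWatch metric.
        .apply(ParDo.of(new DoFn<Long, Void>() {
          private transient AmazonCloudWatch cloudWatch;

          @Setup
          public void setup() {
            cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();
          }

          @ProcessElement
          public void processElement(ProcessContext c) {
            cloudWatch.putMetricData(new PutMetricDataRequest()
                .withNamespace("Beam/Taxi")                             // assumed namespace
                .withMetricData(new MetricDatum()
                    .withMetricName("TripCount")
                    .withValue(c.element().doubleValue())));
          }
        }));

    p.run();
  }
}
```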
In this workshop, you will build an end-to-end streaming architecture to ingest, analyze, and visualize streaming data in near real time. You set out to improve the operations of a taxi company in New York City by analyzing the telemetry data of its taxi fleet in near real time to optimize fleet operations.
You will not only learn how to deploy, operate, and scale an Apache Flink application with Kinesis Data Analytics for Java Applications, but also explore the basic concepts of Apache Flink and learn how to run Flink applications in a fully managed environment on AWS.
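To give a flavor of what such a Flink application looks like before it is packaged for Kinesis Data Analytics, here is a minimal sketch that consumes taxi events from a Kinesis data stream and counts them per minute. The stream name, region, and aggregation are illustrative assumptions, not the workshop's actual code.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class TaxiStreamJob {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Source configuration; region and stream name are placeholders.
    Properties consumerConfig = new Properties();
    consumerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
    consumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

    DataStream<String> events = env.addSource(new FlinkKinesisConsumer<>(
        "taxi-trip-events", new SimpleStringSchema(), consumerConfig));

    // Count incoming trip events per minute; a real job would parse the JSON
    // payload and aggregate by pick-up location, trip type, and so on.
    events
        .map(event -> 1L).returns(Types.LONG)
        .timeWindowAll(Time.minutes(1))
        .reduce((a, b) -> a + b)
        .print();

    env.execute("taxi-stream-job");
  }
}
```

Packaged as a jar, the same code runs unchanged on a local Flink cluster and as a Kinesis Data Analytics for Java application, which is what makes the managed environment attractive for operations.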
One of the big visions of Apache Beam is to provide a single programming model for both batch and streaming that runs on multiple execution engines.
In this session, we explore an end-to-end example that shows how you can combine batch and streaming aspects in one uniform Beam pipeline: we start by ingesting taxi trip events into an Amazon Kinesis data stream and use a Beam pipeline to analyze the streaming data in near real time.
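As a rough sketch of what "batch and streaming in one uniform pipeline" can look like, the same composite transform can be applied both to archived events read from S3 (bounded) and to live events read from the Kinesis data stream (unbounded). The bucket, stream, transform names, and placeholder credentials below are assumptions, not the code shown in the session.

```java
import java.nio.charset.StandardCharsets;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.kinesis.KinesisIO;
import org.apache.beam.sdk.io.kinesis.KinesisRecord;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;

public class UnifiedTaxiPipeline {

  /** The analysis logic, written once and shared by the batch and streaming paths. */
  static class CountTripsPerMinute
      extends PTransform<PCollection<String>, PCollection<Long>> {
    @Override
    public PCollection<Long> expand(PCollection<String> events) {
      // In a real pipeline, event timestamps would be assigned from the payload
      // (e.g. via WithTimestamps) before windowing.
      return events
          .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
          .apply(Combine.globally(Count.<String>combineFn()).withoutDefaults());
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Batch path: replayable history archived in S3 (bounded).
    PCollection<Long> historic = p
        .apply(TextIO.read().from("s3://example-bucket/taxi-trips/*.json"))
        .apply("CountHistoric", new CountTripsPerMinute());

    // Streaming path: live events from the Kinesis data stream (unbounded).
    PCollection<Long> live = p
        .apply(KinesisIO.read()
            .withStreamName("taxi-trip-events")
            .withAWSClientsProvider("accessKey", "secretKey", Regions.US_EAST_1) // placeholder credentials
            .withInitialPositionInStream(InitialPositionInStream.LATEST))
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((KinesisRecord record) ->
                StandardCharsets.UTF_8.decode(record.getData()).toString()))
        .apply("CountLive", new CountTripsPerMinute());

    p.run();
  }
}
```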
A simple Java application that replays JSON events stored in objects in Amazon S3 into an Amazon Kinesis data stream. The application reads the timestamp attribute of the stored events and replays them as if they occurred in real time.
https://github.com/aws-samples/amazon-kinesis-replay
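The core replay idea can be sketched roughly like this (a simplified illustration, not the actual amazon-kinesis-replay code); the bucket, object key, stream name, timestamp field, and ISO-8601 timestamp format are all assumptions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SimpleReplay {

  public static void main(String[] args) throws Exception {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
    ObjectMapper mapper = new ObjectMapper();

    // Bucket, key, stream, and the timestamp field are assumptions; the field is
    // expected to hold an ISO-8601 timestamp here.
    try (BufferedReader events = new BufferedReader(new InputStreamReader(
        s3.getObject("example-bucket", "taxi-trips/trips.json").getObjectContent(),
        StandardCharsets.UTF_8))) {

      Instant firstEventTime = null;        // timestamp of the first event in the file
      Instant replayStart = Instant.now();  // wall-clock time when the replay began

      String line;
      while ((line = events.readLine()) != null) {
        JsonNode event = mapper.readTree(line);
        Instant eventTime = Instant.parse(event.get("dropoff_datetime").asText());

        if (firstEventTime == null) {
          firstEventTime = eventTime;
        }

        // Sleep until the time elapsed in the replay matches the time that had
        // elapsed between the first event and this event in the original data.
        long eventOffsetMillis = eventTime.toEpochMilli() - firstEventTime.toEpochMilli();
        long replayOffsetMillis = Instant.now().toEpochMilli() - replayStart.toEpochMilli();
        long waitMillis = eventOffsetMillis - replayOffsetMillis;
        if (waitMillis > 0) {
          Thread.sleep(waitMillis);
        }

        kinesis.putRecord("taxi-trip-events",
            ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8)),
            String.valueOf(line.hashCode()));   // partition key, purely illustrative
      }
    }
  }
}
```

Because the pacing is derived from the event timestamps rather than a fixed rate, downstream consumers see roughly the same traffic pattern the original fleet produced.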