Steffen's Blog
Field Engineer @ Materialize
Build real-time applications using Apache Flink with Apache Kafka and Amazon Kinesis Data Streams
Apache Flink is a framework and engine for building streaming applications for use cases such as real-time analytics and complex event processing. This session covers best practices for building low-latency applications with Apache Flink when reading data from either Amazon MSK or Amazon Kinesis Data Streams. It also covers best practices for running low-latency Apache Flink applications using Amazon Kinesis Data Analytics and discusses AWS’s open-source contributions to this use case.
In this workshop, we explore an end-to-end example that combines batch and streaming aspects in one uniform Beam pipeline. We start by analyzing incoming taxi trip events in near real time with an Apache Beam pipeline. We then show how to archive the trip data to Amazon S3 for long-term storage. Finally, we explain how to read the historic data from S3 and backfill new metrics by executing the same Beam pipeline in batch mode.
This post looks at how to use Apache Flink as a basis for sophisticated streaming extract-transform-load (ETL) pipelines. Apache Flink is a framework and distributed engine for processing data streams. AWS provides a fully managed service for Apache Flink through Amazon Kinesis Data Analytics, which enables you to build and run sophisticated streaming applications quickly, easily, and with low operational overhead.
https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics/
In this chalk talk, we discuss the benefits of different AWS streaming services and walk through some use cases for each. We share best practices based on real customer examples and discuss a framework that you can use to determine which set of services best suits your specific use case. Finally, we show some interactive examples, so come ready with your real-life scenarios that we can discuss live.
https://d1.awsstatic.com/events/reinvent/2019/Choosing_the_right_service_for_your_data_streaming_needs_ANT316.pdf
In this session, we walk through how to perform real-time analytics on ride-sharing and taxi data, and we explore how to build a reliable, scalable, and highly available streaming architecture based on managed services. You learn how to deploy, operate, and scale an Apache Flink application with Amazon Kinesis Data Analytics for Java applications. You leave this session knowing how to build an end-to-end streaming analytics pipeline: ingesting data into a Kinesis data stream, writing and deploying a Flink application that performs basic stream transformations and aggregations, and persisting the results to Amazon Elasticsearch Service for visualization in Kibana.