Flink and BEAM Stream Processing with Maximilian Michels

51:14
 
By Software Engineering Daily.

Distributed stream processing systems are used to read large volumes of data and perform operations across those data streams.

These stream processing systems often build on the MapReduce model for collecting and aggregating large volumes of data, but instead of running a calculation over a single large batch of data, they process data on an ongoing basis. There are many different stream processing systems for this same use case: Storm, Spark, Flink, Heron, and many others.

Why is that? When there seems to be much more consolidation around the Hadoop MapReduce batch processing technology, why are there so many stream processing systems?

One explanation is that aggregating the results of a continuous stream of data is a process that very much depends on time. At any given point in time, you can take a snapshot of the stream of data, and any calculation based on that data is going to be out of date by the time that your calculation is finished. There is a latency between when you start calculating something, and when you finish calculating it.
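To make the time dependence concrete, here is a minimal sketch in plain Python of a tumbling (fixed, non-overlapping) window count over a stream of events. It is not taken from the episode and does not use the Flink or Beam APIs; the function name `tumbling_counts`, the `(timestamp, key)` event shape, and the sample events are all assumptions for illustration. It also assumes events arrive in timestamp order, which real streams often do not.

```python
def tumbling_counts(events, window_seconds):
    """Yield (window_start, counts) each time a window closes.

    `events` is an iterable of (timestamp_seconds, key) pairs,
    assumed to arrive in non-decreasing timestamp order.
    """
    current_start = None
    counts = {}
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # An event from a later window arrived: emit the finished window.
            yield current_start, counts
            counts = {}
        current_start = start
        counts[key] = counts.get(key, 0) + 1
    # Emit the last window once the (finite) sample stream ends.
    if current_start is not None:
        yield current_start, counts


# Hypothetical sample events: (seconds since start, event type).
events = [(0, "ride"), (3, "ride"), (7, "cancel"), (12, "ride")]
results = list(tumbling_counts(events, window_seconds=10))
```

Each result only describes a window that has already closed, so by the time it is emitted the stream has moved on. On an unbounded stream the final window would never be flushed this way, which is one reason real systems need an explicit notion of time progress.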

There are other design decisions for a distributed stream processing system. What data do you keep in memory? What do you keep on disk? How often do you snapshot your data to disk? What is the method for fault tolerance? What are the APIs for consuming and processing this data?

Maximilian Michels has worked on the Apache Flink and Apache Beam stream processing systems, and currently works on data infrastructure at Lyft. Max joins the show to discuss the tradeoffs of different stream processing systems and his experiences in the world of data processing.

You can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Flink and BEAM Stream Processing with Maximilian Michels appeared first on Software Engineering Daily.
