Manage episode 230951434 series 1437556
Distributed stream processing allows developers to build applications on top of large sets of data that are being rapidly created. Stream processing is often described as an alternative to batch processing. In batch processing, a single large computation is performed over a large, static data set. In stream processing, a computation is performed repeatedly and continuously over a data set that is being appended to.
A stream is often stored in a distributed queue such as Kafka, Kinesis, Pulsar, or Google PubSub. A stream is often processed with a stream processing tool such as Spark, Flink, Storm, or Google Cloud Dataflow.
Holden Karau is an engineer who works on open source projects at Google. She returns to the show to describe the state of stream processing and discuss modern best practices.
140 episodes available. A new episode about every 7 days averaging 57 mins duration .