Robinhood is a platform for buying and selling stocks, cryptocurrencies, and other assets. Since its founding in 2013, Robinhood has grown to have more than 5 million user accounts, which is even more than the popular online broker E-Trade. With the surge in user growth and transaction volume, the demands on the software infrastructure have increased significantly.
When a user buys a stock on Robinhood, that transaction gets written to Kafka and Postgres. Multiple services get notified of the new entry on the Kafka topic, and those services process that new event using Kafka Streams. Kafka Streams is a library for reading and processing streams of data out of Kafka, with support for exactly-once semantics. Developers at Robinhood use a variety of languages to build services on top of these Kafka streams, including Python.
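As a rough illustration of the fan-out described above, here is a minimal in-memory sketch in plain Python. It is not Kafka, Kafka Streams, or Robinhood's code: the `Topic` class, the service names, and the order fields are all hypothetical stand-ins, chosen only to show how several services can each read every event from one append-only log by keeping their own offset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a Kafka topic: an append-only log."""
    events: list = field(default_factory=list)
    # Each consumer group tracks its own read position (offset) in the log.
    offsets: dict = field(default_factory=lambda: defaultdict(int))

    def publish(self, event):
        """Append an event to the end of the log."""
        self.events.append(event)

    def poll(self, group):
        """Return the events this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch


def run_demo():
    orders = Topic()
    orders.publish({"user": "alice", "symbol": "AAPL", "shares": 2})
    orders.publish({"user": "bob", "symbol": "TSLA", "shares": 1})

    # Two hypothetical services consume the same topic independently.
    shares_by_symbol = defaultdict(int)
    for event in orders.poll("portfolio-service"):
        shares_by_symbol[event["symbol"]] += event["shares"]

    notified = [event["user"] for event in orders.poll("notification-service")]

    # A group that is already caught up gets nothing new on its next poll.
    leftover = orders.poll("portfolio-service")
    return dict(shares_by_symbol), notified, leftover
```

Calling `run_demo()` returns the per-symbol totals seen by one service, the users seen by the other, and an empty list for the repeated poll. A real deployment would add partitioning, durable storage, and delivery guarantees, none of which this sketch attempts.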
Commonly used systems for building stream processing tasks on top of a Kafka topic include Apache Flink and Apache Spark. Spark and Flink let you work with large data sets while maintaining high speed and fault-tolerance. These tools run on the Java Virtual Machine (JVM). If you want to write a Python program that interfaces with Apache Spark, you have to pay an expensive serialization/deserialization cost as you move data between the Python process and the JVM.
Ask Solem is an engineer with Robinhood, and the author of Faust, a stream processing library that ports the ideas of Kafka Streams to Python. Faust provides stream processing and event processing in a manner that is similar to Kafka Streams, Apache Spark, and Apache Flink. Ask is also the author of Celery, the popular asynchronous task queue. He joins the show to provide his perspective on large-scale, distributed stream processing, and why he created Faust.