FindCollabs Hackathon #1 has ended! Congrats to ARhythm, Kitspace, and Rivaly for winning 1st, 2nd, and 3rd place ($4,000, $1,000, and a set of SE Daily hoodies, respectively). The most valuable feedback award and the most helpful community member award both go to Vynce Montgomery, who will receive both the SE Daily Towel and the SE Daily Old School Bucket Hat.
Lyft generates petabytes of data. Driver and rider behavior, pricing information, and the movement of cars through space: all of this data is received by Lyft’s backend services, buffered into Kafka queues, and processed by various stream processing systems.
Lyft moves the high volumes of data into a data lake for different users throughout the company to use offline. Machine learning jobs, batch jobs, streaming jobs and materialized databases can be created on top of that data lake. Druid and Superset are used for operational analytics and dashboarding.
Li Gao is a data engineer at Lyft. He joins the show to explore the different aspects of Lyft’s data platform. We also talk about the tradeoffs of streaming frameworks, and how to manage machine learning infrastructure. This episode is a great companion to our show about Uber’s data platform, and illustrates some fundamental differences in how the two ridesharing companies operate.