Lyft’s Data Platform with Li Gao



Lyft generates petabytes of data. Driver and rider behavior, pricing information, and the movement of cars through space: all of this data is received by Lyft’s backend services, buffered into Kafka queues, and processed by various stream processing systems.

Lyft moves this high-volume data into a data lake so that users throughout the company can work with it offline. Machine learning jobs, batch jobs, streaming jobs, and materialized databases can be built on top of that data lake. Druid and Superset are used for operational analytics and dashboarding.

Li Gao is a data engineer at Lyft. He joins the show to explore the different aspects of Lyft’s data platform. We also talk about the tradeoffs of streaming frameworks and how to manage machine learning infrastructure. This episode is a great companion to our show about Uber’s data platform, and it illustrates some fundamental differences in how the two ridesharing companies operate.

The post Lyft’s Data Platform with Li Gao appeared first on Software Engineering Daily.
