Uber’s Monitoring Platform with Rob Skillington

59:31
 
Share
 

Manage episode 227174389 series 1437556
By Discovered by Player FM and our community — copyright is owned by the publisher, not Player FM, and audio streamed directly from their servers.

Uber manages the car rides for millions of people. The Uber system must remain operational 24/7, and the app involves financial transactions and the safety of passengers.

Uber infrastructure runs across thousands of server instances and produce terabytes of monitoring data. The monitoring data is used to understand the health of the software systems as well as relevant business metrics, such as driver efficiency, daily revenues, and user satisfaction.

Uber adopted the Prometheus monitoring system to manage their monitoring data. Prometheus regularly scrapes metrics across infrastructure to gather time series data about the state of everything across Uber. As the usage of Prometheus has grown within the company, Uber has had to figure out how to scale their monitoring platform.

M3 is a monitoring system built at Uber to scale Prometheus and provide a platform that can effectively scale the data storage as well as the query serving. Rob Skillington is a staff software engineer at Uber, and he joins the show to talk about monitoring at Uber–from the requirements of the system to the implementation of M3.

At Uber, M3 powers dashboards, ad-hoc queries, and alerting. M3 was open sourced to give other users access to a scalable Prometheus solution. In a previous episode with Brian Boreham, we discussed one strategy for scaling Prometheus. Today’s episode covers another scalability solution, with M3.

Show notes

The post Uber’s Monitoring Platform with Rob Skillington appeared first on Software Engineering Daily.

135 episodes available. A new episode about every 7 days averaging 58 mins duration .