Netflix Observability with Kevin Lew



Netflix users stream terabytes of data from the cloud to their devices every day. During a high-bandwidth, long-lived connection, a lot can go wrong. Networks can drop packets, machines can run out of memory, and the Netflix app on a user’s device can have a bug. Any of these events can result in a bad user experience.

Other errors can occur that do not disrupt the user experience. Netflix runs thousands of machine learning jobs, logging servers, and other pieces of internal infrastructure. Customer service dashboards, CI/CD pipelines, and A/B testing frameworks are all software built by Netflix, and when an error occurs in any of these places, engineers need to be able to diagnose and debug that error.

Observability is the practice of using logs, monitoring, metrics, and distributed tracing to understand how a system is working. Kevin Lew is a senior software engineer at Netflix with the Edge Insights team. He joins the show to talk about adding observability across the microservices deployed at Netflix. We also talk about how to manage high volumes of logging data effectively using stream processing.


The post Netflix Observability with Kevin Lew appeared first on Software Engineering Daily.
