A large social network needs to develop systems for ingesting, storing, and processing large volumes of data.
Data engineering at scale requires multiple engineering teams that are responsible for different areas of the infrastructure.
Data needs to be structured coherently in order to minimize the data cleaning process. Machine learning models need to be developed, deployed, and iterated on at scale. Areas of the company that produce data need to be decoupled from the areas that consume it, so that engineers throughout the company can reliably build tools on top of these large data sets.
In our previous episodes about LinkedIn, we covered two major components of LinkedIn’s data engineering systems: the Kafka infrastructure and the LinkedIn data platform used by engineers to productively build data applications.
Kapil Surlaker is a senior director of engineering at LinkedIn, and he joins the show to discuss the bigger picture of LinkedIn’s data infrastructure. Kapil works with teams across LinkedIn to understand the requirements for products and internal tools, and to translate those requirements into team structures and software platforms that let LinkedIn use data more productively.
We discuss a wide range of topics, including engineering management, the modern data platform, and LinkedIn’s adoption of public cloud.
Full disclosure: LinkedIn is a sponsor of Software Engineering Daily.
Sponsorship inquiries: email@example.com
Check out our active projects:
- We are hiring a head of growth. If you like Software Engineering Daily and consider yourself competent in sales, marketing, and strategy, send me an email: firstname.lastname@example.org
- FindCollabs is a place to build open source software.
- The SEDaily app for iOS and Android includes all 1000 of our old episodes, as well as related links, greatest hits, and topics. Subscribe for ad-free episodes.