DataOps For Streaming Systems With Lenses.io

Data Engineering Podcast

Content provided by Tobias Macey.

4y ago 45:36

Summary

There are an increasing number of use cases for real-time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running, you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what the Lenses.io DataOps platform is built for. In this episode CTO Andrew Stevenson discusses the challenges that arise from building decoupled systems, the benefits of using SQL as the common interface for your data, and the metrics that need to be tracked to keep the overall system healthy. Observability and governance of streaming data require a different approach than batch-oriented workflows, and this episode does an excellent job of outlining the complexities involved and how to address them.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
Your host is Tobias Macey and today I’m interviewing Andrew Stevenson about Lenses.io, a platform to provide real-time data operations for engineers.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what Lenses is and the story behind it?
What is your working definition for what constitutes DataOps?
- How does the Lenses platform support the cross-cutting concerns that arise when trying to bridge the different roles in an organization to deliver value with data?
  - What are the typical barriers to collaboration, and how does Lenses help with that?
Many different systems provide a SQL interface to streaming data on various substrates. What was your reason for building your own SQL engine and what is unique about it?
What are the main challenges that you see engineers facing when working with streaming systems?
What have you found to be the most notable evolutions in the community and ecosystem around Kafka and streaming platforms?
One of the interesting features in the recent release is support for topologies to map out the relations between different producers and consumers across a stream. Why is that a difficult problem and how have you approached it?
On the point of monitoring, what are the foundational challenges that engineers run into when trying to gain visibility into streams of data?
- What are some useful strategies for collecting and analyzing traces of data flows?
As with many things in the space of data, local development and pre-production testing and validation are complicated due to the potential scale and variability of a production system. What advice do you have for engineers who are trying to establish a sustainable workflow for streaming applications?
- How do you facilitate the CI/CD process for enabling a culture of testing and establishing confidence in the correct functionality of your systems?
How is the Lenses platform implemented and how has its design evolved since you first began working on it?
What are some of the specifics of Kafka that you have had to reconsider or redesign as you began adding support for additional streaming engines (e.g. Redis and Pulsar)?
What are some of the most interesting, unexpected, or innovative ways that you have seen the Lenses platform used?
What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with Lenses?
When is Lenses the wrong choice?
What do you have planned for the future of the platform?

Contact Info

LinkedIn
@StevensonA_D on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA