Best Tobias Macey Podcasts (2025)

CSVs Will Never Die And OneSchema Is Counting On It

9 days ago · 54:40

Summary In this episode of the Data Engineering Podcast, Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the ch…

Breaking Down Data Silos: AI and ML in Master Data Management

19 days ago · 57:30

Summary In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organiz…

Building a Data Vision Board: A Guide to Strategic Planning

about a year ago · 49:59

Summary In this episode of the Data Engineering Podcast, Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision boar…

How Orchestration Impacts Data Platform Architecture

about a year ago · 59:39

Summary The core task of data engineering is managing the flows of data through an organization. Ensuring that those flows execute on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose impacts the ways that you architect the rest of your data platform. In this episode Hugo Lu shares his…

An Exploration Of The Impediments To Reusable Data Pipelines

about a year ago · 51:32

Summary In this episode of the Data Engineering Podcast, the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explores the "write everything twice" problem, where similar pipelines are built without code reuse, and discusses the challenges of managing different SQL dialects and relational databases. Max also touc…

The Art of Database Selection and Evolution

about a year ago · 59:56

Summary In this episode of the Data Engineering Podcast, Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his journey into the world of data and discusses the complexities of database selection, highlighting the trade-offs between different database architectures and how these choices affect system design, q…

Bridging Code and UI in Data Orchestration with Kestra

about a year ago · 44:30

Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enabl…

Streaming Data Into The Lakehouse With Iceberg And Trino At Going

about a year ago · 39:49

In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads th…

An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin

about a year ago · 56:11

Summary The challenges of integrating all of the tools in the modern data stack have led to a new generation of tools that focus on a fully integrated workflow. At the same time, there have been many approaches to how much of the workflow is driven by code and how much is not. Burak Karakan is of the opinion that a fully integrated workflow that is driven entir…

Feldera: Bridging Batch and Streaming with Incremental Computation

about a year ago · 47:36

Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous computation of data, machine learning, and AI workloads. The discussion covers the concept of incremental computation, the origins of Feldera, and its unique ability to handle both streaming and batch …

Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent

about a year ago · 48:50

Summary Gleb Mezhanskiy, CEO and co-founder of Datafold, joins Tobias Macey to discuss the challenges and innovations in data migrations. Gleb shares his experiences building and scaling data platforms at companies like Autodesk and Lyft, and how these experiences inspired the creation of Datafold to address data quality issues across teams. He out…

Bring Vector Search And Storage To The Data Lake With Lance

about a year ago · 58:01

Summary The rapid growth of generative AI applications has prompted a surge of investment in vector databases. While there are numerous engines available now, Lance is designed to integrate with data lake and lakehouse architectures. In this episode Weston Pace explains the inner workings of the Lance format for table definitions and file storage, …

The Role of Python in Shaping the Future of Data Platforms with DLT

about a year ago · 54:08

Summary In this episode of the Data Engineering Podcast, Adrian Brudaru and Marcin Rudolf, co-founders of DLT Hub, delve into the principles guiding DLT's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the P…

Build Your Data Transformations Faster And Safer With SDF

about a year ago · 42:36

Summary In this episode of the Data Engineering Podcast, Lukas Schulte, co-founder and CEO of SDF, explores the development and capabilities of this fast and expressive SQL transformation tool. From its origins as a solution for addressing data privacy, governance, and quality concerns in modern data management, to its unique features like static an…

Scaling Airbyte: Challenges and Milestones on the Road to 1.0

about a year ago · 57:11

Summary Airbyte is one of the most prominent platforms for data movement. Over the past 4 years they have invested heavily in solutions for scaling the self-hosted and cloud operations, as well as the quality and stability of their connectors. As a result of that hard work, they have declared their commitment to the future of the platform with a 1.…

Enhancing Data Accessibility and Governance with Gravitino

about a year ago · 38:41

Summary As data architectures become more elaborate and the number of applications of data increases, it becomes increasingly challenging to locate and access the underlying data. Gravitino was created to provide a single interface to locate and query your data. In this episode Junping Du explains how Gravitino works, the capabilities that it unloc…

The Evolution of DataOps: Insights from DataKitchen's CEO

about a year ago · 53:30

Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Bergh, CEO of DataKitchen, to discuss his ongoing mission to simplify the lives of data engineers. Chris explains the challenges faced by data engineers, such as constant system failures, the need for rapid changes, and high customer demands. Chris delves …

Achieving Data Reliability: The Role of Data Contracts in Modern Data Management

about a year ago · 49:26

Summary Data contracts are both an enforcement mechanism for data quality and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be us…

How Generative AI Is Impacting Data Engineering Teams

about a year ago · 54:45

Summary Generative AI has rapidly gained adoption for numerous use cases. To support those applications, organizational data platforms need to add new features and data teams have increased responsibility. In this episode Lior Gavish, co-founder of Monte Carlo, discusses the various ways that data teams are evolving to support AI powered features a…

The Role of Product Managers in Data-Centric Organizations

about a year ago · 52:58

Summary In this episode Praveen Gujar, Director of Product at LinkedIn, talks about the intricacies of product management for data and analytical platforms. Praveen shares his journey from Amazon to Twitter and now LinkedIn, highlighting his extensive experience in building data products and platforms, digital advertising, AI, and cloud services. H…

Neon: A Serverless And Developer Friendly Postgres

about a year ago · 57:43

Summary Postgres is one of the most widely respected and liked database engines ever. To make it even easier for developers to use, Nikita Shamgunov decided to make it serverless, so that it can scale from zero to infinity. In this episode he explains the engineering involved to make that possible, as well as the numerous details that he an…

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

about a year ago · 59:48

Summary This episode features an insightful conversation with Petr Janda, the CEO and founder of Synq. Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for …

Stitching Together Enterprise Analytics With Microsoft Fabric

about a year ago · 53:23

Summary Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise, Microsoft has created the Fabric platform, based on their OneLake architecture. In this episode Dipti Borkar shares her experiences working on the product team at Fabric and explains the various use cases for the Fabric service. Ann…

Being Data Driven At Stripe With Trino And Iceberg

about a year ago · 53:20

Summary Stripe is a company that relies on data to power their products and business. To support that functionality they have invested in Trino and Iceberg for their analytical workloads. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face i…

X-Ray Vision For Your Flink Stream Processing With Datorios

about a year ago · 42:22

Summary Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. To address this shortcoming Datorios created an observability platform for Flink that brings visibility to the internals of this popular stream processing system. …

Practical First Steps In Data Governance For Long Term Success

about a year ago · 1:00:41

Summary Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that goal. Data governance is the binding force between these two parts of the organization. Nicola Askham found her way into data governance by accident, and stayed because of the benefit that she was able t…

Data Migration Strategies For Large Scale Systems

about a year ago · 1:00:00

Summary Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments. In this episode he shares …

Zenlytic Is Building You A Better Coworker With AI Agents

about a year ago · 54:19

Summary The purpose of business intelligence systems is to allow anyone in the business to access and decode data to help them make informed decisions. Unfortunately this often turns into an exercise in frustration for everyone involved due to complex workflows and hard-to-understand dashboards. The team at Zenlytic have leaned on the promise of la…

Release Management For Data Platform Services And Logic

about a year ago · 20:09

Summary Building a data platform is a substantial engineering endeavor. Once it is running, the next challenge is figuring out how to address release management for all of the different component parts. The services and systems need to be kept up to date, but so does the code that controls their behavior. In this episode your host Tobias Macey ref…

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

about a year ago · 54:17

Summary Artificial intelligence has dominated the headlines for several months due to the successes of large language models. This has prompted numerous debates about the possibility of, and timeline for, artificial general intelligence (AGI). Peter Voss has dedicated decades of his life to the pursuit of truly intelligent software through the appr…

Build Your Second Brain One Piece At A Time

about a year ago · 50:10

Summary Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collecti…

Making Email Better With AI At Shortwave

about a year ago · 53:43

Summary Generative AI has rapidly transformed everything in the technology sector. When Andrew Lee started work on Shortwave he was focused on making email more productive. When AI started gaining adoption he realized that there was even more potential for a transformative experience. In this episode he shares the technical challenges that he and his …

Designing A Non-Relational Database Engine

about a year ago · 1:16:02

Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing …

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

about a year ago · 56:23

Summary Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological sol…

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

about a year ago · 50:44

Summary Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technolo…

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

about a year ago · 55:40

Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features. In this episode Pe…

Reconciling The Data In Your Databases With Datafold

about a year ago · 58:14

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold,…

Version Your Data Lakehouse Like Your Software With Nessie

about a year ago · 40:55

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond…

When And How To Conduct An AI Program

about a year ago · 46:25

Summary Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about h…

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

about a year ago · 56:01

Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to rewrite the InfluxDB engine he found the Apache Arrow …

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

about a year ago · 58:46

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combinatio…

Data Sharing Across Business And Platform Boundaries

about a year ago · 59:56

Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains …

Tackling Real Time Streaming Data With SQL Using RisingWave

about a year ago · 56:55

Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continu…

Build A Data Lake For Your Security Logs With Scanner

about a year ago · 1:02:38

Summary Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying…

Modern Customer Data Platform Principles

about a year ago · 1:01:33

Summary Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how…

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

about a year ago · 50:26

Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user ex…

Designing Data Platforms For Fintech Companies

about a year ago · 47:57

Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Troubleshooting Kafka In Production

about a year ago · 1:14:44

Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: Troubleshooting in Production". In this episode he hi…

Adding An Easy Mode For The Modern Data Stack With 5X

about a year ago · 56:12

Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X unders…

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

about a year ago · 51:18

Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. And…
