Best Tobias Macey Podcasts (2024)

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach 54:16

19h ago

Summary Artificial intelligence has dominated the headlines for several months due to the successes of large language models. This has prompted numerous debates about the possibility of, and timeline for, artificial general intelligence (AGI). Peter Voss has dedicated decades of his life to the pursuit of truly intelligent software through the appr…

Build Your Second Brain One Piece At A Time 50:10

8d ago

Summary Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collecti…

Making Email Better With AI At Shortwave 53:43

14d ago

Summary Generative AI has rapidly transformed everything in the technology sector. When Andrew Lee started work on Shortwave he was focused on making email more productive. When AI started gaining adoption he realized that he had even more potential for a transformative experience. In this episode he shares the technical challenges that he and his …

Designing A Non-Relational Database Engine 1:16:01

22d ago

Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing …

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer 56:23

28d ago

Summary Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological sol…

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary 50:44

1M ago

Summary Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technolo…

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+ 55:39

1M ago

Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features. In this episode Pe…

Reconciling The Data In Your Databases With Datafold 58:14

2M ago

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold,…

Version Your Data Lakehouse Like Your Software With Nessie 40:55

2M ago

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond…

When And How To Conduct An AI Program 46:25

2M ago

Summary Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about h…

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development 56:00

2M ago

Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow …

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse 58:46

3M ago

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combinatio…

Data Sharing Across Business And Platform Boundaries 59:55

3M ago

Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains …

Tackling Real Time Streaming Data With SQL Using RisingWave 56:55

3M ago

Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continu…

Build A Data Lake For Your Security Logs With Scanner 1:02:38

3M ago

Summary Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying…

Modern Customer Data Platform Principles 1:01:33

4M ago

Summary Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how…

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel 50:26

4M ago

Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user ex…

Designing Data Platforms For Fintech Companies 47:56

4M ago

Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Troubleshooting Kafka In Production 1:14:43

4M ago

Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka Troubleshooting in Production". In this episode he hi…

Adding An Easy Mode For The Modern Data Stack With 5X 56:12

5M ago

Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X unders…

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack 51:17

5M ago

Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. And…

Designing Data Transfer Systems That Scale 1:03:57

5M ago

Summary The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. Andrei Tserakhau has dedicated his career to this problem, and in …

Addressing The Challenges Of Component Integration In Data Platform Architectures 29:42

5M ago

Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is faci…

Unlocking Your dbt Projects With Practical Advice For Practitioners 1:16:04

6M ago

Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. While it is easy to adopt, there are many potential pitfalls. Dustin Dorsey and Cameron Cyr co-authored a practical guide to building your dbt project. In this episode they share their hard-won wisdom about how to build and scale your dbt projects…

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine 1:07:52

6M ago

Summary Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yaha…

Shining Some Light In The Black Box Of PostgreSQL Performance 54:51

6M ago

Summary Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. In this episode Lukas Fittl shares some hard-won wisdom about the causes of, and solutions to, many performance bottlenecks and the work that he is doi…

Surveying The Market Of Database Products 47:12

6M ago

Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she…

Defining A Strategy For Your Data Products 1:03:50

7M ago

Summary The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products being the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the develop…

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable 1:08:28

7M ago

Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems …

Using Data To Illuminate The Intentionally Opaque Insurance Industry 51:58

7M ago

Summary The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.

Building ETL Pipelines With Generative AI 51:36

7M ago

Summary Artificial intelligence applications require substantial high quality data, which is provided through ETL pipelines. Now that AI has reached the level of sophistication seen in the various generative models it is being used to build new ETL workflows. In this episode Jay Mishra shares his experiences and insights building ETL pipelines with…

Powering Vector Search With Real Time And Incremental Vector Indexes 59:16

7M ago

Summary The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector data.

Building Linked Data Products With JSON-LD 1:01:30

8M ago

Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for buildi…

An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem 1:01:25

8M ago

Summary Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this ep…

Eliminate The Overhead In Your Data Integration With The Open Source dlt Library 42:12

8M ago

Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to elim…

Building An Internal Database As A Service Platform At Cloudflare 1:01:09

8M ago

Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low lat…

Harnessing Generative AI For Creating Educational Content With Illumidesk 54:52

9M ago

Summary Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating e…

Unpacking The Seven Principles Of Modern Data Pipelines 47:02

9M ago

Summary Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your dat…

Quantifying The Return On Investment For Your Data Team 1:01:52

9M ago

Summary As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your compa…

Strategies For A Successful Data Platform Migration 1:09:52

9M ago

Summary All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that …

Build Real Time Applications With Operational Simplicity Using Dozer 40:42

10M ago

Summary Real-time data processing has steadily been gaining adoption due to advances in the accessibility of the technologies involved. Despite that, it is still a complex set of capabilities. To bring streaming data in reach of application engineers Matteo Pelati helped to create Dozer. In this episode he explains how investing in high performance…

Datapreneurs - How Today's Business Leaders Are Using Data To Define The Future 54:45

10M ago

Summary Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on d…

Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling 1:12:54

10M ago

Summary For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin of Airflow…

How Data Engineering Teams Power Machine Learning With Feature Platforms 1:03:29

10M ago

Summary Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems th…

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh 50:19

11M ago

Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful en…

How Column-Aware Development Tooling Yields Better Data Models 46:19

11M ago

Summary Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. By incorporating column-level lineage in the data modeling process…

Build Better Tests For Your dbt Projects With Datafold And data-diff 48:21

11M ago

Summary Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be stable and wrong, but then it isn't reliable. Confidence in your data is achieved through constant validation and testing. Datafold has invested a lot of time into integrating with the workflow of dbt pr…

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service 54:05

11M ago

Summary A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, …

A Roadmap To Bootstrapping The Data Team At Your Startup 42:31

11M ago

Summary Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his h…

Keep Your Data Lake Fresh With Real Time Streams Using Estuary 55:50

12M ago

Summary Batch vs. streaming is a long running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batched workloads, but the reverse isn't true. The batch world has been the default for years because of the complexities of running a reliable streamin…

