Artwork

Content provided by Nicolay Gerold. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Player FM - Podcast App
Go offline with the Player FM app!

Serverless Data Orchestration, AI in the Data Stack, AI Pipelines | ep 12

28:06
 
Share
 

Manage episode 428522571 series 3585930
Content provided by Nicolay Gerold. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater.

Orchestra is a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.

The core architecture involves users building pipelines as code which then run on Orchestra's serverless infrastructure. It can orchestrate tasks like data ingestion, transformation, AI calls, as well as monitoring and getting analytics on data products. All with end-to-end visibility, data lineage and governance even when organizations have a scattered, modular data architecture across teams and tools.

Key Quotes:

  • Find the right level of abstraction when building data orchestration tasks/workflows. "I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff.”
  • Modularize data pipeline components: "It's just around understanding what that dev workflow should look like. I think it should be a bit more modular." Having a modular architecture where different components like data ingestion, transformation, model training are decoupled allows better flexibility and scalability.
  • Adopt a streaming/event-driven architecture for low-latency AI use cases: "If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for...if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

Hugo Lu:

Nicolay Gerold:

00:00 Introduction to Orchestra and its Focus on Data Products

08:03 Unified Control Plane for Data Stack and End-to-End Control

14:42 Use Cases and Unique Applications of Orchestra

19:31 Retaining Existing Dev Workflows and Best Practices in Orchestra

22:23 Event-Driven Architectures and Monitoring in Orchestra

23:49 Putting Data Products First and Monitoring Health and Usage

25:40 The Future of Data Orchestration: Stream-Based and Cost-Effective

data orchestration, Orchestra, serverless architecture, versatility, use cases, maturity levels, challenges, AI workloads

  continue reading

19 episodes

Artwork
iconShare
 
Manage episode 428522571 series 3585930
Content provided by Nicolay Gerold. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater.

Orchestra is a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.

The core architecture involves users building pipelines as code which then run on Orchestra's serverless infrastructure. It can orchestrate tasks like data ingestion, transformation, AI calls, as well as monitoring and getting analytics on data products. All with end-to-end visibility, data lineage and governance even when organizations have a scattered, modular data architecture across teams and tools.

Key Quotes:

  • Find the right level of abstraction when building data orchestration tasks/workflows. "I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff.”
  • Modularize data pipeline components: "It's just around understanding what that dev workflow should look like. I think it should be a bit more modular." Having a modular architecture where different components like data ingestion, transformation, model training are decoupled allows better flexibility and scalability.
  • Adopt a streaming/event-driven architecture for low-latency AI use cases: "If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for...if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

Hugo Lu:

Nicolay Gerold:

00:00 Introduction to Orchestra and its Focus on Data Products

08:03 Unified Control Plane for Data Stack and End-to-End Control

14:42 Use Cases and Unique Applications of Orchestra

19:31 Retaining Existing Dev Workflows and Best Practices in Orchestra

22:23 Event-Driven Architectures and Monitoring in Orchestra

23:49 Putting Data Products First and Monitoring Health and Usage

25:40 The Future of Data Orchestration: Stream-Based and Cost-Effective

data orchestration, Orchestra, serverless architecture, versatility, use cases, maturity levels, challenges, AI workloads

  continue reading

19 episodes

All episodes

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Quick Reference Guide