Data Warehouse with Christian Kleinerman



A data warehouse provides fast access to large data sets for analytics, data science, and dashboards. It differs from a transactional database because its workloads rarely need to update individual records. Because access is mostly read-only and queries touch high volumes of data, a data warehouse is designed very differently from a transactional database.
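One design difference that follows from this access pattern is storage layout. Many analytical systems store data column by column rather than row by row, so an aggregate over one field reads only that field. The toy sketch below contrasts the two layouts in plain Python. The record fields and function names are hypothetical, and real warehouses add compression and on-disk file formats that this does not model.

```python
# Hypothetical viewing records; field names are illustrative assumptions.
rows = [
    {"user_id": 1, "title": "House of Cards", "minutes": 50},
    {"user_id": 2, "title": "Black Mirror", "minutes": 60},
    {"user_id": 1, "title": "Black Mirror", "minutes": 45},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


def total_minutes_row_store(records):
    # Row layout: every full record is visited even though one field is needed.
    return sum(record["minutes"] for record in records)


def total_minutes_column_store(columns):
    # Column layout: the aggregate reads only the "minutes" column.
    return sum(columns["minutes"])


columns = to_columns(rows)
print(total_minutes_row_store(rows), total_minutes_column_store(columns))  # 155 155
```

Both functions return the same answer. The point of the columnar version is that, at warehouse scale, skipping unused columns avoids reading most of the data.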

With a transactional database (such as MySQL or MongoDB), consistency guarantees are important. Consider a transactional database that serves as the backend for a banking application. If multiple frontend servers hit that database to withdraw money, each update must be applied quickly and atomically. You need to avoid race conditions, so that two servers in different locations cannot both withdraw the entire account balance at the same time.
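The race condition described above is a classic check-then-act bug: two withdrawals both read the old balance before either one writes. The minimal sketch below assumes an in-memory account where a `threading.Lock` stands in for the locking or transaction isolation a real transactional database would provide. `Account` and `withdraw` are hypothetical names, not part of any database API.

```python
import threading


class Account:
    """A toy in-memory account; a real bank would rely on database transactions."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The balance check and the debit happen under one lock, so no other
        # thread can act on the old balance between them.
        with self._lock:
            if amount > self.balance:
                return False
            self.balance -= amount
            return True


account = Account(100)
results = []


def attempt():
    # Each "frontend server" tries to withdraw the full balance.
    results.append(account.withdraw(100))


threads = [threading.Thread(target=attempt) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.balance, sorted(results))  # 0 [False, True]
```

Exactly one withdrawal succeeds. Without the lock, both threads could pass the `amount > self.balance` check before either debit is applied.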

A data warehouse, by contrast, is often used to process a query that spans a big data set. For example, Netflix might want to answer the question: “how many users who watched House of Cards also watched Black Mirror?” Netflix has a very large number of users, so it wants to store those user records in a way that lets a query scan through them quickly.
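The Netflix question reduces to a single sequential pass over viewing events followed by a set intersection. The sketch below uses a handful of made-up `(user_id, title)` events. It illustrates the shape of the query only and does not describe how Netflix actually stores or queries its data.

```python
# Hypothetical (user_id, title) viewing events.
events = [
    (1, "House of Cards"),
    (1, "Black Mirror"),
    (2, "House of Cards"),
    (3, "Black Mirror"),
    (4, "House of Cards"),
    (4, "Black Mirror"),
    (4, "Black Mirror"),
]


def watched_both(events, first, second):
    """Count distinct users who watched both titles, in one sequential scan."""
    saw_first, saw_second = set(), set()
    for user_id, title in events:
        if title == first:
            saw_first.add(user_id)
        elif title == second:
            saw_second.add(user_id)
    # Users present in both sets watched both titles.
    return len(saw_first & saw_second)


print(watched_both(events, "House of Cards", "Black Mirror"))  # 2
```

Repeat views (user 4 watching Black Mirror twice) are counted once because the sets hold distinct user IDs. The cost is dominated by the scan itself, which is why fast scanning matters for this kind of query.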

Christian Kleinerman is the VP of product at Snowflake Computing, whose main product is a cloud data warehouse. In today’s show, we talk about the differences among a data warehouse, a data lake, and a transactional database, and the process of moving data sets between them, often known as ETL (extract, transform, load).
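As a rough illustration of the three ETL stages, the sketch below extracts rows from a hypothetical CSV export, normalizes them, and loads them into a table. An in-memory SQLite database stands in for the warehouse purely for convenience. The CSV contents, table name, and function names are all assumptions, and this is not how Snowflake ingests data.

```python
import csv
import io
import sqlite3

# Extract source: a hypothetical CSV export from an operational system.
RAW_CSV = """user_id,title,watched_at
1,House of Cards,2018-10-01
1,Black Mirror,2018-10-02
2,house of cards,2018-10-03
"""


def extract(text):
    # Extract: parse the raw export into dicts keyed by the header row.
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    # Transform: cast IDs, normalize title casing, drop unused fields.
    return [(int(r["user_id"]), r["title"].strip().lower()) for r in records]


def load(conn, rows):
    # Load: write the cleaned rows into the analytical table.
    conn.execute("CREATE TABLE IF NOT EXISTS views (user_id INTEGER, title TEXT)")
    conn.executemany("INSERT INTO views VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
count = conn.execute(
    "SELECT COUNT(*) FROM views WHERE title = 'house of cards'"
).fetchone()[0]
print(count)  # 2
```

The transform step is what makes the two differently cased "House of Cards" rows match. Without it, the query would count only one.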

This show continues our series on data engineering and data platforms. As companies accumulate more data, managing that data and taking full advantage of it grows more complex. Christian gives his perspective on these trends and describes how Snowflake plans to evolve as a business.


The post Data Warehouse with Christian Kleinerman appeared first on Software Engineering Daily.
