Manage episode 223788771 series 1437556
At Facebook, Venkat Venkataramani saw how large volumes of data were changing software infrastructure. Applications such as logging servers and advertising were creating fast moving, semi-structured data. The user base was growing, the traffic was growing, and the volume of data was growing. And the popular methods for managing this data were insufficient for the applications that developers wanted to build on top.
In previous episodes about data platforms, we have covered similar difficulties as experienced by Uber and Doordash. Incoming data is often in JSON, which is hard to query. The data is transformed to a file format like Parquet, which requires an ETL job. Once it is in a Parquet file on disk in a data lake, the access time is slow. To query the data efficiently, it must be loaded into a data warehouse, which loads the data into memory, often in a columnar format that is easy to aggregate.
Imagine being a developer at Facebook, Uber, or Doordash, and trying to build a simple dashboard, or a machine learning application on top of this data platform. Where do you find the right data? How do you know it is up to date? And what if you don’t know the shape of your queries ahead of time, and you haven’t defined indexes over your data? The access speed will be too slow to do exploratory analysis.
There are so many steps in this process, and each of these steps creates friction for application developers that want to build on top of “big data”. Since even Facebook was having trouble managing this problem of the data platform, Venkat figured there was an opportunity to build a company around solving the data platform for other software companies.
Venkat is the CEO of Rockset, a data system that is built to make it easy for developers to build data-driven apps. In Rockset, data can be ingested from data streams, data lakes, and databases. Rockset creates multiple indexes and schemas across the data. Because there are multiple models for querying, Rockset can analyze an incoming query and create an intelligent query plan for serving it.
Venkat joins the show to discuss his time working on data at Facebook, the untapped opportunities of using that data, and the architecture of Rockset.
153 episodes available. A new episode about every 6 days averaging 57 mins duration .