Druid Analytical Database with Fangjin Yang


Manage episode 216800463 series 1437556
By Discovered by Player FM and our community — copyright is owned by the publisher, not Player FM, and audio streamed directly from their servers.

Modern applications produce large numbers of events. These events can be users clicking, IoT sensors accumulating data, or log messages.

The cost of cloud storage and compute continues to drop, so engineers can afford to build applications around these high volumes of events, and a variety of tools have been developed to process them. Apache Kafka is widely used to store and queue these streams of data, and Apache Spark and Apache Flink are stream processing systems that are used to perform general purpose computations across this event stream data.

Kafka, Spark, and Flink are great general purpose tools, but there is also room for a more narrow set of distributed systems tools to support high volume event data. Apache Druid is an open source database built for high performance, read only analytic workloads. Druid has a useful combination of features for event data workloads, including a column-oriented storage system, automatic search indexing, and a horizontally scalable architecture.

Druid’s feature set allows for new types of analytics applications to be built on top of it, including search applications, dashboards, and ad-hoc analytics. Fangjin Yang is a core contributor to Druid and the CEO of Imply.io, a company that makes a storage, querying, and visualization tool build on top of Druid. He joins the show to talk about the architecture of Druid and his company Imply.

Show Notes

The post Druid Analytical Database with Fangjin Yang appeared first on Software Engineering Daily.

135 episodes available. A new episode about every 7 days averaging 58 mins duration .