Modern applications produce large numbers of events. These events might be user clicks, readings accumulated by IoT sensors, or log messages.
As the cost of cloud storage and compute continues to drop, engineers can afford to build applications around these high volumes of events, and a variety of tools have been developed to process them. Apache Kafka is widely used to store and queue these streams of data, while Apache Spark and Apache Flink are stream processing systems used to perform general-purpose computations across this event stream data.
Kafka, Spark, and Flink are great general-purpose tools, but there is also room for a narrower set of distributed systems tools to support high-volume event data. Apache Druid is an open source database built for high-performance, read-only analytic workloads. Druid has a useful combination of features for event data workloads, including a column-oriented storage system, automatic search indexing, and a horizontally scalable architecture.
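To make the pairing of column-oriented storage and automatic indexing more concrete, here is a toy sketch. It is not Druid's implementation: it assumes a tiny in-memory dataset with hypothetical `country`, `device`, and `clicks` fields, and it uses plain Python integers as bitmaps, whereas production systems such as Druid use compressed bitmap formats. The sketch shows how a filtered aggregation only has to touch the columns involved. Each filter is resolved by ANDing per-value bitmaps, and only the `clicks` column is read to compute the sum.

```python
from collections import defaultdict

# Hypothetical event rows, as they might arrive from a stream.
events = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "DE", "device": "desktop", "clicks": 1},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "FR", "device": "mobile", "clicks": 2},
]

def to_columns(rows):
    """Pivot row-oriented events into one list of values per column."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)

def build_bitmap_index(values):
    """Map each distinct value to an int bitmap whose set bits are the row positions holding it."""
    index = defaultdict(int)
    for pos, value in enumerate(values):
        index[value] |= 1 << pos
    return dict(index)

def rows_matching(bitmap):
    """Expand a bitmap back into a sorted list of row positions."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]

columns = to_columns(events)

# Index only the dimension columns; the metric column "clicks" is just scanned.
indexes = {
    name: build_bitmap_index(values)
    for name, values in columns.items()
    if name != "clicks"
}

# Filter country = 'US' AND device = 'desktop' by ANDing two bitmaps,
# then read only the "clicks" column for the matching rows.
match = indexes["country"]["US"] & indexes["device"]["desktop"]
total = sum(columns["clicks"][pos] for pos in rows_matching(match))
print(total)  # prints 5
```

Because each column is stored contiguously, the filter step reads only the small per-value bitmaps, and the aggregation scans a single column rather than every field of every row.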
Druid's feature set allows new types of analytics applications to be built on top of it, including search applications, dashboards, and ad-hoc analytics. Fangjin Yang is a core contributor to Druid and the CEO of Imply.io, a company that makes a storage, querying, and visualization tool built on top of Druid. He joins the show to talk about the architecture of Druid and his company, Imply.
- Imply – About us
- Imply Blog
- Imply Docs | Cloud
- Imply Docs | On-prem
- The Need for Operational Analytics
- Operational Analytics in Practice
- The technology behind operational analytics
- Druid | Technology
- Druid: Powering Interactive Data Applications at Scale – by Fangjin Yang – YouTube