Apache Spark, the Next Generation Cluster Computing with Ivan Lozic

Discovered by Player FM and our community. Copyright is owned by the publisher, not Player FM, and audio is streamed directly from their servers.

In this episode, I talked to Ivan Lozic about Apache Spark. Ivan holds a master’s degree in information technology and has been working at Farmeron, a provider of cloud-based dairy farm management software. As the software architect, he has been in charge of the Big Data architecture and technology stack that lets the company process ever larger data sets.

Apache Spark is a general-purpose computing engine designed for large-scale data processing. It is becoming ever more popular thanks to the support from the Apache community. Many well-known companies use it to process petabytes of data on 8000+ nodes, with long-running jobs measured in weeks. In this session, Ivan talks about: Apache Spark and how it relates to (traditional) Hadoop MapReduce technology; what makes Spark so fast; how to use its rich APIs to design and run your ETL jobs; Apache Spark's streaming capabilities for near real-time updates and its role in Big Data processing scenarios; and Structured Streaming, a scalable and fault-tolerant stream processing engine that makes near real-time processing scenarios easier.

If you'd like to watch the recording of this webinar, or be notified of upcoming webinars, please register at http://www.prohuddle.com.

Now let's hear from Ivan.
