Combining Transactional And Analytical Workloads On MemSQL With Nikita Shamgunov

Content provided by Tobias Macey.

Data Engineering Podcast
Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51

Summary

One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed database built to serve transactional (application-oriented) and analytical (high-volume) workloads concurrently on the same hardware. In this episode the CEO of MemSQL describes how the company and database got started, how it is architected for scale and speed, and how it is being used in production. This was a deep dive on how to build a successful company around a powerful platform, and how that platform simplifies operations for enterprise-grade data management.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
And the team at Metis Machine has shipped a proof-of-concept integration between the Skafos machine learning platform and the Tableau business intelligence tool, meaning that your BI team can now run the machine learning models custom built by your data science team. If you think that sounds awesome (and it is) then join the free webinar with Metis Machine on October 11th at 2 PM ET (11 AM PT). Metis Machine will walk through the architecture of the extension, demonstrate its capabilities in real time, and illustrate the use case for empowering your BI team to modify and run machine learning models directly from Tableau. Go to metismachine.com/webinars now to register.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Your host is Tobias Macey and today I’m interviewing Nikita Shamgunov about MemSQL, a NewSQL database built for simultaneous transactional and analytic workloads.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what MemSQL is and how the product and business first got started?
What are the typical use cases for customers running MemSQL?
What are the benefits of integrating the ingestion pipeline with the database engine?
- What are some typical ways that the ingest capability is leveraged by customers?
How is MemSQL architected and how has the internal design evolved from when you first started working on it?
- Where does it fall on the axes of the CAP theorem?
- How much processing overhead is involved in the conversion from the column-oriented data stored on disk to the row-oriented data stored in memory?
- Can you describe the lifecycle of a write transaction?
Can you discuss the techniques that are used in MemSQL to optimize for speed and overall system performance?
- How do you mitigate the impact of network latency throughout the cluster during query planning and execution?
How much of the implementation of MemSQL is using custom built code vs. open source projects?
What are some of the common difficulties that your customers encounter when building on top of or migrating to MemSQL?
What have been some of the most challenging aspects of building and growing the technical and business implementation of MemSQL?
When is MemSQL the wrong choice for a data platform?
What do you have planned for the future of MemSQL?

Contact Info

@nikitashamgunov on Twitter
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
