Content provided by SNIA Technical Council. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by SNIA Technical Council or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

#122: 10 Million I/Ops From a Single Thread

50:11
 
One of the most common benchmarks in the storage industry is 4KiB random read I/O per second. Over the years, the industry first saw the publication of 1M I/Ops on a single box, then 1M I/Ops on a single thread (by SPDK). More recently, there have been publications outlining 10M I/Ops on a single box using high-performance NVMe devices and more than 100 CPU cores. This talk will present a benchmark of SPDK performing more than 10 million random 4KiB read operations per second from a single thread to 20 NVMe devices, a large advance over the current state of the art in the industry.

SPDK has developed a number of novel techniques to reach this level of performance, which will be outlined in detail here. These techniques include polling, advanced MMIO doorbell batching strategies, PCIe and DDIO considerations, careful management of the CPU cache, and the use of non-temporal CPU instructions. This will be a low-level talk with real examples of eliminating data-dependent loads, profiling last-level cache misses, prefetching, and more.

Additionally, there remain a number of techniques that have not yet been employed and that warrant future research. These techniques often push devices outside of their originally intended operating mode while remaining within the bounds of the specification, and so often require collaboration between NVMe controller and device designers, the NVMe specification body, and software developers such as the SPDK team.

Learning Objectives:
1) Optimal use of NVMe devices;
2) Optimal use of PCIe and MMIO in a storage stack;
3) Leveraging advanced x86-64 CPU instructions and making best use of the CPU cache.
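To put the headline figure in perspective, here is a rough back-of-the-envelope sketch of what 10 million 4KiB reads per second from one thread implies. The I/O rate, I/O size, and device count come from the description above; the 3 GHz clock speed is purely an assumption for illustration, since the description does not state the CPU frequency, and the function name is hypothetical.

```python
# Back-of-the-envelope budget for the benchmark described above.
IOPS = 10_000_000               # "more than 10 million" reads/s (from the description)
IO_SIZE_BYTES = 4 * 1024        # 4KiB per read (from the description)
NUM_DEVICES = 20                # NVMe devices (from the description)
ASSUMED_CPU_HZ = 3_000_000_000  # ASSUMPTION: 3 GHz core; not stated in the description


def io_budget(iops, io_size_bytes, num_devices, cpu_hz):
    """Return aggregate throughput and the per-I/O time and cycle budget."""
    return {
        # Total data moved per second across all devices.
        "throughput_bytes_per_s": iops * io_size_bytes,
        # Average load per device if I/O is spread evenly.
        "iops_per_device": iops // num_devices,
        # Wall-clock time the single thread can spend on each I/O.
        "ns_per_io": 1_000_000_000 / iops,
        # CPU cycles available per I/O at the assumed clock speed.
        "cycles_per_io": cpu_hz / iops,
    }


budget = io_budget(IOPS, IO_SIZE_BYTES, NUM_DEVICES, ASSUMED_CPU_HZ)

if __name__ == "__main__":
    for name, value in budget.items():
        print(f"{name}: {value:,}")
```

Under these assumptions, the single thread moves about 41 GB/s and has roughly 100 ns, or about 300 cycles at the assumed clock, to submit and complete each read. A budget that small leaves little room for stalls on main memory, which is consistent with the description's emphasis on cache management, prefetching, and eliminating data-dependent loads.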
146 episodes
