Flash Memory Summit 2015 Special – NVM Express + RDMA = AWESOME!

August 11, 2015

Author: Stephen Bates (@stepbates)

Introduction

In previous blog posts I have discussed Project Donard, which implements PCIe peer-to-peer transfers between NVM Express (NVMe) SSDs and GPUs, as well as between NVMe SSDs and Remote Direct Memory Access (RDMA) NICs. I am super-excited to announce that at Flash Memory Summit 2015 (FMS) we have been working with Mellanox, a pioneer of RDMA, to take this work to the next level! This blog post digs a little deeper into what we are demoing at FMS, August 11-13, and how NVM Express + RDMA = AWESOME!

We have two separate NVMe + RDMA demonstrations in PMC’s booth #213 at FMS. The first shows how we can combine NVMe and RDMA to provide low-latency, high-performance block-based NVM at distance and with scale. The second integrates Mellanox’s RDMA Peer-Direct with our PMC Flashtec NVRAM card, exposing its Memory Mapped I/O (MMIO) space as an RDMA target so that persistent memory can be accessed at distance and scale. Let’s look at each demo in more detail:

NVM Express over RDMA

The NVMe over RDMA (NoR) demonstration shows the potential of extending the NVMe protocol to work over RDMA. In this demonstration there are two computers, one acting as a client and the other as a server, connected over RoCEv2 using Mellanox ConnectX-3 Pro NICs. The NVMe device used is the PMC Flashtec™ NVRAM Drive, which offers high performance and low latency. A block diagram of the demonstration is shown below:

NOR Diagram

Our demonstration shows that using RDMA to transfer NVMe commands and data results in minimal additional latency and no throughput drop.

The table below compares the average latency of a local NVMe device against a remote NVMe device. It shows a sub-10us increase in latency with the NoR approach.

Avg Latency Table

The table below compares the throughput of a local NVMe device against a remote NVMe device. It shows that there is no drop in throughput with the NoR approach.

Avg Throughput Table

Peer-to-Peer Between RDMA and PCIe Devices

In this demonstration we build on standard RDMA by adding a bypass of the server CPU and DRAM, using Peer-Direct to let a remote client connect directly to a server NVRAM / NVMe device. We combine RoCEv2-capable ConnectX-3 Pro RDMA NICs from Mellanox with PMC Flashtec NVRAM Drives and enable Peer-Direct operation between the NIC and the NVRAM. Peer-Direct gives a remote client direct access to the NVRAM card, which reduces latency and offloads the server CPU and DRAM compared to a traditional RDMA flow.

RDMA Flow

The hardware for this demonstration consists of two computers, one acting as a client and the other as a server. We use a PCIe switch in the server to improve the Peer-Direct performance above and beyond what can be achieved using the Intel CPU root complex.

The table below compares the background DRAM bandwidth available on the server when both traditional RDMA and Peer-Direct RDMA are used. Results were obtained using perftest:

Background DRAM

The table below compares the average latency results between traditional RDMA and Peer-Direct RDMA. Results were obtained using the RDMA mode of fio:

Avg Latency Table

Peer-Direct Code Base

As previously mentioned, we implemented Donard with open-source paradigms in mind, and it would be remiss of us not to open the Donard code to the community. As such, we have placed the Donard code on GitHub under a mix of Apache 2.0 and GPL 2.0 licensing. Any code we modified that was originally GPL must remain GPL, but all our new code is available under Apache so others can take it and use it as they wish.

It is our hope that people in the community will use this code, make further improvements to it, and contribute it back to the code base. The git repository for the code is available here.

In addition, the code associated with this work will be included in the December release of the OpenFabrics Enterprise Distribution (OFED). Stay tuned for more details on that release soon.

Conclusions

Both RDMA and NVMe are technologies on the ascent! The first provides low-latency, efficient data movement at distance and scale; the second provides low-latency access to SSDs. Combining the two can lead to awesome things, and PMC and Mellanox are working together to bring this awesome performance to you!






2 thoughts on “Flash Memory Summit 2015 Special – NVM Express + RDMA = AWESOME!”

  1. Yong Li: How can we get an NVRAM / NVMe device? Is it the NV1604 from the PMC website? How can we buy one or two? We would like to do some analysis and experiments as you did.

    Also, do IOPS improve when you enable P2P?

  2. Stephen: Hi Yong Li,

    If you are interested in getting your hands on our NVRAM card you can ping me at stephen dot bates at pmcs dot com. Also check out the brand-new review on Tom’s IT Pro.

    When we enable P2P, IOPS can increase, but it does depend on things like queue depth and whether you use a PCIe switch instead of the CPU root complex. We can chat about this more if you email me.

    Regards,
    Stephen
