NVMe over Fabrics Demystified
Courtesy: https://www.redbooks.ibm.com/redpapers/pdfs/redp5437.pdf


As the storage industry moved to flash, the slowest component of the performance chain, the media, was taken care of. The next slowest component is the network, and hence the past couple of years have seen a lot of development on low-latency network interconnects, to fully leverage media like flash. It is not that the interconnects themselves are slow (after all, we have optical fibre in there), but the protocols that drive this physical pipe are. Hence the effort on protocol development, to enable low-latency interconnection between the requestor and the media. One protocol that seems to be becoming popular is NVMe. I wrote about NVMe basics and its speed advantages in one of my previous posts, which explains why NVMe is faster than existing protocols. As the name suggests, it is built from the ground up for non-volatile memory based media, and hence can unleash the performance of existing NAND flash and of upcoming media based on storage class memory.

However, by definition, NVMe maps host commands and responses to the host's shared memory, assuming both sit within the same host. That makes it a local connection protocol with a single host and a single device. Naturally, there is demand to extend the protocol across network fabrics, to allow distributed systems to connect devices to multiple hosts and to allow larger distances between host and device.
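To make that local model concrete, here is a minimal C sketch of NVMe's shared-memory queue pair. The field layout is abridged and partly hypothetical (the real spec defines 64-byte submission and 16-byte completion entries); the point is only that the queues live in host memory and the device reads them over PCIe.

```c
/* Minimal sketch of NVMe's shared-memory queue model (simplified;
 * layout abridged from the real 64-byte/16-byte spec entries). */
#include <stdint.h>
#include <stdio.h>

struct nvme_sq_entry {        /* submission queue entry (abridged) */
    uint8_t  opcode;          /* e.g. read or write */
    uint8_t  flags;
    uint16_t command_id;      /* pairs a completion with its command */
    uint32_t nsid;            /* target namespace */
    uint64_t prp1, prp2;      /* physical pointers into host data buffers */
    uint64_t slba;            /* starting logical block address */
    uint16_t nlb;             /* number of logical blocks */
};

struct nvme_cq_entry {        /* completion queue entry (abridged) */
    uint32_t result;
    uint16_t sq_head;         /* how far the device consumed the SQ */
    uint16_t sq_id;
    uint16_t command_id;
    uint16_t status;          /* phase bit + status code */
};

int main(void)
{
    /* Submitting a command is a memory write plus a doorbell register
     * write; the device then DMAs the entry straight out of host RAM.
     * That tight coupling to one host's PCIe memory is exactly what
     * NVMe over Fabrics replaces with messages. */
    printf("SQ entry (abridged): %zu bytes; spec entry is 64\n",
           sizeof(struct nvme_sq_entry));
    printf("CQ entry (abridged): %zu bytes; spec entry is 16\n",
           sizeof(struct nvme_cq_entry));
    return 0;
}
```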

To serve this need, a new protocol standard called NVMe over Fabrics has been developed. It retains most of the NVMe protocol (about 90%), but the shared-memory mapping model is replaced with a message-based model over different fabrics such as InfiniBand, Fibre Channel, Fibre Channel over Ethernet, RoCE, and iWARP. One or more NVMe commands are encapsulated in messages over the chosen transport layer. NVMe over Fabrics (NVMeoF) is predominantly set to use RDMA to support message passing over the fabric. The development of NVMeoF is defined by a technical subgroup of the NVM Express organisation.
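The encapsulation is easy to picture. Below is a hedged C sketch of the "capsule" idea; the struct name and layout are illustrative, not the spec's exact format (real capsules also carry SGL descriptors and have transport-specific rules for in-capsule data). The unmodified 64-byte NVMe command travels inside a fabric message instead of being fetched from shared host memory.

```c
/* Illustrative sketch of an NVMeoF command capsule (names and layout
 * are hypothetical, not the exact wire format of the standard). */
#include <stdint.h>
#include <stdio.h>

struct nvmeof_command_capsule {
    uint8_t nvme_command[64];   /* the NVMe submission entry, unchanged */
    uint8_t in_capsule_data[];  /* optional payload, transport-dependent */
};

int main(void)
{
    /* On an RDMA fabric the capsule is posted as a send work request;
     * the target returns a response capsule the same way, so host and
     * device no longer need any shared memory at all. */
    printf("capsule header: %zu bytes\n",
           sizeof(struct nvmeof_command_capsule));
    return 0;
}
```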

FC-NVMe, the NVMe over Fibre Channel transport, is being developed by the INCITS T11 committee, which develops all of the Fibre Channel interface standards. As part of this development work, FC-NVMe is also expected to work with Fibre Channel over Ethernet (FCoE).

As it goes, the promise of NVMeoF is to extend the low-latency protocol to access flash/SSDs over a distance (leveraging the advanced queuing defined by the protocol). However, at this initial stage there are a few roadblocks to watch out for:

1. It is a new standard (the first version was published in June 2016). It will keep evolving, and there is a possibility that gaps will be filled by vendors with proprietary implementations that are not interoperable. This is a curve every new standard goes through until it matures and everything becomes standardized.
2. It is a point-to-point protocol between initiator and target, like iSCSI, and may face the same challenges as iSCSI for HA/scale-out implementations. So we need to watch how each vendor implements it to cover these aspects.
3. The standard does not yet cover end-to-end data integrity (unlike NVMe over PCIe). So it is up to each vendor to implement it and make it coexist with the defined NVMeoF standard (see the sketch below).
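On the third point, end-to-end data integrity generally means attaching protection information (for example a per-block guard checksum, as in T10 PI) when data enters the stack and verifying it at every hop. Here is a purely illustrative C sketch of that idea; the names and the toy checksum are invented, not the standardized CRC-16 guard:

```c
/* Toy sketch of per-block protection information: seal a guard tag at
 * the source, re-verify before committing. Names and checksum are
 * invented; T10 PI uses a CRC-16 guard plus app/reference tags. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512

struct protected_block {
    uint8_t  data[BLOCK_SIZE];
    uint16_t guard;               /* checksum over data */
};

static uint16_t guard_of(const uint8_t *data)
{
    uint32_t sum = 0;             /* toy checksum, not the real CRC-16 */
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        sum = (sum * 31 + data[i]) & 0xffff;
    return (uint16_t)sum;
}

int main(void)
{
    struct protected_block blk;
    memset(blk.data, 0xA5, BLOCK_SIZE);
    blk.guard = guard_of(blk.data);   /* sealed at the source */

    blk.data[7] ^= 0x01;              /* simulate corruption on the fabric */

    /* Any hop (target, backend, drive) can re-verify before committing. */
    puts(guard_of(blk.data) == blk.guard
             ? "guard ok" : "guard mismatch: corruption detected");
    return 0;
}
```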

The simpler protocol layer also opens up the possibility of user-level device drivers, allowing applications to bypass the kernel and the block I/O abstraction completely. Though this makes applications a bit more complex, with added I/O handling responsibilities, the gains may be worth it for applications where latency/IOPS/throughput drive high-value outcomes.
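In practice that means the application owns the queue pair in user space and busy-polls for completions instead of sleeping on an interrupt, the pattern frameworks such as SPDK follow. The sketch below simulates the control flow with invented stubs (`submit_read`, `poll_completions`); it is not any real driver's API.

```c
/* Hypothetical sketch of a user-level, kernel-bypass I/O path: submit
 * from user space, then busy-poll the completion queue. All functions
 * are invented stubs that simulate a device completing instantly. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct qpair { int pending; };    /* stand-in for a user-space queue pair */

static bool submit_read(struct qpair *qp, void *buf,
                        unsigned long long lba, unsigned blocks)
{
    (void)lba; (void)blocks;
    memset(buf, 0, 512);          /* stub: pretend the device fills buf */
    qp->pending = 1;
    return true;
}

static int poll_completions(struct qpair *qp)
{
    int done = qp->pending;       /* stub: the "device" completes at once */
    qp->pending = 0;
    return done;
}

int main(void)
{
    struct qpair qp = {0};
    char buf[512];

    /* Enqueue directly from user space: no syscall, no kernel block layer. */
    if (!submit_read(&qp, buf, 0, 1))
        return 1;

    /* Busy-poll instead of taking an interrupt: lowest latency, but the
     * application now burns a core and owns error handling itself. */
    while (poll_completions(&qp) == 0)
        ;

    puts("read complete (simulated)");
    return 0;
}
```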

Since NVMe enables faster access to host-attached drives, software-defined storage will sooner or later take advantage of this, and we will see early adoption there. Other dominant use cases are:

Persistent caching devices on hosts or storage systems: to expand memory-based cache at low cost, without worrying about data protection across power cycles. One such example is IBM Spectrum Scale, which leverages NVMe for its LROC function, significantly improving performance numbers (SPEC SFS 2014 benchmark).

Fastest tier in hybrid storage: potentially exposing it to storage clients directly, favouring performance while trading off fault tolerance. This can be useful in workloads producing a high amount of temporary, performance-critical data (such as AI and machine learning algorithms running on massive data farms during training runs).

Leverage NVMe for the backend storage system: the protocol needs to mature further, additional accelerators need to be in place to support data functions like compression and copy services, and implementations will leverage user-space I/O to gain performance. IBM Spectrum Virtualize (V9000) and Spectrum Accelerate (A9000/R) will most likely follow this path. Accelerate already does user-space I/O with the SCSI RDMA Protocol (SRP), and will adapt easily to NVMe over Fabrics.

Leverage NVMeoF for host attachment: this complements currently maturing technologies like iSER. In fact, current RDMA-based protocols will contribute to the maturity and evolution of NVMeoF for host attachment. In all probability, Spectrum Virtualize will lead the way for IBM in this adoption.

Details on IBM NVMe adoption can be found here.

Feel free to send me a message if you want to talk to an IBM storage expert to learn more about this topic and the relevant use cases in your environment.

P.S. With media and interconnect taken care of… where does the ever-elusive application performance bottleneck shift now?

This blog was first published here

Vikash Kumar

Senior Manager - Validation Engineering at Renesas Electronics | Validation/Test/QA Specialist | Power & Performance | AI, ML & Gen AI | Open Source | Vendor Management | PMP | Six Sigma | 22k connections

7y

That's well presented. Going ahead, NVMe will be widely used for both primary and secondary storage.

Shalaka Verma

Technical Executive Leadership | Quantum Computing | Presales | Startup Advisor

7y

piyush gupta I agree. I intend to be in Pune Labs at the end of the month. Would like to hear more about this thought process for sure. At some point, there is always convergence on what becomes a data center standard protocol, and I am still not sure which factors will decide it one way or the other.

Piyush Gupta

Program Director & Global Product Owner, IBM Storage Software, SaaS Development & DevOps/SRE

7y

Nice article, Shalaka. Regarding iSER: yes, it is a fast-maturing RDMA-based block interconnect already attracting a lot of interest. iSER will certainly complement NVMeoF once NVMeoF over RDMA is mature, and will make sure customers' hardware investments are protected.

Shalaka Verma

Technical Executive Leadership | Quantum Computing | Presales | Startup Advisor

7y

You are welcome, Robert. I think well-thought-through adoption has a far better chance of success than going ad hoc.


Thanks for sharing this great synthesis, Shalaka Verma - indeed NVMeoF questions many of today's system design assumptions, hence we're taking steps to rethink these - across our line of products.
