Author: Stephen Bates
I’m sure many of you reading this blog are aware that a transition is under way in the type of Error Correction Codes (ECCs) used inside SSD controller chips. Traditionally, Bose-Chaudhuri-Hocquenghem (BCH) codes were used, and they were more than adequate for large geometry NAND flash. However, the demand for cheaper and denser NAND flash means that BCH is no longer adequate and, in the search for alternatives, most of us are settling on Low Density Parity Check (LDPC) codes.
In this post, I want to talk a little about what this transition means and some implications it has for something we at PMC term Software Defined Flash. For more background on what an LDPC code is, check out Kent Smith’s great post.
There are several reasons why we’re transitioning from BCH to LDPC, but they can all be boiled down to this: LDPC codes allow you to correct more errors for the same ratio of user data to ECC parity. The second part of this last sentence is really important. We don’t want to increase the number of ECC parity bits in SSDs, because this leads to all sorts of nasty things like Write Amplification (WA) and format inefficiencies.
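To make the "ratio of user data to ECC parity" concrete, here is a minimal sketch of how that ratio (usually called the code rate) can be computed. The codeword sizes below are hypothetical numbers chosen for illustration, not figures from any particular controller.

```python
def code_rate(user_bytes: int, parity_bytes: int) -> float:
    """Fraction of each stored codeword that is user data (k / n)."""
    return user_bytes / (user_bytes + parity_bytes)


def parity_overhead(user_bytes: int, parity_bytes: int) -> float:
    """Extra flash consumed by parity, relative to the user data."""
    return parity_bytes / user_bytes


# Hypothetical codeword: 4096 bytes of user data plus 512 bytes of parity.
USER_BYTES = 4096
PARITY_BYTES = 512

print(f"code rate:       {code_rate(USER_BYTES, PARITY_BYTES):.3f}")
print(f"parity overhead: {parity_overhead(USER_BYTES, PARITY_BYTES):.1%}")
```

The point of the comparison in the text is that both codes are held to the same code rate; the LDPC code simply corrects more errors within that fixed parity budget.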
Well, you might say, “Why didn’t we just use LDPC codes right from the start if they’re so good?” That’s a reasonable question, and there are several answers:
- Although LDPC codes were first proposed by Robert Gallager in the 1960s, their true power was not realized until the 1990s, after NAND flash was already being deployed with BCH codes.
- The circuits that decode LDPC codes tend to be larger and consume more power than their equivalents for BCH codes.
- LDPC codes only really shine when you can extract something called soft information from the NAND flash, and this has only become viable in the latest generations of the technology.
However, today there are no more excuses, so we’re seeing many SSD controllers coming to market with integrated LDPC codes. This is allowing us to think about SSDs in some new ways.
LDPC for Endurance and/or Retention
One very obvious benefit of the transition from BCH codes to LDPC codes is that it allows the controller to extend the life of the SSD. NAND flash wears out over Program-Erase (PE) cycles. For example, moving to LDPC codes might allow us to take the flash from 10,000 to 15,000 PE cycles. In that case, we can implement an SSD with a 50% improvement in endurance with no change in NAND flash! In a similar fashion the transition might allow us to improve the retention rating of an SSD.
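The arithmetic behind that 50% figure can be sketched as follows. The PE cycle counts come from the example above; the drive capacity and write amplification used in the second function are hypothetical values, included only to show how a PE rating might translate into total host writes.

```python
def endurance_gain_percent(old_pe_cycles: int, new_pe_cycles: int) -> float:
    """Percentage improvement in rated PE cycles."""
    return (new_pe_cycles - old_pe_cycles) / old_pe_cycles * 100


def host_writes_tb(capacity_gb: float, pe_cycles: int,
                   write_amplification: float) -> float:
    """Rough lifetime host writes in TB, assuming perfectly even wear."""
    return capacity_gb * pe_cycles / write_amplification / 1000


# PE cycle counts from the example in the text.
BCH_PE, LDPC_PE = 10_000, 15_000
print(endurance_gain_percent(BCH_PE, LDPC_PE))

# Hypothetical drive: 1000 GB of flash, write amplification of 2.0.
print(host_writes_tb(1000, BCH_PE, 2.0))
print(host_writes_tb(1000, LDPC_PE, 2.0))
```

Because lifetime writes scale linearly with PE cycles in this simple model, the 50% gain carries straight through to the total data the drive can absorb.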
LDPC for Capacity
A slightly less obvious benefit of the transition from BCH codes to LDPC codes is that it allows us to tolerate more errors on a flash page. “Why on earth would you want more errors on a flash page?” you might ask. Where it gets interesting is if you can increase the number of pages on the NAND flash in exchange for accepting more errors per NAND flash page. This is exactly what happens when you make the shift from MLC NAND flash to TLC NAND flash.
TLC NAND flash stores three bits per cell, so for the same number of cells it has 50% more pages than MLC NAND flash (two bits per cell) and 200% more pages than SLC NAND flash (one bit per cell). It can therefore offer significant savings in terms of $/GB. Of course you pay a price in terms of errors per page, but for certain applications that might be acceptable.
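As a quick check on the capacity comparison, here is a small sketch that assumes the gain comes purely from bits per cell (SLC 1, MLC 2, TLC 3) on the same number of physical cells. Under that assumption, TLC works out to 50% more than MLC and 200% more than SLC.

```python
# Bits stored per cell for each NAND flash type.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def percent_more_capacity(new_type: str, old_type: str) -> float:
    """Extra capacity of new_type over old_type for the same cell count."""
    ratio = BITS_PER_CELL[new_type] / BITS_PER_CELL[old_type]
    return (ratio - 1) * 100


print(percent_more_capacity("TLC", "MLC"))
print(percent_more_capacity("TLC", "SLC"))
```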
The transition from BCH codes to LDPC codes is enabling the NAND flash TLC market and helping to drive the $/GB for NAND flash even lower.
LDPC for Latency
If there’s one thing enterprise and data center customers care about in their SSDs, it’s latency. In some applications latency, and consistency of latency, is paramount. I’ll talk about this issue in more detail in a later blog post. But for now, I wanted to mention that using LDPC codes, especially rate-adaptive LDPC codes, allows you to control that latency and even put some bounds on the variability of the latency.
LDPC for Software Defined Flash
Interestingly, the LDPC IP for all the above scenarios, if designed carefully, can be the same. By tuning the firmware controlling the LDPC IP, we can target the controller for a range of applications. This is useful for Software Defined Flash (SDF).
SDF allows you to alter the properties of an SSD just by altering the behavior of the firmware and software that runs on/on top of the SSD itself. This allows the same physical SSD to operate with very different attributes, which is very interesting from both a static and a dynamic point of view:
- Static Configuration – Let’s assume someone wants to deploy a lot of SSDs in their data centers. Now suppose they want some of these SSDs to have high endurance, but they want others to have lower $/GB. An SSD that supports SDF could service both of these SKUs in a single drive by placing different firmware and software on the SSDs.
- Dynamic Configuration – Let’s assume someone wants to deploy a lot of SSDs in their data centers. Now suppose they have a diurnal workload pattern whereby during the day they need low latency but at night they need low power consumption. Again, an SSD that supports SDF could adapt its behavior throughout the day based on instructions from a software management layer that sits above the SSD.
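The dynamic case can be pictured with a purely illustrative sketch of the decision a management layer might make. Everything here is hypothetical: the `Profile` type, the profile names, and the day/night boundaries are assumptions made for this example and do not correspond to any real SSD or PMC interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical operating profile requested from an SDF-capable SSD."""
    name: str
    goal: str


LOW_LATENCY = Profile(name="day", goal="low latency")
LOW_POWER = Profile(name="night", goal="low power consumption")


def select_profile(hour: int, day_start: int = 8, day_end: int = 20) -> Profile:
    """Pick a profile for the given hour (0-23) of a diurnal workload.

    The 08:00-20:00 "day" window is an assumed default, not a recommendation.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0-23")
    return LOW_LATENCY if day_start <= hour < day_end else LOW_POWER


for hour in (2, 12, 22):
    print(hour, select_profile(hour).goal)
```

In a real deployment the selected profile would be handed to the drive's firmware through whatever management interface the SSD exposes; that step is deliberately left out because it is vendor-specific.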
As you can see, the migration from BCH codes to LDPC codes is going to enable a lot of cool things inside SSDs. Some of them are pretty obvious (better endurance), some are a bit more subtle (latency control) and others are ambitious (SSDs whose properties change over the course of the day).
It’s going to be interesting to see how all of this plays out as the technology becomes pervasive. Software Defined Flash and its role in how SSDs are deployed in Enterprise and data center environments will be sparking conversations for a long time to come.
What do you think is the most exciting development this migration will bring?
Read Part 2 of the LDPC series: Latency in LDPC-based Next-Generation SSD Controllers
Read Part 3 of the LDPC series: Soft-Decoding in LDPC based Next-Generation SSD Controllers