Date: Sun, 30 Mar 2008 17:10:21 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Christopher Arnold <chris@arnold.se>
Cc: arch@freebsd.org
Subject: Re: Flash disks and FFS layout heuristics
Message-ID: <200803310010.m2V0ALRp017186@apollo.backplane.com>
References: <20080330231544.A96475@localhost>
:I believe phk means that googling for "Flash Adaptation Layer" turns up
:some results.
:
:> And no, I really don't want to discuss it any further with you.
:>
:But please continue the discussion for the sake of the silent majority;
:there are loads of us out here who are interested in flash fs development.
:
:Also, I had the impression that newer flash-based hard drives had internal
:logic to spread out writes evenly over the disk and to remap worn-out
:blocks, and that the result of these algorithms increased MTBF to at least
:the MTBF of spinning disks.  Or have I misread something?
:
:	/Chris

I found some of it, though I dunno if it's what he was specifically
referencing.  The slide show was interesting, though it contained a number
of factual errors, and I didn't really see anything in-depth about a
'Flash Adaptation Layer'.  It seems to be a fairly generically coined term
for something that is far from generic in actual implementation.

The idea of remapping flash sectors could be considered a poor man's way
of dealing with wear issues, in that remapping tends to be fairly
limited... for example, you might use a fixed-size table, and once the
table fills up the device is toast.  Remapping doesn't actually prevent
the uneven wear from occurring, it just gives you a fixed amount of
additional runway.  If remapping gets complex enough to work with an
arbitrary number of dead sectors then it is effectively a 'Flash
Adaptation Layer'.  Limited remapping (e.g. using a fixed-size table) is
really easy to code up.
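To make the "really easy to code up" point concrete, here is a minimal
sketch of that kind of limited remapping.  All the names here are invented
for illustration (no real driver is being quoted) and error handling is
omitted:

    #include <stdint.h>
    #include <stddef.h>

    #define REMAP_SLOTS     64      /* fixed-size table: full == dead */
    #define SPARE_POOL_BASE 100000  /* hypothetical start of spare area */

    struct remap_entry {
            uint32_t bad_lba;       /* sector that wore out */
            uint32_t spare_lba;     /* spare standing in for it */
    };

    static struct remap_entry remap_tab[REMAP_SLOTS];
    static size_t remap_count;

    /* Translate a logical sector, following any remap. */
    static uint32_t
    remap_lookup(uint32_t lba)
    {
            size_t i;

            for (i = 0; i < remap_count; ++i)
                    if (remap_tab[i].bad_lba == lba)
                            return (remap_tab[i].spare_lba);
            return (lba);
    }

    /*
     * Retire a worn sector.  Returns -1 once the table is full, at
     * which point the device is toast even though most of its cells
     * are still perfectly good.
     */
    static int
    remap_retire(uint32_t bad_lba)
    {
            if (remap_count == REMAP_SLOTS)
                    return (-1);
            remap_tab[remap_count].bad_lba = bad_lba;
            remap_tab[remap_count].spare_lba =
                SPARE_POOL_BASE + (uint32_t)remap_count;
            remap_count++;
            return (0);
    }

Note that nothing here evens out the wear: a hot logical sector keeps
hammering the same physical cells, and the table only buys a fixed amount
of runway.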
But there are some huge differences between the two.  Really huge
differences.  Detecting a worn cell requires generating a CRC, and
correcting it requires generating an ECC code.  Neither CRCs nor ECCs are
perfect, and actually depending on them to handle situations that occur
*normally* during the device's life span is bad business.  A proper sector
translation mechanism guarantees even wear of all the cells: you don't
*GET* CRC errors under normal operation of the device.  You still want a
CRC to detect the situation, and perhaps even a small ECC to try to
correct it, but these exist to handle manufacturing defects (which can
limit the life of individual cells) rather than to handle the wear issues,
unrelated to manufacturing defects, that a limited remapping mechanic is
aimed at.  A wear issue can cause many cells to die (see later on with
regard to data retention) whereas a manufacturing defect tends to result
in single-bit errors.

Insofar as indestructibility goes, in the short term flash storage is more
resilient than disk storage, especially considering that there are no
moving parts, but flash cells degrade over time whether you write to them
or not, depending on temperature.  Look at any flash part, bring up the
technical specifications, and there will be an entry for 'data retention'
time.  Usually it's around 10 years at 20 C.  If it is hotter the data is
retained for a shorter period of time; if it is colder the data is
retained for a longer period.

Retention is different from cell wear.  What retention means is that if
you have a flash device, you need to rewrite the cells periodically (you
can't just read the cell like a DRAM refresh, but you don't have to go
through an erase cycle; you only have to rewrite the cell)... you need to
do that at least once every 5 years to be safe, or you risk losing the
data.  Rewriting the cell does add wear to it, so you don't want to
rewrite it too often.  I have personally seen flash devices lose data...
I'm trying to remember how many years it was, but I think it was on the
order of 15 years, in one unit out of 30, which was subject to fairly hot
temperatures in the summer.

A flash unit must therefore run a scrubber to really be reliable.  It is
absolutely required if you use a remapping algorithm, and a bit less so if
you use a proper storage layer which generates even wear.  The real
difference between the two comes down to shelf life (when you aren't
scrubbing anything), since worn cells will die a lot more quickly than
unworn cells.  The scrubber must validate the CRC, and there is usually a
way to tell the device to operate at a different detection threshold in
order to detect a failing cell *before* it actually fails (write-verify
usually does this when writing, but you also want to do it when scrubbing,
if you want to do it right).  The idea is for the scrubber to detect bit
errors *before* the data becomes unrecoverable and, in fact, before the
data even needs to be ECC'd.  You should not have to actually use ECC
correction during normal operation of the device over its entire life
span.  If you have a wear situation where multiple cells are failing and
you do not scan the data in the flash often enough (using write-verify
thresholds, NOT normal operational thresholds) to detect the failing
cells, and/or you do not have a verification-voltage capability to detect
failing cells before they fail (for example, you take a worn device
offline and store it on a shelf somewhere), then you risk detecting the
failed cells too late, at a point where there are too many failed cells to
correct.  This is of particular concern for very large flash storage.

One side effect of having a proper storage layer is that the scrubber is
typically built into it: just the mechanic of write-appending and having
to repack the storage usually cycles the storage in a time frame of less
than 10 years.  You can scrub either way, though; it isn't hard to do, and
it doesn't require remapping the cell unless the cell has actually failed.
Just re-writing the same data resets the energy levels.
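A scrub pass, then, looks roughly like the sketch below.  The device hooks
(flash_read_margin() and friends) are invented names standing in for
vendor-specific commands and are left as bare declarations; this is a
sketch of the technique under those assumptions, not a real driver:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE 2048          /* data plus a 4-byte CRC at the end */
    #define NPAGES    65536

    /* Hypothetical device hooks; all return 0 on success. */
    int flash_read(uint32_t page, uint8_t *buf);        /* normal threshold */
    int flash_read_margin(uint32_t page, uint8_t *buf); /* verify threshold */
    int flash_rewrite(uint32_t page, const uint8_t *buf);
    int ecc_correct(uint8_t *buf);                      /* -1: uncorrectable */
    uint32_t crc32(const uint8_t *buf, size_t len);

    static int
    page_crc_ok(const uint8_t *buf)
    {
            uint32_t crc;

            memcpy(&crc, buf + PAGE_SIZE - 4, sizeof(crc));
            return (crc32(buf, PAGE_SIZE - 4) == crc);
    }

    /*
     * One pass over the device.  A page that reads clean at the
     * tighter verify threshold is left alone.  A page that fails
     * there but is still recoverable at the normal threshold is
     * *failing*, not failed: rewrite it now, while the data is
     * intact.  The rewrite also resets the charge levels, which
     * takes care of the retention clock at the same time.
     */
    void
    scrub_pass(void)
    {
            static uint8_t buf[PAGE_SIZE];
            uint32_t page;

            for (page = 0; page < NPAGES; ++page) {
                    if (flash_read_margin(page, buf) == 0 &&
                        page_crc_ok(buf))
                            continue;       /* healthy at margin */

                    if (flash_read(page, buf) != 0 || !page_crc_ok(buf)) {
                            if (ecc_correct(buf) < 0)
                                    continue;   /* scrubbed too late */
                    }
                    flash_rewrite(page, buf);   /* refresh the cells */
            }
    }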
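For contrast, the core of a translation layer that generates even wear is
just an append-only log plus a logical-to-physical map.  This is a
deliberately over-simplified sketch (made-up names, no garbage collection,
no crash recovery); a real layer must also skip live pages and repack
stale segments:

    #include <stdint.h>

    #define NPHYS    8192u          /* physical pages on the device */
    #define NLOG     7168u          /* logical pages exposed upward */
    #define UNMAPPED 0xffffffffu

    static uint32_t l2p[NLOG];      /* logical -> physical map */
    static uint32_t wear[NPHYS];    /* per-page program count */
    static uint32_t head;           /* current append point */

    static void
    xlate_init(void)
    {
            uint32_t i;

            for (i = 0; i < NLOG; ++i)
                    l2p[i] = UNMAPPED;
    }

    /*
     * Every write -- including a rewrite of the same logical page --
     * lands at the head of the log, so wear marches evenly across
     * the whole device instead of hammering one physical spot.
     */
    static uint32_t
    xlate_write(uint32_t lpage)
    {
            uint32_t ppage = head;

            head = (head + 1) % NPHYS;  /* real code: skip live pages */
            l2p[lpage] = ppage;         /* old mapping becomes stale */
            wear[ppage]++;
            return (ppage);
    }

The repacking such a layer has to do anyway (copying live data off
mostly-stale segments before erasing them) is exactly the rewrite that
resets the energy levels, which is why the scrubbing tends to fall out of
the design for free.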
A flash device is still more reliable than a hard drive in the short term.
However, disk media tends to retain its magnetic orientation longer than a
flash cell holds its charge (longer than 10 years)... well, I'm not sure
about the absolute latest technology, but that was certainly the case 10
years ago.  Disk media has similar thermal-erasure issues, so, really,
both types of media have a limited data-retention span.  Recovering data
from an aging flash chip is a lot harder, though, because you have to
remove the flash packaging and even shave the chip (yes, it can be done;
there have been numerous cases where supposedly secure execute-only flash
and EEPROM could be read out by shaving the chip, though I dunno if it has
been done with recent super-high-density flashes).  With disk media you
can generally recover thermally erased bits using very expensive equipment
with very sensitive detectors.  If the data is important, and you are
willing to pay for it, you can recover it off a HD.

Typically the only difference between 'consumer' and 'industrial' flash is
how the chips are sorted coming out of the plant.  It is possible to
detect weak cells and sort the chips accordingly (thus consumer chips are
rated for fewer rewrite cycles), though frankly in most cases a consumer
chip will be almost as good as an industrial one.

If you run a proper sector translation layer which generates even wear,
and you have the ability to use the write-verify mechanism in your
scrubbing code, it doesn't really matter which grade you use.

						-Matt