Date: Wed, 24 Jun 2009 02:32:24 -0700
From: freebsd@t41t.com
To: FreeBSD-Questions@freebsd.org
Subject: Re: you're not going to believe this.
Message-ID: <20090624093223.GF3468@ece.pdx.edu>
In-Reply-To: <20090624010922.GA24335@thought.org>
References: <20090622230729.GA20167@thought.org> <a9f4a3860906231222r65faaf1cia6b68186c79f4791@mail.gmail.com> <20090623201041.GA23561@thought.org> <20090623205944.GA43982@Grumpy.DynDNS.org> <20090624010922.GA24335@thought.org>
Gary Kline:
> Http://www.mydigitaldiscount.com/SPD/runcore-64gb-pata-mini-pci-e-pcie-ssd-for-asus-eee-pc-901-and-1000---backorder-runcore-64gb-pata-mini-pci-e-pcie-ssd-for-asus-eee-pc-901-and-1000--800008DB-1224129741.jsp
> ... statement that this device lasts ten years before it fails to
> hold state.

Roland Smith:
> The big difference is that it is much easier to tweak and change
> algorithms when doing it in software.

Wojciech Puchar:
> This flash chips have to emulate hard drive, which slows them down
> manyfold
> ... has acceptable lifetime/reliability, and uses less power/generates
> less heat than traditional platter HD ...
> [F]or example wear leveling and emulation small blocks requires moving
> of data within flash, this lowers both performance and lifetime.

I should know better, but I'm going to reply anyway.

First, be careful about statements like "10 years before it fails to hold state."  Usually that means that if you write data to the device and put it on a shelf, you've got 10 years before the data becomes unreadable.  Being marketing figures, these numbers are naturally stretched and inflated, and data retention depends strongly on ambient temperature, among other things.  More to the point, that's a statistic you probably don't care about, because who's going to buy a $200+ SSD and then leave it on a shelf for a decade?

The number you probably do care about is how long the drive will last _in active use_, and that's probably _not_ 10 years.  The primary source of degradation (and, eventually, failure) is writes, so minimizing writes will probably extend the drive's life.  NAND Flash, as used in SSDs, is typically rated for (order of magnitude...) 10k write cycles per block.  How many writes that gives you, once you put a bunch of chips together into an SSD and do wear leveling and all that, is anyone's guess.  (The manufacturer probably knows, but won't tell you.)

Current NAND Flash chips do ECC and wear leveling transparently.  Moving a block carries a significant time cost, so it's usually done when a block is already being erased.  That eliminates half the work, because half of the data is already known trivially (it's being erased), and erase is already a long operation, so making it a little longer is less noticeable.

Implementing wear leveling in OS-level software isn't feasible.  As I mentioned, wear leveling happens within the chip, so the OS doesn't even know a block swap has occurred.  (By extension, the OS doesn't know the per-block write counts.)  The OS has no access to the physical parameters of the Flash cells (parameters the chip itself can measure on the fly) that indicate when a swap is needed.  Depending on the implementation, the OS may not even know when (or how often) an ECC correction occurs.  Wear leveling algorithms are anything but trivial, are usually closely guarded trade secrets, and depend heavily on manufacturing process parameters that are themselves trade secrets.

(This is not to say a Flash-specific file system doesn't have value: you can probably gain a lot just by caching writes for as long as possible, and by putting commonly modified pieces of data near each other in the address space so they can be written together when an update is needed.)
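To make the caching idea concrete, here's a toy sketch in C.  Everything in it (the 128 KiB erase-block size, the device_write() stand-in, the workload) is a made-up assumption for illustration; a real flash file system or FTL is far more involved than this.

/*
 * Toy write-coalescing buffer: accumulate small writes in RAM and push
 * them to the device one erase-block-sized chunk at a time.  The block
 * size and device_write() are placeholders, not any real driver API.
 */
#include <stdio.h>
#include <string.h>

#define ERASE_BLOCK 131072              /* assume a 128 KiB erase block */

static unsigned char blockbuf[ERASE_BLOCK];
static size_t used;
static unsigned long flushes;           /* each flush ~ one program/erase hit */

static void device_write(const void *p, size_t len)
{
    (void)p; (void)len;                 /* stand-in for the real driver call */
    flushes++;
}

static void flush_block(void)
{
    if (used > 0) {
        device_write(blockbuf, used);
        used = 0;
    }
}

/* Queue a small write; touch the flash only when a full block is ready. */
static void cached_write(const void *data, size_t len)
{
    const unsigned char *src = data;

    while (len > 0) {
        size_t n = ERASE_BLOCK - used;
        if (n > len)
            n = len;
        memcpy(blockbuf + used, src, n);
        used += n;
        src += n;
        len -= n;
        if (used == ERASE_BLOCK)
            flush_block();
    }
}

int main(void)
{
    char rec[512];
    int i;

    memset(rec, 'x', sizeof rec);
    for (i = 0; i < 10000; i++)         /* 10000 small "metadata" updates */
        cached_write(rec, sizeof rec);
    flush_block();
    printf("%lu block writes instead of 10000 small writes\n", flushes);
    return 0;
}

Here the 10000 small updates collapse into about 40 block-sized writes; batching writes close together in time is the cheap cousin of laying the data out close together in the address space.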
The SATA bridge does have a non-zero impact on read and write times.  However, that impact is nowhere near "manyfold" the inherent read/write time; in fact, it's pretty close to negligible.  Most of the time is eaten up by multi-level cell sensing/placement, ECC correction, and, as mentioned above, wear leveling (for writes).

The lifetime and reliability of SSDs are at best equal to those of spinning magnetic drives, so don't buy an SSD for that.  Whether SSDs use less power is an open question; there's a lot of data going either way, and the last comparison I saw suggested spinning drives average less power than their SSD counterparts.  In any event, it's not clear-cut yet.  SSDs probably do generate less heat (but I've not seen data on that).

Of course, the access time on an SSD is order(s) of magnitude less than for a spinning drive, and that's cause enough for lots of people to buy one.

And finally, wear leveling is just a fact of life with Flash.  It's not a symptom of emulating a spinning drive or of some particular block size.  Wear leveling won't go away (and you won't gain back that part of the write time) by inventing a non-SATA, Flash-specific HD interface that nobody supports yet.  In fact, Gary's link talks about a device with a PCIe interface, so the whole issue of acting like a spinning drive isn't applicable.
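A footnote on the 10k-cycle figure: you can do a back-of-envelope lifetime estimate, but the answer swings wildly with what you assume for write amplification and daily write volume, which is exactly why "how many writes that gives you is anyone's guess."  Every number in the sketch below is an assumption, not data:

/* Back-of-envelope SSD write endurance under two assumed workloads. */
#include <stdio.h>

int main(void)
{
    const double capacity_gb = 64.0;    /* assumed drive size */
    const double pe_cycles = 10000.0;   /* assumed program/erase rating */
    const struct {
        const char *name;
        double write_amp;               /* internal writes per host write (guess) */
        double gb_per_day;              /* host writes per day (guess) */
    } w[] = {
        { "light, mostly sequential", 1.5, 5.0 },
        { "heavy, small random", 20.0, 50.0 },
    };
    int i;

    for (i = 0; i < 2; i++) {
        double total_gb = capacity_gb * pe_cycles / w[i].write_amp;
        double years = total_gb / w[i].gb_per_day / 365.0;
        printf("%-25s ~%.0f GB of host writes, ~%.0f years\n",
            w[i].name, total_gb, years);
    }
    return 0;
}

Same chip rating, and the two answers come out a couple of orders of magnitude apart.  Plug in your own numbers; the point is only how sensitive the result is to the assumptions.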