From: Matthew Dillon
Date: Sun, 30 Mar 2008 18:35:51 -0700 (PDT)
To: Christopher Arnold, arch@freebsd.org
Subject: Re: Flash disks and FFS layout heuristics
Message-Id: <200803310135.m2V1ZpiN018354@apollo.backplane.com>
References: <20080330231544.A96475@localhost>
List-Id: Discussion related to FreeBSD architecture

I just finished reading up on the latest NAND stuff, so I am going to add an addendum.

There was one factual error in my last posting having to do with byte rewrites. I'm not sure this applies to all manufacturers, but one spec sheet I looked at specifically limited non-erase rewriting to two consecutive page-write sequences. After that you have to perform an erase before you can write (and rewrite once) again.

    **** I'd be interested in knowing if any chip vendors support multiple
    **** consecutive page-write sequences without erase cycles in between
    **** (i.e. allowing 1->0 transitions like you can do with NOR).
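To make the rewrite limit above concrete, here is a minimal Python sketch of a single page. It is a toy model, not any vendor's interface: the class name `NandPage`, the 16-byte default size, and the `max_programs=2` default (taken from the one spec sheet mentioned above) are all assumptions for illustration. Programming can only clear bits (1->0), and only an erase sets them back to 1.

```python
class NandPage:
    """Toy model of one NAND page (hypothetical, for illustration only)."""

    def __init__(self, size=16, max_programs=2):
        self.size = size
        # Assumed limit on page-write sequences between erases.
        self.max_programs = max_programs
        self.erase()

    def erase(self):
        """Erase sets every bit to 1 and resets the program counter."""
        self.data = bytearray(b"\xff" * self.size)
        self.programs_since_erase = 0

    def program(self, new_data):
        """Program the page: bits can only go from 1 to 0."""
        if len(new_data) != self.size:
            raise ValueError("data must be exactly one page long")
        if self.programs_since_erase >= self.max_programs:
            raise RuntimeError("erase required before programming again")
        for i, byte in enumerate(new_data):
            # AND models the fact that a 0 bit cannot be turned back into 1.
            self.data[i] &= byte
        self.programs_since_erase += 1
```

In this model a third `program()` call without an intervening `erase()` raises an error, which is the constraint that rules out treating the page as freely rewritable storage.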
It looks like most vendors provide SECTOR_SIZE + 64 bytes of auxiliary information. The auxiliary information is where you typically store the CRC and ECC (they can be the same thing, but it's a good idea to implement them separately). I was surprised that the vendors specified only a 2 bit detect / 1 bit correct code, which is actually the simplest Hamming code you can have.

Describing this type of Hamming code in a paragraph is actually pretty easy. You can think of it as a code which identifies which bit in a block is in error and needs to be 'flipped' (aka the '1' bit correction). For example, if you are ECC'ing 8192 bytes you have 65536 bits, which means the Hamming code needs to be able to encode a 16 bit correction address, hence it requires 16 bits of storage for the correction, plus another (typically) log2(16) = 4 bits of storage for the detection, plus 1 more bit (you have to include the storage taken up by the ECC code itself). So ECC on 65536 bits requires 21 bits. I'm doing that from memory so don't quote me; we used those sorts of ECC in radio modem protocols 20 years ago. The actual construction of the correction address is a bit more complex, but that is the basics of how a 2 bit detect / 1 bit correct Hamming code works.

The vendor bit error handling recommendation is to relocate the page and then erase the original rather than to rewrite the page, so the scrubbing code can't just rewrite the same page when it finds an error. You still have to scrub, though, or you risk accumulating too many errors to correct.

Write-verify is typically automatic in the chips, but the two I checked do not seem to have a variable threshold for read operations for early detection of leaking bits. Older chips had separate power supplies for the programming power, but newer ones incorporate internal charge pumps so it may not be doable, which would be too bad.

The life span and shelf life information in my last posting is correct.
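The bit count given above for the Hamming code is explicitly from memory, so it is worth working the arithmetic. The sketch below is in Python and uses the textbook construction, not anything taken from a vendor data sheet: a single-error-correcting code over m data bits needs the smallest r with 2^r >= m + r + 1, and one extra overall parity bit adds double-error detection. The function names and the bit-list representation are assumptions for illustration. The syndrome is simply the XOR of the positions of all set bits, which is the "correction address" idea described above.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """One extra overall parity bit adds double-error detection."""
    return sec_check_bits(data_bits) + 1


def hamming_encode(data):
    """Encode a list of 0/1 values. Index 0 holds the overall parity bit,
    power-of-two indices hold check bits, the rest hold data."""
    r = sec_check_bits(len(data))
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of positions of set data bits
    for i in range(r):
        code[1 << i] = (syndrome >> i) & 1   # forces the total XOR to zero
    code[0] = sum(code[1:]) & 1      # overall parity over the whole word
    return code


def hamming_check(code):
    """Return (status, position): 'ok', 'corrected' (flip that position),
    or 'uncorrectable' (two bit errors detected)."""
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        return ("ok", None)
    if overall == 1:
        return ("corrected", syndrome)   # 0 means the parity bit itself
    return ("uncorrectable", None)
```

Under this bound, `sec_check_bits(65536)` is 17 and `secded_check_bits(65536)` is 18, a little less than the 21 bits estimated above but the same order of magnitude. Either way the check bits fit easily within the 64 auxiliary bytes.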
My assumption about the shelf life figure is that the manufacturers are specifying it for leakage in the worst-case write versus verify cycle (the verify is internal to the chip; the external entity just does a write and reads the verification status after it finishes). If there is no way to do a read at a lower sensitivity level, there is really no way to locate failing bits before they actually fail. That doesn't seem right, so I may be missing something in the spec.

With regards to averaging out the wear by not erase cycling the same page over and over again, my read from the chip specs is that you basically have no choice in the matter... you MUST average the wear out, period, end of story. This also precludes using a simple sector remapping algorithm, particularly if the rewrites between erase cycles for a page are limited.

The reason you MUST average the wear out is that the vendors do not appear to be guaranteeing even 100K erase cycles. I've read flash chip specs a billion times... when you read between the lines, what the vendor is saying, basically, is that the shelf life of a stored bit is only guaranteed to be 10 years if you don't rewrite the cell more than X number of times. So while it may be possible to write more than X number of times, you risk serious data degradation ('shelf life') if you do, even if the write does not fail.

This is the only guarantee they make, and it is based on the damage the cell takes when you erase/write to it, which increases leakage, which reduces shelf life. They do NOT guarantee that you can actually do X erase cycles; they simply say that the chip will tell you if an erase cycle fails, and that it can fail ANY TIME... the very first erase cycle you do on a particular page can fail.

The ONLY thing the vendors guarantee is that the FIRST page on the device can go through a certain number of erase cycles, like 1000 or 10,000. No other page on the device has any sort of guarantee. This is very important.
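As an illustration of what averaging the wear out implies for an allocator, here is a minimal Python sketch. It is not a design proposed in this thread: the `WearLeveler` class, the `erase_fn` callback (standing in for the chip's erase status report), and the `erase_limit` parameter (standing in for the vendor's X) are all hypothetical. It always hands out the least-erased free block and retires a block when an erase reports failure or the count reaches the limit. Relocating data that is rarely rewritten is ignored here.

```python
class WearLeveler:
    """Toy allocator that spreads erase cycles evenly (illustration only)."""

    def __init__(self, num_blocks, erase_limit):
        self.erase_counts = [0] * num_blocks
        self.erase_limit = erase_limit      # assumed stand-in for the vendor's X
        self.free = set(range(num_blocks))
        self.retired = set()

    def allocate(self):
        """Hand out the free block with the fewest erase cycles so far."""
        if not self.free:
            raise RuntimeError("no usable free blocks left")
        block = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(block)
        return block

    def release(self, block, erase_fn):
        """Erase a block and return it to the free pool.

        erase_fn(block) is a hypothetical callback that returns True on
        success. An erase can fail at any time, so a failure retires the
        block immediately, as does reaching the erase limit.
        """
        ok = erase_fn(block)
        self.erase_counts[block] += 1
        if not ok or self.erase_counts[block] >= self.erase_limit:
            self.retired.add(block)
        else:
            self.free.add(block)
```

With four blocks and forty allocate/release rounds that always succeed, every block ends up with exactly ten erase cycles, where a fixed sector mapping would have put all forty on one block.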
This means you MUST average the wear out, period, whether it is consumer OR industrial grade.

-Matt