From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 31 01:36:04 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C428106566B
	for <arch@freebsd.org>; Mon, 31 Mar 2008 01:36:04 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.freebsd.org (Postfix) with ESMTP id F41B28FC17
	for <arch@freebsd.org>; Mon, 31 Mar 2008 01:36:03 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.14.1/8.14.1) with ESMTP id m2V1ZqdA018355;
	Sun, 30 Mar 2008 18:35:52 -0700 (PDT)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.14.1/8.13.4/Submit) id m2V1ZpiN018354;
	Sun, 30 Mar 2008 18:35:51 -0700 (PDT)
Date: Sun, 30 Mar 2008 18:35:51 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200803310135.m2V1ZpiN018354@apollo.backplane.com>
To: Christopher Arnold <chris@arnold.se>, arch@freebsd.org
References: <20080330231544.A96475@localhost> 
Cc: 
Subject: Re: Flash disks and FFS layout heuristics 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Mar 2008 01:36:04 -0000

     I just finished reading up on the latest NAND stuff, so I am going
     to add an addendum.

     There was one factual error in my last posting having to do with
     byte rewrites.  I'm not sure this applies to all manufacturers but
     one spec sheet I looked at specifically limited non-erase rewriting
     to two consecutive page-write sequences.  After that you have to perform
     an erase before you can write (and rewrite once) again.

**** I'd be interested in knowing if any chip vendors support multiple
**** consecutive page-write sequences without erase cycles in between
**** (i.e. allowing 1->0 transitions like you can do with NOR).
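
     A minimal sketch of the rewrite limit described above, as a toy
     in-memory model.  The class name NandPage, the default page size and
     the MAX_PROGRAMS constant are assumptions for illustration; the limit
     of two comes from the one spec sheet mentioned and may differ between
     vendors.

```python
class NandPage:
    """Toy model of one NAND page (hypothetical, not a real driver API)."""

    MAX_PROGRAMS = 2  # assumed limit: two page-write sequences per erase

    def __init__(self, size=16):
        self.size = size
        self.erase()

    def erase(self):
        # An erase sets every bit to 1 and resets the program counter.
        # (Real parts erase a whole block of pages at once.)
        self.data = bytearray(b"\xff" * self.size)
        self.programs = 0

    def program(self, new):
        if len(new) != self.size:
            raise ValueError("program data must cover the whole page")
        if self.programs >= self.MAX_PROGRAMS:
            raise RuntimeError("erase required before programming again")
        # Programming can only clear bits (1 -> 0), so the stored result
        # is the bitwise AND of the old and new contents.
        for i, byte in enumerate(new):
            self.data[i] &= byte
        self.programs += 1
```

     In this model a third program() call without an intervening erase()
     raises, which mirrors the "write, rewrite once, then erase" rule.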

     It looks like most vendors provide SECTOR_SIZE + 64 bytes of auxiliary
     information.  The auxiliary information is where you typically store
     the CRC and ECC (they can be the same thing but it's a good idea to
     implement them separately).   I was surprised that the vendors
     specified only a 2 bit detect / 1 bit correct code, which is actually
     the simplest Hamming code you can have.

     Describing this type of Hamming code in a paragraph is actually pretty
     easy.  You can think of it as a code which identifies which bit in a
     block is in error and needs to be 'flipped' (aka the '1' bit correction).
     For example, if you are ECC'ing 8192 bytes you have 65536 bits
     which means the Hamming code needs to be able to encode a 16 bit
     correction address, hence it requires 16 bits of storage for the
     correction, plus another (typically) log2(16) = 4 bits of storage for
     the detection, plus 1 more bit (you have to include the storage
     taken up by the ECC code itself).  So ECC on 65536 bits requires 21 bits.
     I'm doing that from memory so don't quote me; we used those sorts of
     ECC in radio modem protocols 20 years ago.
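
     As a cross-check on the from-memory count above, here is a small
     sketch of the textbook bound for a single-error-correcting Hamming
     code: the smallest r with 2^r >= m + r + 1 for m data bits, plus one
     overall parity bit for double-error detection.  The function names are
     made up for illustration.  Under that textbook construction 65536 data
     bits come out at 18 check bits rather than 21; other parity layouts
     may well spend more.

```python
def hamming_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error correct)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """Check bits for 1 bit correct / 2 bit detect: Hamming plus overall parity."""
    return hamming_check_bits(data_bits) + 1


if __name__ == "__main__":
    for nbytes in (512, 2048, 8192):
        print(nbytes, "bytes:", secded_check_bits(nbytes * 8), "check bits")
```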

     The actual construction of the correction address is a bit more
     complex but those are the basics of how a 2 bit detect / 1 bit correct
     Hamming code works.
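
     To make the "correction address" idea concrete, here is a minimal
     sketch of a textbook extended Hamming code on a list of bits.  It is
     not any vendor's layout; the function names and the choice to keep the
     overall parity bit at index 0 are assumptions for illustration.  The
     syndrome is the XOR of the positions of all set bits, which is zero
     for a valid codeword and equals the address of the bad bit after a
     single flip.

```python
def syndrome(code):
    """XOR of the 1-based positions of all set bits (index 0 is excluded)."""
    s = 0
    for pos in range(1, len(code)):
        if code[pos]:
            s ^= pos
    return s


def encode(data_bits):
    """Return a codeword: index 0 is overall parity, 1..n are Hamming positions."""
    r = 0
    while (1 << r) < len(data_bits) + r + 1:
        r += 1
    n = len(data_bits) + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: a data position
            code[pos] = next(it)
    syn = syndrome(code)
    for k in range(r):               # check bits live at positions 1, 2, 4, ...
        code[1 << k] = (syn >> k) & 1
    code[0] = sum(code[1:]) & 1      # overall parity enables 2 bit detection
    return code


def check(code):
    """Return (status, codeword); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syn = syndrome(code)
    odd = sum(code) & 1
    if syn == 0 and not odd:
        return "ok", code
    if odd and syn < len(code):
        code[syn] ^= 1               # syn is the address of the flipped bit
        return "corrected", code
    return "uncorrectable", code     # even parity with a nonzero syndrome
```

     A single flipped bit anywhere in the codeword, including the check
     bits themselves, is repaired; two flipped bits are reported but cannot
     be located.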

     The vendor bit error handling recommendation is to relocate the page
     and then erase the original rather than to rewrite the page, so the
     scrubbing code can't just rewrite the same page when it finds an error.
     You still have to scrub, though, or you risk accumulating too many
     errors to correct.  Write-verify is typically automatic in the chips
     but the two I checked do not seem to have a variable threshold for
     read operations for early detection of leaking bits.  Older chips had
     separate power supplies for the programming power but newer ones
     incorporate internal charge pumps so it may not be doable, which
     would be too bad.
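
     A rough sketch of a scrub pass that follows the relocate-then-erase
     recommendation, using plain Python containers in place of a real
     flash driver.  The function name, the dict-based page store and the
     bit_errors map (bit errors the ECC reported per page on read) are all
     hypothetical.

```python
def scrub(store, free_pages, bit_errors):
    """Relocate pages with a correctable error; return (moved, lost).

    store      : dict page_no -> data (assumed already ECC-corrected on read)
    free_pages : list of erased page numbers available for relocation
    bit_errors : dict page_no -> number of bit errors the ECC reported
    """
    moved, lost = {}, []
    for page in sorted(store):
        errs = bit_errors.get(page, 0)
        if errs == 0:
            continue
        if errs > 1:
            lost.append(page)        # beyond what a 1 bit correct code can fix
            continue
        if not free_pages:
            break                    # nowhere to relocate; retry on a later pass
        new = free_pages.pop(0)
        store[new] = store.pop(page) # write the corrected data to a fresh page
        moved[page] = new
        free_pages.append(page)      # original is erased, then reusable
    return moved, lost
```

     The point of the sketch is only the ordering: the corrected data goes
     to a different page first, and the original page is erased afterwards
     rather than rewritten in place.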

     Life span and shelf life information is correct.  My assumption there
     is that the manufacturers are specifying the shelf life for leakage in the
     worst case write versus verify cycle (the verify is internal to the chip,
     the external entity just does a write and reads the verification status
     after it finishes).  If there is no way to do a read at a lower
     sensitivity level there is really no way to locate failing bits before
     they actually fail.  That doesn't seem right so I may be missing
     something in the spec.

     With regard to averaging out the wear by not erase cycling the same
     page over and over again, my read from the chip specs is that you
     basically have no choice on the matter... you MUST average the wear out,
     period end of story.  This also precludes using a simple sector
     remapping algorithm, particularly if the re-writes between erase
     cycles for a page are limited.
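
     One simple way to average the wear out, sketched under the assumption
     that the allocator always hands out the free block with the fewest
     erase cycles.  The WearLeveler class and its method names are invented
     for illustration and ignore static data, which a real design would
     also have to move around.

```python
import heapq


class WearLeveler:
    """Toy allocator that always picks the least-erased free block."""

    def __init__(self, nblocks):
        self.erase_count = [0] * nblocks
        # Min-heap of (erase_count, block) so the least-worn block pops first.
        self.free = [(0, b) for b in range(nblocks)]
        heapq.heapify(self.free)

    def allocate(self):
        _, block = heapq.heappop(self.free)
        return block

    def erase_and_free(self, block):
        # Every erase is counted against the block before it is reused.
        self.erase_count[block] += 1
        heapq.heappush(self.free, (self.erase_count[block], block))
```

     With this policy a workload that rewrites "the same" logical sector
     over and over still spreads its erases across every free block.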

     The reason you MUST average the wear out is that the vendors do not
     appear to be guaranteeing even 100K erase cycles.

     I've read flash chip specs a billion times... when you read between
     the lines what the vendor is saying, basically, is that the shelf life
     of a stored bit is only guaranteed to be 10 years if you don't rewrite
     the cell more than X number of times.  So while it may be possible to
     write more than X number of times, you risk serious data degradation
     ('shelf life') if you do, even if the write does not fail.  This
     is the only guarantee they make, and it is based on the damage the cell
     takes when you erase/write to it which increases leakage which reduces
     shelf life.

     They do NOT guarantee that you can actually do X erase cycles; they
     simply say that the chip will tell you if an erase cycle fails, and that
     it can fail ANY TIME... the very first erase cycle you do on a
     particular page can fail.
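
     Since an erase can fail at any time, the code that recycles blocks
     has to check the chip's status on every erase and retire the block
     when it fails.  A minimal sketch; chip_erase is a hypothetical driver
     hook that returns True when the chip reports success.

```python
def erase_block(block, chip_erase, free_blocks, bad_blocks):
    """Erase one block; retire it if the chip reports a failure.

    chip_erase  : callable(block) -> bool, hypothetical driver hook
    free_blocks : list that receives blocks which erased cleanly
    bad_blocks  : set that receives blocks which must never be reused
    """
    if chip_erase(block):
        free_blocks.append(block)
        return True
    # A failed erase can happen on the very first cycle, so no erase-count
    # threshold is consulted here; the chip's status is the only input.
    bad_blocks.add(block)
    return False
```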

     The ONLY thing the vendors guarantee is that the FIRST page on the device
     can go through a certain number of erase cycles, like 1000 or 10,000.
     No other page on the device has any sort of guarantee.

     This is very important.  This means you MUST average the wear out,
     period, whether it is consumer OR industrial grade.

						-Matt