From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 31 19:15:41 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Date: Mon, 31 Mar 2008 12:15:28 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200803311915.m2VJFSoR027593@apollo.backplane.com>
To: qpadla@gmail.com
References: <20080330231544.A96475@localhost>
	<200803310135.m2V1ZpiN018354@apollo.backplane.com>
	<B95CEC1093787C4DB3655EF330984818051D03@EXCHANGE.danger.com>
	<200803312125.29325.qpadla@gmail.com>
Cc: Christopher Arnold <chris@arnold.se>, arch@freebsd.org,
	Martin Fouts <mfouts@danger.com>, freebsd-arch@freebsd.org
Subject: Re: Flash disks and FFS layout heuristics
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>

    This is all very good information.  I was unaware of the adjacent write
    effect, but it makes sense considering the cell size.  Hard drives have
    a similar effect (it's one of the limiting factors for density).

    Hamming codes (ECC codes) are very fragile beasts.  While they are in the
    same family as a CRC, it is a really bad idea to try to use the ECC code
    as your CRC, which is why I recommended against it in my previous posting.
    A two-bit-detect/one-bit-correct code is utterly trivial to implement
    (both generating it and using it)... I've done such codes on 8-bit CPUs.
    Their fragility can be surprising to anyone who has never worked with
    them.
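
    To make the fragility concrete, here is a minimal sketch of a
    one-bit-correct/two-bit-detect code: an extended Hamming(8,4) code
    protecting a single nibble.  The bit layout and the names
    secded_encode/secded_decode are assumptions made for illustration, not
    code from any real flash driver.  Note what happens with three flipped
    bits: the decoder "corrects" to the wrong data and reports success,
    which is exactly why such a code should not double as an integrity
    check.

```c
#include <stdint.h>

enum secded_status { SECDED_OK, SECDED_CORRECTED, SECDED_UNCORRECTABLE };

/* Parity of an 8-bit value: 1 if an odd number of bits are set. */
static unsigned parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

/*
 * Assumed layout: bit positions 1..7 form a Hamming(7,4) codeword with
 * parity at positions 1, 2, 4 and data at positions 3, 5, 6, 7.
 * Bit 0 holds an overall parity bit, which adds double-error detection.
 */
static uint8_t secded_encode(uint8_t nibble)
{
    uint8_t d0 = nibble & 1u, d1 = (nibble >> 1) & 1u;
    uint8_t d2 = (nibble >> 2) & 1u, d3 = (nibble >> 3) & 1u;
    uint8_t cw = 0;

    cw |= (uint8_t)(d0 << 3);
    cw |= (uint8_t)(d1 << 5);
    cw |= (uint8_t)(d2 << 6);
    cw |= (uint8_t)(d3 << 7);
    cw |= (uint8_t)((d0 ^ d1 ^ d3) << 1);   /* covers positions 3, 5, 7 */
    cw |= (uint8_t)((d0 ^ d2 ^ d3) << 2);   /* covers positions 3, 6, 7 */
    cw |= (uint8_t)((d1 ^ d2 ^ d3) << 4);   /* covers positions 5, 6, 7 */
    cw |= (uint8_t)parity8(cw);             /* make overall parity even */
    return cw;
}

static enum secded_status secded_decode(uint8_t cw, uint8_t *nibble)
{
    unsigned syndrome = 0, pos;
    unsigned overall = parity8(cw);         /* 0 when parity is still even */
    enum secded_status st = SECDED_OK;

    /* XOR of the positions of all set bits is 0 for a valid codeword. */
    for (pos = 1; pos <= 7; pos++)
        if (cw & (1u << pos))
            syndrome ^= pos;

    if (syndrome != 0 && overall == 0)
        return SECDED_UNCORRECTABLE;        /* two bits flipped */
    if (syndrome != 0) {
        cw ^= (uint8_t)(1u << syndrome);    /* syndrome names the bad bit */
        st = SECDED_CORRECTED;
    } else if (overall != 0) {
        cw ^= 1u;                           /* the parity bit itself flipped */
        st = SECDED_CORRECTED;
    }
    *nibble = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                        (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return st;
}
```

    Encoding nibble 0 gives codeword 0x00.  Flip bits 1, 2 and 4 (0x16) and
    the decoder returns SECDED_CORRECTED with nibble 8, not 0.  A separate
    CRC over the data would catch that; the ECC alone does not.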

    I've written numerous filesystems, including a NOR flash filesystem
    (whose characteristics are somewhat different due to the availability of
    byte-write).  In my opinion, designing a filesystem *specifically* for
    NAND flash is a mistake because the technology is rapidly evolving and
    such a filesystem would wind up being obsolete in fairly short order.
    For example, the simple addition of some front-end non-volatile cache,
    such as a dime-cap-backed static RAM, would have a very serious effect
    on any such filesystem design.  It is far, far better to design the
    filesystem around generally desired characteristics, such as good
    write locality of reference (though, again, indices still have to be
    updated and those usually do not have good locality of reference).

    DragonFly's HAMMER has pretty good write-locality of reference but still
    does random updates for B-Tree indices and things like the mtime and 
    atime fields.  It also uses numerous blockmaps that could make direct use
    of a flash sector-mapping translation layer (1).  It might be adaptable.

    (1) A flash sector-mapping translation layer gives a filesystem the
    ability to use 'named block numbers'.  For example, the NOR filesystem
    I did used 32 bit named block numbers regardless of the size of the
    flash (which was typically only 2MB).  The filesystem topology was
    actually encoded into the block number itself.  In other words, the
    filesystem is not bound to a linear range of block numbers it is
    simply bound

    What does this mean?  This means that what you really want to do is not
    necessarily write a filesystem that is explicitly designed for NAND
    operation, but instead write a filesystem that is explicitly designed
    to run on top of an abstracted topology (such as one where you can have
    named block numbers), and which generally has the desired features for
    locality of reference.  Such a filesystem would not become obsolete
    anywhere near as quickly as a NAND-specific filesystem would, and
    rebuilding an abstracted topology (whose underlying code would become
    obsolete as the technology changes) is a whole lot easier than
    redesigning a filesystem.
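
    As a rough sketch of what such an abstracted topology could look like,
    assume a translation layer that tags each physical sector with the
    32-bit name of the block it holds.  Everything here is hypothetical:
    the nb_* names, the 64-sector device, and the encoding (object id in
    the high 16 bits, block index in the low 16 bits) are made up for
    illustration and are not the layout of the NOR filesystem mentioned
    above.  The point is only that the filesystem addresses blocks by name
    and never sees a linear sector range.

```c
#include <stdint.h>

#define NB_NSECTORS 64u            /* assumed tiny device, for illustration */
#define NB_FREE     0xFFFFFFFFu    /* erased flash reads back as all ones */

/* One tag per physical sector: the name of the block stored there. */
struct nb_map {
    uint32_t name[NB_NSECTORS];
};

/* Hypothetical encoding: topology lives in the block name itself. */
static uint32_t nb_name(uint16_t object_id, uint16_t block_index)
{
    return ((uint32_t)object_id << 16) | block_index;
}

static void nb_init(struct nb_map *m)
{
    unsigned i;

    for (i = 0; i < NB_NSECTORS; i++)
        m->name[i] = NB_FREE;
}

/* Return the physical sector currently holding 'name', or -1 if none. */
static int nb_lookup(const struct nb_map *m, uint32_t name)
{
    unsigned i;

    for (i = 0; i < NB_NSECTORS; i++)
        if (m->name[i] == name)
            return (int)i;
    return -1;
}

/*
 * Write a named block out of place: claim a free sector for the new copy,
 * then release the old one.  A real layer would mark the old copy stale
 * and erase it later, and would need sequence numbers to survive a crash
 * between the two steps; both are left out of this sketch.
 */
static int nb_write(struct nb_map *m, uint32_t name)
{
    int old = nb_lookup(m, name);
    unsigned i;

    for (i = 0; i < NB_NSECTORS; i++) {
        if (m->name[i] == NB_FREE) {
            m->name[i] = name;
            if (old >= 0)
                m->name[old] = NB_FREE;
            return (int)i;
        }
    }
    return -1;                     /* no free sector; needs cleaning */
}
```

    The filesystem above this layer can use a name like
    nb_name(0x1234, 0xFFFF) on a 64-sector device without caring where the
    block physically lands, and swapping the linear scan for something
    suited to a different flash technology leaves the filesystem untouched.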

    I am quite partial to the named-block concept; I really think it's the
    best way to go for flash filesystem design.  The flash already has to
    have a sector-translation mechanism, so making the jump to a full-blown
    named-block model is only a small additional step.

						-Matt