Date: Mon, 31 Mar 2008 12:15:28 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200803311915.m2VJFSoR027593@apollo.backplane.com>
To: qpadla@gmail.com
References: <20080330231544.A96475@localhost> <200803310135.m2V1ZpiN018354@apollo.backplane.com> <200803312125.29325.qpadla@gmail.com>
Cc: Christopher Arnold, arch@freebsd.org, Martin Fouts, freebsd-arch@freebsd.org
Subject: Re: Flash disks and FFS layout heuristics
List-Id: Discussion related to FreeBSD architecture

This is all very good information. I was unaware of the adjacent write effect, but it makes sense considering the cell size. Hard drives have a similar effect (it's one of the limiting factors for density).

Hamming codes (ECC codes) are very fragile beasts. While they are in the same family as a CRC, it is a really bad idea to try to use the ECC code as your CRC, which is why I recommended against it in my previous posting.
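
To illustrate that fragility, here is a minimal sketch of an extended Hamming (8,4) code, which corrects one flipped bit and detects two. The bit layout and the names `encode`, `decode` and `flip` are assumptions made for this example; nothing here is taken from any particular flash controller or from the thread.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 follow the
    classic Hamming(7,4) layout with parity bits at 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the low bit
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) & 1  # make the total number of 1 bits even
    return bits


def decode(bits):
    """Return (status, nibble); nibble is None when uncorrectable."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    odd = sum(bits) & 1              # 1 if overall parity is violated
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:
        # Odd number of flips: the code ASSUMES exactly one and fixes it.
        bits[syndrome] ^= 1          # syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return status, nibble


def flip(bits, *positions):
    """Return a copy of the codeword with the given positions inverted."""
    out = list(bits)
    for p in positions:
        out[p] ^= 1
    return out


cw = encode(0b1011)
print(decode(cw))                    # clean codeword
print(decode(flip(cw, 5)))           # one flipped bit
print(decode(flip(cw, 5, 6)))        # two flipped bits
print(decode(flip(cw, 1, 2, 4)))     # three flipped bits
```

With three flipped bits the decoder reports a successful correction yet returns the wrong data. That silent miscorrection is what makes an ECC a poor stand-in for a CRC.
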
A two-bit-detect/one-bit-correct code is utterly trivial to implement (both generating it and using it); I've done such codes on 8-bit CPUs. Their fragility can be surprising to anyone who has never worked with them.

I've written numerous filesystems, including a NOR flash filesystem (whose characteristics are somewhat different due to the availability of byte-write). In my opinion, designing a filesystem *specifically* for NAND flash is a mistake because the technology is rapidly evolving and such a filesystem would wind up being obsolete in fairly short order. For example, the simple addition of some front-end non-volatile cache, such as a dime-cap-backed static RAM, would have a very serious effect on any such filesystem design.

It is far, far better to design the filesystem around generally desired characteristics, such as good write locality of reference (though, again, indices still have to be updated and those usually do not have good locality of reference).

DragonFly's HAMMER has pretty good write locality of reference but still does random updates for B-Tree indices and things like the mtime and atime fields. It also uses numerous blockmaps that could make direct use of a flash sector-mapping translation layer (1). It might be adaptable.

(1) A flash sector-mapping translation layer gives a filesystem the ability to use 'named block numbers'. For example, the NOR filesystem I did used 32-bit named block numbers regardless of the size of the flash (which was typically only 2MB). The filesystem topology was actually encoded into the block number itself. In other words, the filesystem is not bound to a linear range of block numbers, it is simply bound

What does this mean?
This means that what you really want to do is not necessarily write a filesystem that is explicitly designed for NAND operation, but instead write a filesystem that is explicitly designed to run on top of an abstracted topology (such as one where you can have named block numbers), and which generally has the desired features for locality of reference.

Such a filesystem would not become obsolete anywhere near as quickly as a NAND-specific filesystem would, and rebuilding an abstracted topology (whose underlying code would become obsolete as the technology changes) is a whole lot easier than redesigning a filesystem.

I am quite partial to the named-block concept; I really think it's the best way to go for flash filesystem design. The flash already has to have a sector-translation mechanism, so making the jump to a full-blown named-block model is only a small additional step.

-Matt
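
For readers who want something concrete, the named-block idea can be sketched roughly as below. This is only an illustration under stated assumptions. The 16/16 split of the 32-bit name into an object id and a block index, the class `NamedBlockLayer`, and the helper `make_name` are all hypothetical. They do not describe the actual encoding used in the NOR filesystem mentioned above, nor any real translation layer.

```python
from collections import deque


def make_name(object_id, index):
    """Build a 32-bit block name that encodes topology.

    Hypothetical encoding: high 16 bits identify the object (say, a
    file), low 16 bits give the block's index within that object.
    """
    assert 0 <= object_id < (1 << 16) and 0 <= index < (1 << 16)
    return (object_id << 16) | index


class NamedBlockLayer:
    """Toy translation layer mapping sparse names to physical blocks."""

    def __init__(self, physical_blocks):
        self.flash = [None] * physical_blocks      # simulated flash blocks
        self.mapping = {}                          # name -> physical index
        self.free = deque(range(physical_blocks))  # unused physical blocks

    def write(self, name, data):
        # Never overwrite in place: take a fresh block, then retire the
        # old one. A real layer would erase retired blocks in bulk; this
        # sketch simply returns them to the free list at once.
        if not self.free:
            raise OSError("no free physical blocks")
        phys = self.free.popleft()
        self.flash[phys] = data
        old = self.mapping.get(name)
        self.mapping[name] = phys
        if old is not None:
            self.flash[old] = None
            self.free.append(old)

    def read(self, name):
        return self.flash[self.mapping[name]]


layer = NamedBlockLayer(physical_blocks=4)
name = make_name(object_id=7, index=0)
layer.write(name, b"first version")
layer.write(name, b"second version")  # lands on a different physical block
print(hex(name), layer.read(name))
```

The filesystem only ever sees names, which can range over the whole 32-bit space even though the device here has just four physical blocks. When the flash technology changes, only the layer underneath has to be rewritten.
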