From: "Martin Fouts"
To: "Matthew Dillon", "Christopher Arnold"
Subject: RE: Flash disks and FFS layout heuristics
Date: Mon, 31 Mar 2008 10:36:09 -0700
In-Reply-To: <200803310135.m2V1ZpiN018354@apollo.backplane.com>
List-Id: Discussion related to FreeBSD architecture

I came late to this discussion, so pardon me if I'm repeating stuff that's already been discussed.

You can guess a lot from vendor specs, but NAND flash requires experience before you understand the nuances, especially since the vendors tend not to document most of what you need to know to get good performance and reliability from a flash device.

There are, basically, two approaches to using NAND devices.

What PHK calls "flash adaptation layer" or, sometimes, "flash translation layer" is widely used in devices that are meant to be seen as removable MS-DOS file system devices, such as almost every USB NAND-based flash device on the market. It is also used in at least two commercial flash file systems intended for embedded flash. It is also an approach available to the Linux MTD layer, although not used by any of the Linux filesystems. This approach works well enough for specific usage patterns and you will find several successful embedded devices on the CE marketplace that use it.

The second approach is to have a 'flash aware filesystem', which understands the write/read/erase properties of NAND flash parts.

There are three variants on this approach that I'm aware of. The first takes a 'traditional' filesystem like FFS and, in effect, adds a flash translation layer. The second takes a log-like file system and adapts its GC to NAND. The third approach is to write a file system specific to NAND devices from scratch.

PalmOS Garnet's NAND file system is an example of the first. The modified version of LFS that Mike Chen and I did for PalmOS Cobalt is an example of the second. The MTD-based file system jffs2 is an example of the third, and a cautionary tale for those who would write their own.

In addition to the various points Matt Dillon has figured out from reading specs, there are several features of NAND parts that I haven't seen mentioned here that play a fairly important role in designing file systems around them.
These include, but are probably not limited to:

1) Large page versus small page NAND
2) Broken or poorly performing hardware, especially ECC generation and write verification
3) Adjacent write effect

Some interesting properties to take into account when designing a NAND file system:

1) No block can be assumed good, which means you have to scan the device to find your metadata starting point at boot time.
2) Small page NAND has less 'spare' available in the spare region than large page NAND, which means that you can do optimizations for large page NAND that you can't for small.
3) Write-back caching of writes makes NAND parts less reliable
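
As a rough illustration of property 1 above (the boot-time scan for a metadata starting point), here is a minimal sketch in Python against a simulated device. Nothing in it comes from a real driver or from any of the file systems named in this message: the `Block` fields, the `METADATA_MAGIC` tag, and the idea of picking the highest sequence number are all assumptions made for the example. A real implementation would read these values from each block's spare area and would also have to cope with ECC errors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical tag that marks a block as holding file system metadata.
METADATA_MAGIC = 0x4E414E44


@dataclass
class Block:
    """Simulated erase block (all fields are assumptions for this sketch)."""
    is_bad: bool = False          # bad-block marker, factory or runtime
    magic: Optional[int] = None   # None models an erased or unrelated block
    sequence: int = 0             # higher means more recently written


def find_metadata_start(blocks: Sequence[Block]) -> Optional[int]:
    """Scan every block and return the index of the newest valid
    metadata block, or None if the device holds no metadata."""
    best_index = None
    best_sequence = -1
    for index, block in enumerate(blocks):
        if block.is_bad:
            continue  # never trust the contents of a bad block
        if block.magic != METADATA_MAGIC:
            continue  # erased block or ordinary data
        if block.sequence > best_sequence:
            best_index, best_sequence = index, block.sequence
    return best_index


if __name__ == "__main__":
    device = [
        Block(is_bad=True, magic=METADATA_MAGIC, sequence=9),  # skipped
        Block(),                                               # erased
        Block(magic=METADATA_MAGIC, sequence=3),
        Block(magic=METADATA_MAGIC, sequence=7),
    ]
    print(find_metadata_start(device))
```

Because no fixed block can be assumed good, the scan cannot stop at a well-known location; it has to visit every block, which is why boot time grows with device size in this simple form.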