Date: Mon, 31 Mar 2008 10:36:09 -0700
Message-ID: <B95CEC1093787C4DB3655EF330984818051D03@EXCHANGE.danger.com>
In-Reply-To: <200803310135.m2V1ZpiN018354@apollo.backplane.com>
References: <20080330231544.A96475@localhost>
	<200803310135.m2V1ZpiN018354@apollo.backplane.com>
From: "Martin Fouts" <mfouts@danger.com>
To: "Matthew Dillon" <dillon@apollo.backplane.com>,
	"Christopher Arnold" <chris@arnold.se>, <arch@freebsd.org>
Subject: RE: Flash disks and FFS layout heuristics 

I came late to this discussion, so pardon me if I'm repeating stuff
that's already been discussed.

You can guess a lot from vendor specs, but NAND flash requires
experience before you understand the nuances; especially since the
vendors tend not to document most of what you need to know to get good
performance and reliability from a flash device.

There are, basically, two approaches to using NAND devices. What PHK
calls a "flash adaptation layer" or, sometimes, a "flash translation
layer" is widely used in devices that are meant to be seen as removable
MS-DOS file system devices, such as almost every USB NAND-based flash
device on the market. It is also used in at least two commercial flash
file systems intended for embedded flash, and it is available to the
Linux MTD layer, although not used by any of the Linux filesystems.
This approach works well enough for specific usage patterns, and you
will find several successful embedded devices in the CE marketplace
that use it.
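
As a rough illustration of what such a translation layer does, here is a
toy sketch in Python. It is not any vendor's or Linux MTD's actual
design; the class name, sizes, and fields are all assumptions made up
for the example. The only idea it tries to show is that the layer
presents rewritable logical pages while never overwriting a programmed
physical page in place.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration only).

    Logical pages are remapped to fresh physical pages on every write,
    because a programmed NAND page cannot be rewritten until its whole
    block has been erased.
    """

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # None models an erased (never programmed) physical page.
        self.flash = [None] * (num_blocks * pages_per_block)
        self.mapping = {}    # logical page number -> physical page number
        self.stale = set()   # physical pages holding superseded data
        self.next_free = 0   # next physical page that is still erased

    def write(self, logical_page, data):
        if self.next_free >= len(self.flash):
            # A real layer would erase blocks full of stale pages here.
            raise RuntimeError("no free pages; garbage collection needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            # The old copy stays on flash but is no longer reachable.
            self.stale.add(old)
        self.flash[self.next_free] = data
        self.mapping[logical_page] = self.next_free
        self.next_free += 1

    def read(self, logical_page):
        return self.flash[self.mapping[logical_page]]
```

Writing the same logical page twice leaves two physical copies, one live
and one stale; reclaiming the stale ones (and spreading erases evenly) is
where real translation layers spend most of their complexity.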

The second approach is to have a 'flash aware filesystem', which
understands the write/read/erase properties of NAND flash parts. There
are three variants on this approach that I'm aware of. The first takes a
'traditional' filesystem like FFS and, in effect, adds a flash
translation layer.  The second takes a log-like file system and adapts
its GC to NAND. The third approach is to write a file system specific to
NAND devices from scratch. PalmOS Garnet's NAND file system is an
example of the first. The modified version of LFS that Mike Chen and I
did for PalmOS Cobalt is an example of the second. The MTD based file
system jffs2 is an example of the third, and a cautionary tale for those
who would write their own.

In addition to the various points Matt Dillon has figured out from
reading specs, there are several features of NAND parts that I haven't
seen mentioned here that play a fairly important role in designing file
systems around them. These include, but are probably not limited to:

1) Large page versus small page NAND
2) Broken or poorly performing hardware, especially ECC generation and
write verification
3) Adjacent write effect

Some interesting properties to take into account when designing a NAND
file system:

1) No block can be assumed good, which means you have to scan the device
to find your metadata starting point at boot time.

2) Small page NAND has less 'spare' available in the spare region than
large page NAND, which means that you can do optimizations for large
page NAND that you can't for small.

3) Write-back caching of writes makes NAND parts less reliable
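
To make property 1 concrete, a boot-time scan might look roughly like
the sketch below. It is only a model under stated assumptions: the
`MAGIC` signature, the 4-byte sequence number, and the dict layout used
for a block are all hypothetical, and a real driver would read bad-block
markers and ECC from the spare area rather than a Python flag.

```python
MAGIC = b"META"  # hypothetical signature marking a metadata block


def find_latest_metadata(blocks):
    """Return (block_index, sequence) of the newest metadata block, or None.

    `blocks` models the device as a list of dicts of the form
    {"bad": bool, "data": bytes}. A metadata block is assumed to start
    with MAGIC followed by a 4-byte big-endian sequence number.
    """
    best = None
    for index, block in enumerate(blocks):
        if block["bad"]:
            # No fixed location can be trusted, so bad blocks are skipped
            # and the scan simply continues.
            continue
        data = block["data"]
        if len(data) < 8 or data[:4] != MAGIC:
            continue  # erased block or ordinary data
        seq = int.from_bytes(data[4:8], "big")
        if best is None or seq > best[1]:
            best = (index, seq)
    return best
```

The cost of this scan grows with the number of blocks on the device,
which is one reason the boot-time behaviour matters when choosing a
layout.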