From owner-freebsd-arch@FreeBSD.ORG  Mon Mar 31 02:13:36 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.ORG
Date: Sun, 30 Mar 2008 22:15:04 -0400
From: David Schultz <das@FreeBSD.ORG>
To: Matthew Dillon <dillon@apollo.backplane.com>
Message-ID: <20080331021504.GA1465@zim.MIT.EDU>
Mail-Followup-To: Matthew Dillon <dillon@apollo.backplane.com>,
	Christopher Arnold <chris@arnold.se>, arch@FreeBSD.ORG
References: <20080330231544.A96475@localhost>
	<200803310010.m2V0ALRp017186@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200803310010.m2V0ALRp017186@apollo.backplane.com>
Cc: Christopher Arnold <chris@arnold.se>, arch@FreeBSD.ORG
Subject: Re: Flash disks and FFS layout heuristics

On Sun, Mar 30, 2008, Matthew Dillon wrote:
>     The idea of remapping flash sectors could be considered a poor-man's
>     way of dealing with wear issues in that remapping tends to be fairly
>     limited... for example, you might use a fixed-sized table and once the
>     table fills up the device is toast.  Remapping doesn't actually prevent
>     the uneven wear from occurring, it just gives you a fixed factor of
>     additional runway.
[...]
>     A flash unit must therefore run a scrubber to really be reliable.  It is
>     absolutely required if you use a remapping algorithm, and a bit less so
>     if you use a proper storage layer which generates even wear.

Yes, this is essentially what modern NAND flash devices do. I
suggest that you read this article before you write any more
essays about it:

       http://www.cs.tau.ac.il/~stoledo/Pubs/flash-survey.pdf

Now if you think about issues such as sector mapping updates,
writes smaller than the mapping granularity, and running the
cleaner on fragmented erase units, you'll quickly see why random
writes perform so poorly.
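
To make the cost of small writes concrete, here is a toy model of
a block-mapped translation layer. It is only a sketch: the class
name, the 64-sector erase unit, and the single spare unit are
assumptions for illustration, not the behavior of any particular
device. Every write smaller than the mapping granularity forces
the whole erase unit to be copied and remapped.

```python
# Hypothetical block-mapped flash translation layer (FTL). All sizes
# are illustrative assumptions, not figures from any real device.
SECTORS_PER_ERASE_UNIT = 64   # mapping granularity = one erase unit


class BlockMappedFTL:
    def __init__(self, logical_units, spare_units=1):
        total = logical_units + spare_units
        # logical erase-unit number -> physical erase-unit number
        self.mapping = {n: n for n in range(logical_units)}
        self.free = list(range(logical_units, total))
        self.sectors_programmed = 0   # physical sector writes
        self.erases = 0

    def write(self, lba, nsectors=1):
        """Write nsectors at logical sector lba (must fit in one unit)."""
        unit = lba // SECTORS_PER_ERASE_UNIT
        # Flash cannot be overwritten in place: copy the whole unit to
        # a free one (merging in the new sectors), update the mapping,
        # then erase the old unit and return it to the free list.
        new_phys = self.free.pop(0)
        old_phys = self.mapping[unit]
        self.sectors_programmed += SECTORS_PER_ERASE_UNIT
        self.mapping[unit] = new_phys
        self.erases += 1
        self.free.append(old_phys)


def write_amplification(ftl, host_sectors):
    """Physical sectors programmed per sector the host asked to write."""
    return ftl.sectors_programmed / host_sectors
```

In this model a single-sector random write costs 64 sector
programs and one erase, while a write that fills a whole erase
unit costs the same 64 programs for 64 sectors of useful data.
Real devices use finer-grained or hybrid mappings and a cleaner,
which changes the numbers but not the basic problem.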

You're right that you need additional algorithms to avoid uneven
wear; remapping merely facilitates that even when the write access
pattern is decidedly uneven. The article discusses several approaches.
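
As a rough illustration of how remapping makes wear leveling
possible, the sketch below always redirects a write to the
least-erased free block. The class and its policy are my own
simplification, not an algorithm taken from the article, and it
only levels wear among blocks that are actually rewritten.

```python
import heapq


class WearLevelingRemapper:
    """Toy remapping layer: each write goes to the least-worn free block."""

    def __init__(self, physical_blocks):
        self.erase_count = [0] * physical_blocks
        # Min-heap of (erase_count, physical_block) for free blocks.
        self.free = [(0, p) for p in range(physical_blocks)]
        heapq.heapify(self.free)
        self.mapping = {}   # logical block -> physical block

    def write(self, logical):
        # Pick the free block with the fewest erases so far.
        _, new_phys = heapq.heappop(self.free)
        old_phys = self.mapping.get(logical)
        self.mapping[logical] = new_phys
        if old_phys is not None:
            # The stale copy is erased and returned to the free pool.
            self.erase_count[old_phys] += 1
            heapq.heappush(self.free, (self.erase_count[old_phys], old_phys))
```

Rewriting one logical block 800 times on an 8-block device
spreads the 799 resulting erases almost evenly across all eight
physical blocks, even though the access pattern could hardly be
more uneven. Blocks holding data that never changes would still
need to be relocated by a separate policy.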

Several people have proposed flash-aware filesystems, also
described in the article, to obviate the need for this sort of
remapping layer. Confusingly, one of them is called FFS, for
"Flash File System". Most of them resemble log-structured or
copy-on-write filesystems such as LFS and ZFS, often with
additional considerations such as wear leveling.

Your earlier characterization of ZFS wasn't quite right, by the
way; ZFS arranges data and metadata in a tree of blocks, and even
the indirect blocks, except for the top-level block, are
copy-on-write. Unfortunately I can't find a good paper on it at
the moment.
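
In the meantime, the general idea of a copy-on-write block tree
can be sketched as follows. This is a toy model with made-up
names (Leaf, Indirect, cow_write), not ZFS's on-disk format or
code: a write never modifies an existing block, it allocates a
new leaf plus new copies of every indirect block on the path to
the root, and only the top-level reference is switched over.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    data: bytes


@dataclass(frozen=True)
class Indirect:
    children: Tuple[Union["Indirect", Leaf], ...]


def cow_write(node, path, data):
    """Return a new subtree with the leaf at `path` holding `data`.

    Nothing is modified in place. Only the blocks along `path` are
    copied; every other subtree is shared with the old tree.
    """
    if not path:
        return Leaf(data)
    idx = path[0]
    new_child = cow_write(node.children[idx], path[1:], data)
    children = node.children[:idx] + (new_child,) + node.children[idx + 1:]
    return Indirect(children)
```

After new_root = cow_write(old_root, (0, 1), b"B"), old_root still
describes the previous, consistent state of the tree, and the
untouched subtrees are shared between the two roots.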