From owner-freebsd-arch@FreeBSD.ORG  Tue Apr  1 01:03:34 2008
Date: Mon, 31 Mar 2008 21:05:08 -0400
From: David Schultz <das@FreeBSD.ORG>
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Message-ID: <20080401010508.GA7708@zim.MIT.EDU>
Mail-Followup-To: Poul-Henning Kamp <phk@phk.freebsd.dk>,
	Bakul Shah <bakul@bitblocks.com>,
	Christopher Arnold <chris@arnold.se>,
	Martin Fouts <mfouts@danger.com>, arch@FreeBSD.ORG,
	qpadla@gmail.com, freebsd-arch@FreeBSD.ORG
References: <20080331222154.C976C5B50@mail.bitblocks.com>
	<26080.1207002217@critter.freebsd.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <26080.1207002217@critter.freebsd.dk>
Cc: Christopher Arnold <chris@arnold.se>, Martin Fouts <mfouts@danger.com>,
	arch@FreeBSD.ORG, qpadla@gmail.com, freebsd-arch@FreeBSD.ORG
Subject: Re: Flash disks and FFS layout heuristics
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>

On Mon, Mar 31, 2008, Poul-Henning Kamp wrote:
> In message <20080331222154.C976C5B50@mail.bitblocks.com>, Bakul Shah writes:
> >On Mon, 31 Mar 2008 13:06:10 PDT Matthew Dillon <dillon@apollo.backplane.com>  wrote:
> >>     But how do you index that information?  You can't simply append the
> >>     information to the NAND unless you also have a way to access it.  So
> >>     does the filesystem have to scan the NAND (or significant portions of it)
> >>     in order to build an index of the filesystem topology in system memory?
> >
> >One possible way:
> >
> >I'd design the system so that each update ends with the write
> >of a root block[1]. 

This is exactly what ZFS does (except that it wasn't designed for
flash, so the primary copy of the root block is always stored at a
well-known location). Countless other systems dating back to the
use of shadow paging in System R use the same technique, including
WAFL and several flash file systems.
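
To make the technique concrete, here is a minimal sketch in Python of an
update that ends with the write of a root block, plus the matching
recovery step. Everything in it is hypothetical and for illustration
only: the names (AppendOnlyStore, commit, recover), the JSON encoding,
the CRC32 check, and the backwards scan for the newest root are my
assumptions, not how ZFS, WAFL, or any particular flash file system lays
things out. The tree is also flattened to one level (the root points
directly at data blocks) to keep it short.

```python
import json
import zlib


class AppendOnlyStore:
    """Toy append-only medium: blocks are appended, never overwritten."""

    def __init__(self):
        self.blocks = []  # stands in for the on-flash log

    def append(self, payload):
        body = json.dumps(payload, sort_keys=True).encode()
        self.blocks.append((zlib.crc32(body), body))
        return len(self.blocks) - 1  # address of the new block

    def read(self, addr):
        crc, body = self.blocks[addr]
        if zlib.crc32(body) != crc:
            raise ValueError("corrupt block at %d" % addr)
        return json.loads(body)


def commit(store, old_root, name, data, generation):
    """Copy-on-write update: write the data first, the new root last."""
    files = dict(old_root["files"]) if old_root else {}
    files[name] = store.append({"kind": "data", "data": data})
    # Until this root is written, the previous root still describes a
    # complete, consistent snapshot.
    return store.append({"kind": "root", "gen": generation, "files": files})


def recover(store):
    """Scan backwards for the newest intact root block."""
    for addr in range(len(store.blocks) - 1, -1, -1):
        try:
            block = store.read(addr)
        except ValueError:
            continue  # torn or corrupt write: skip it
        if block["kind"] == "root":
            return block
    return None  # empty or completely unreadable store
```

Blocks written after the last root (say, by an update that was
interrupted) are simply unreachable after recover(), so there is nothing
to replay or undo.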

> This is sort of the approach Margo Seltzer used for her (Kludge-)LFS
> it has many drawbacks, in particular when it comes to recovery.

Generally not. Recovery is trivial, especially compared to other
techniques such as journalling. You simply find the root block,
and it has pointers to a consistent snapshot of the system.  The
main limitation is that making updates durable immediately (i.e.,
fsync()) is inefficient, since all the dirty indirect blocks up to
the root need to be flushed to disk. ZFS addresses this by writing
updates that must be synchronous to a logical redo log, which does
introduce complications for recovery.
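
To put a rough number on the fsync() cost, here is a small Python sketch
that counts how many blocks must reach disk to make a single modified
data block durable when every pointer block on the path to the root has
to be rewritten. The function name and the example geometry (4 KB
blocks, 8-byte block pointers, 1 TB of data) are assumptions chosen for
illustration, not measurements of any real file system.

```python
def blocks_flushed_per_fsync(data_blocks, fanout):
    """Blocks written to make one dirty data block durable in a
    copy-on-write tree where each pointer block holds `fanout` pointers."""
    if data_blocks < 1 or fanout < 2:
        raise ValueError("need data_blocks >= 1 and fanout >= 2")
    levels = 1       # the root block itself
    reach = fanout   # data blocks addressable through `levels` levels
    while reach < data_blocks:
        reach *= fanout
        levels += 1
    return 1 + levels  # the data block plus every pointer block above it


# Assumed geometry: 4 KB blocks, 8-byte pointers, 1 TB of data.
fanout = 4096 // 8             # 512 pointers per block
data_blocks = 2**40 // 4096    # 2**28 data blocks
print(blocks_flushed_per_fsync(data_blocks, fanout))  # prints 5
```

Under those assumptions, one 4 KB synchronous write turns into five
block writes. When many updates share a single root write the extra
blocks are amortized, which is why the cost shows up mainly for
updates that must be durable immediately.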