From: David Schultz
To: Poul-Henning Kamp
Cc: Christopher Arnold, Martin Fouts, arch@FreeBSD.ORG, qpadla@gmail.com, freebsd-arch@FreeBSD.ORG
Date: Mon, 31 Mar 2008 21:05:08 -0400
Subject: Re: Flash disks and FFS layout heuristics

On Mon, Mar 31, 2008, Poul-Henning Kamp wrote:
> In message <20080331222154.C976C5B50@mail.bitblocks.com>, Bakul Shah writes:
> >On Mon, 31 Mar 2008 13:06:10 PDT Matthew Dillon wrote:
> >> But how do you index that information?
> >> You can't simply append the information to the NAND unless you also
> >> have a way to access it. So does the filesystem have to scan the NAND
> >> (or significant portions of it) in order to build an index of the
> >> filesystem topology in system memory?
> >
> >One possible way:
> >
> >I'd design the system so that each update ends with the write
> >of a root block[1].

This is exactly what ZFS does (except that it wasn't designed for
flash, so the primary copy of the root block is always stored at a
well-known location). Countless other systems, dating back to the use
of shadow paging in System R, use the same technique, including WAFL
and several flash file systems.

> This is sort of the approach Margo Seltzer used for her (Kludge-)LFS
> it has many drawbacks, in particular when it comes to recovery.

Generally not. Recovery is trivial, especially compared to other
techniques such as journalling: you simply find the root block, and it
points to a consistent snapshot of the file system. The main limitation
is that making updates durable immediately (i.e., fsync()) is
inefficient, since all the dirty indirect blocks on the path up to the
root must be flushed to disk. ZFS addresses this by writing updates
that must be synchronous to a logical redo log, which does introduce
complications for recovery.
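To make the root-block argument concrete, here is a toy sketch in Python. It is not how ZFS, WAFL, or any real flash file system lays out blocks. It assumes an append-only list standing in for NAND, a fixed two-level tree of indirect blocks, and a SHA-256 checksum to detect a torn root write; `CowStore`, `read`, and `recover` are hypothetical names. Every update writes a new data block and new copies of each indirect block on its path, then writes the root last, so recovery only has to find the newest intact root. The count returned by `write()` shows why a synchronous update is costly: it rewrites every block up to the root.

```python
import hashlib
import json

FANOUT = 4  # pointers per indirect block (toy value, an assumption)
DEPTH = 2   # levels of indirect blocks below the root (toy value, an assumption)


def _checksum(rec):
    # Hash every field except the stored checksum itself.
    fields = {k: v for k, v in rec.items() if k != "sum"}
    return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()


def _path_indices(lbn):
    # Digits of lbn in base FANOUT, most significant first: one per tree level.
    return [(lbn // FANOUT ** (DEPTH - 1 - lvl)) % FANOUT for lvl in range(DEPTH)]


class CowStore:
    """Hypothetical copy-on-write block tree on an append-only log."""

    def __init__(self):
        self.log = []     # records never change once appended; index = address
        self.root = None  # address of the current root record
        self.seq = 0

    def _append(self, rec):
        self.log.append(rec)
        return len(self.log) - 1

    def _read_ptrs(self, addr):
        # Copy the pointers so the old on-"flash" block is never modified.
        return list(self.log[addr]["ptrs"]) if addr is not None else [None] * FANOUT

    def write(self, lbn, data):
        """Write logical block lbn; return how many blocks were written."""
        idx = _path_indices(lbn)
        addr = self.log[self.root]["top"] if self.root is not None else None
        path = []
        for i in idx:  # walk down, remembering each indirect block on the path
            ptrs = self._read_ptrs(addr)
            path.append(ptrs)
            addr = ptrs[i]
        # Copy-on-write: new data block first, then new indirect blocks bottom-up.
        child = self._append({"kind": "data", "data": data})
        written = 1
        for ptrs, i in zip(reversed(path), reversed(idx)):
            ptrs[i] = child
            child = self._append({"kind": "ind", "ptrs": ptrs})
            written += 1
        # The root is written last; only this write makes the update visible.
        self.seq += 1
        root = {"kind": "root", "seq": self.seq, "top": child}
        root["sum"] = _checksum(root)
        self.root = self._append(root)
        return written + 1


def recover(log):
    """Return the address of the newest intact root, scanning from the end."""
    for addr in range(len(log) - 1, -1, -1):
        rec = log[addr]
        if rec.get("kind") == "root" and rec.get("sum") == _checksum(rec):
            return addr
    return None


def read(log, root_addr, lbn):
    """Read lbn through the snapshot that root_addr points to."""
    if root_addr is None:
        return None
    addr = log[root_addr]["top"]
    for i in _path_indices(lbn):
        if addr is None:
            return None
        addr = log[addr]["ptrs"][i]
    return None if addr is None else log[addr]["data"]


if __name__ == "__main__":
    store = CowStore()
    print(store.write(5, "v1"))  # 4: data + 2 indirect blocks + root
    store.write(6, "x")
    store.write(5, "v2")
    crashed = store.log[:-1]     # crash: the last root never reached flash
    r = recover(crashed)
    print(read(crashed, r, 5), read(crashed, r, 6))  # v1 x
```

After the simulated crash, recovery falls back to the previous root, and both blocks read back from that older but consistent snapshot; the orphaned data and indirect blocks are simply garbage. The same fallback happens if the last root is written but fails its checksum.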