From: Jeff Roberson
Date: Tue, 19 Jan 2010 15:23:17 -1000 (HST)
To: arch@freebsd.org
Subject: Softdep journaling

Hello,

Many of you may have already noticed that I have implemented a journaling layer that co-exists with softdep to eliminate fsck after an unclean shutdown. I have written about this here:

http://jeffr-tech.livejournal.com/

And I have a patch against current here:

http://people.freebsd.org/~jeff/suj.diff

I have been working with McKusick and he has been providing review feedback. Tegge and kib have been reviewing my rename changes. Peter Holm has generously provided his time for testing.

I am within a week of being able to commit this to CURRENT. I'm raising this here so people can discuss the project and I can answer any questions or concerns before it goes in the tree.

Briefly, I have added an intent log to softdep that journals block allocation and free along with inode link count changes. After an unclean shutdown a special fsck pass reads this journal and frees blocks and inodes. The recovery pass is not like traditional block journaling: it evaluates the filesystem state to determine how far each operation got and rolls back intelligently.

The worst-case journal recovery time I've seen is a couple of minutes; however, I'm still generating a few hundred megabytes of text describing the operation when I run fsck so that I can quickly resolve any bugs. This worst case was generated using pho's stress2 and a completely full 64MB journal containing nearly 2 million outstanding records. Recovery time for a crash during buildworld, for example, is on the order of 10 seconds even while producing the text log. Without the log I expect the maximum on any drive to be around 2 minutes. Presently recovery is actually CPU bound and I'm using 3-year-old hardware. Recovery time grows with the size of the journal and shrinks with the speed of the processor. The size of the filesystem makes little difference.

The filesystem cannot be mounted read/write until the journal is recovered or a full fsck pass is run. The filesystem will be backwards compatible with earlier ffs implementations. The journal can be enabled or disabled with tunefs. The only requirement is sufficient free space for the journal, which is stored in a regular inode.

The patch I have presented is mostly complete. It only lacks the recovery operation for partial truncation. I'm still running through various scenarios to validate the checker; however, the kernel has been very stable as of late.

Please raise any comments or concerns here.
I'm going to make another call for testers on current@ and want to keep that reserved for bug reports.

Thanks,
Jeff
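
For readers who want a concrete picture of the recovery idea described above (an intent log that is checked against the actual filesystem state rather than replayed blindly), here is a rough sketch in Python. It is only a toy model under stated assumptions: the names ToyFS, Inode, recover and the record kinds "blk_alloc", "blk_free", "link_add" and "link_remove" are hypothetical and are not taken from suj.diff, and the real checker handles many more cases.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    nlink: int = 0                              # link count stored in the inode
    blocks: set = field(default_factory=set)    # block numbers the inode points at


@dataclass
class ToyFS:
    # Hypothetical in-memory stand-in for on-disk state after a crash.
    inodes: dict = field(default_factory=dict)   # inode number -> Inode
    dirents: dict = field(default_factory=dict)  # name -> inode number
    allocated: set = field(default_factory=set)  # blocks marked in use in the bitmap


def recover(fs, journal):
    """Check each journaled intent against the state and undo what did not finish."""
    touched = set()
    for rec in journal:
        kind = rec[0]
        if kind in ("blk_alloc", "blk_free"):
            _, ino, blk = rec
            inode = fs.inodes.get(ino)
            referenced = inode is not None and blk in inode.blocks
            if not referenced:
                # No pointer to the block reached the disk (allocation never
                # completed, or a free was half done), so release it.
                fs.allocated.discard(blk)
        elif kind in ("link_add", "link_remove"):
            touched.add(rec[1])

    for ino in touched:
        inode = fs.inodes.get(ino)
        if inode is None:
            continue
        # Recompute the link count from the directory entries that exist.
        actual = sum(1 for target in fs.dirents.values() if target == ino)
        inode.nlink = actual
        if actual == 0:
            # Nothing names this inode any more: free it and its blocks.
            fs.allocated -= inode.blocks
            del fs.inodes[ino]


# Crash scenario: block 7 was marked allocated but never attached to inode 1,
# and inode 2 was initialized but its directory entry never reached the disk.
fs = ToyFS(
    inodes={1: Inode(nlink=1, blocks={5}), 2: Inode(nlink=1, blocks={9})},
    dirents={"a": 1},
    allocated={5, 7, 9},
)
journal = [("blk_alloc", 1, 5), ("blk_alloc", 1, 7), ("link_add", 2)]
recover(fs, journal)
```

In this toy the work done by recover depends on the number of journal records and not on how many inodes or blocks the filesystem holds, which is the property the mail attributes to the real recovery pass.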