Date: Tue, 19 Jan 2010 15:23:17 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
To: arch@freebsd.org
Subject: Softdep journaling
Message-ID: <alpine.BSF.2.00.1001191510070.1027@desktop>
Hello,

Many of you may have already noticed that I have implemented a journaling layer that co-exists with softdep to eliminate fsck after an unclean shutdown. I have written about this here: http://jeffr-tech.livejournal.com/ And I have a patch against current here: http://people.freebsd.org/~jeff/suj.diff

I have been working with McKusick and he has been providing review feedback. Tegge and kib have been reviewing my rename changes. Peter Holm has generously provided his time for testing. I am within a week of being able to commit this to CURRENT. I'm raising this here so people can discuss the project and I can answer any questions or concerns before it goes in the tree.

Briefly, I have added an intent log to softdep that journals block allocation and free along with inode link count changes. After an unclean shutdown, a special fsck pass reads this journal and frees any blocks and inodes that are no longer referenced. The recovery pass is not like traditional block journaling: it actually evaluates the filesystem state to determine how far along each operation made it and rolls back intelligently.

The worst-case journal recovery time I've seen is a couple of minutes; however, I'm still generating a few hundred megabytes of text describing the operation when I run fsck so that I can quickly resolve any bugs. This worst-case performance was generated using pho's stress2 and a completely full 64MB journal containing nearly 2 million outstanding records. Recovery time for a crash during buildworld, for example, is on the order of 10 seconds even while producing the text log. Without the log I expect the maximum on any drive to be around 2 minutes. Presently recovery is actually CPU bound and I'm using 3-year-old hardware: recovery time grows with the size of the journal and shrinks as processor speed increases. The size of the filesystem makes little difference. The filesystem cannot be mounted read/write until the journal is recovered or a full fsck pass is run.
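To illustrate the kind of recovery described above (look at what actually reached the disk for each journaled operation, then undo whatever did not complete), here is a toy model. It is a minimal sketch under stated assumptions, not the SUJ code or its on-disk format: the record types `BlockAlloc` and `LinkChange`, the `FsState` layout, and `recover()` are all hypothetical, and the model ignores block frees, directories as inodes, and truncation.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL record types for illustration only; these are not the
# real SUJ journal records or on-disk format.
@dataclass(frozen=True)
class BlockAlloc:
    ino: int    # inode the block was being allocated to
    blkno: int  # block that was marked in-use in the bitmap

@dataclass(frozen=True)
class LinkChange:
    ino: int    # inode whose link count was being changed

@dataclass
class Inode:
    nlink: int
    blocks: set = field(default_factory=set)

@dataclass
class FsState:
    dirs: dict         # directory inode -> {name: child inode}, as found on disk
    inodes: dict       # inode number -> Inode
    used_blocks: set   # blocks currently marked allocated in the bitmap

def count_refs(fs: FsState, ino: int) -> int:
    """Count the directory entries on disk that actually name `ino`."""
    return sum(
        1
        for entries in fs.dirs.values()
        for child in entries.values()
        if child == ino
    )

def recover(fs: FsState, journal: list) -> FsState:
    """Toy recovery pass: inspect on-disk state for every journaled
    operation and undo whatever did not complete."""
    # Pass 1: make link counts match the directory entries that reached disk;
    # an inode with no remaining references is freed along with its blocks.
    for rec in journal:
        if isinstance(rec, LinkChange) and rec.ino in fs.inodes:
            ip = fs.inodes[rec.ino]
            ip.nlink = count_refs(fs, rec.ino)
            if ip.nlink == 0:
                fs.used_blocks -= ip.blocks
                del fs.inodes[rec.ino]
    # Pass 2: roll back block allocations that never became reachable from
    # a surviving inode, so the bitmap does not leak them.
    for rec in journal:
        if isinstance(rec, BlockAlloc):
            owner = fs.inodes.get(rec.ino)
            if owner is None or rec.blkno not in owner.blocks:
                fs.used_blocks.discard(rec.blkno)
    return fs

# Example: inode 11 was created, but the crash hit before its directory entry
# was written; block 102 was marked allocated but never attached to inode 10.
fs = FsState(
    dirs={2: {"a": 10}},
    inodes={10: Inode(nlink=1, blocks={100}), 11: Inode(nlink=1, blocks={101})},
    used_blocks={100, 101, 102},
)
recover(fs, [LinkChange(11), BlockAlloc(11, 101), BlockAlloc(10, 102)])
print(sorted(fs.inodes), sorted(fs.used_blocks))  # [10] [100]
```

The two passes are ordered so that link counts, and therefore which inodes survive, are settled before deciding whether a journaled block allocation is still reachable. The work is proportional to the number of journal records rather than to the size of the filesystem, which matches the scaling behavior described above.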
The filesystem will be backwards compatible with earlier FFS implementations. The journal can be enabled or disabled with tunefs. The only requirement is sufficient free space for the journal, which is stored in a regular inode.

The patch I have presented is mostly complete; it only lacks the recovery operation for partial truncation. I'm still running through various scenarios to validate the checker; however, the kernel has been very stable as of late.

Please raise any comments or concerns here. I'm going to make another call for testers on current@ and want to keep that thread reserved for bug reports.

Thanks,
Jeff