Date: Fri, 21 May 1999 22:21:45 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: Dom.Mitchell@palmerharvey.co.uk (Dom Mitchell)
Cc: naddy@mips.rhein-neckar.de, freebsd-chat@FreeBSD.ORG
Subject: Re: SGI, XFS and OSS?
Message-ID: <199905212221.PAA06728@usr07.primenet.com>
In-Reply-To: <E10knhm-000CNE-00@voodoo.pandhm.co.uk> from "Dom Mitchell" at May 21, 99 12:43:10 pm
> > For those of us who don't use Irix systems, much less administrate any,
> > could somebody sum up what's so remarkable about XFS?
> >
> > Jamie Bowden <ragnar@sysabend.org> wrote:
> >
> > > XFS is -FAST-
> >
> > Anything else?
>
> Basically, it's a transactional logging filesystem (fast recovery, fast
> metadata updates), like LFS was going to be. It also has Btree based
> directories (as opposed to FFS's linear directories) which can make
> things quicker.
>
> Many other filesystems also have these attributes. For example HPFS
> (OS2) and NTFS (WinNT). However, XFS appears to be well done and
> designed with Unix in mind.

HPFS has btrees, and NTFS has logs.

XFS is more similar to IBM's JFS; it's a journalling filesystem.

The difference between a journalling filesystem and a log structured
filesystem is that a log structured filesystem logs transactions,
followed by a log of a validation timestamp after they have been
committed. A log structured FS moves forward in timestamp increments
through transaction records.

A journalling filesystem journals the intended action, completes the
intended action, and logs a timestamp.

The difference here is whether you merely log the action, or you
journal your intent. This means that a journalling FS is capable of
rolling uncommitted transactions backward OR forward, whereas an LFS
can only roll transactions backward.

Being limited to rolling backward is less useful if you are, for
example, implementing an ATM or doing wire transfers. The LFS will
degrade to fsync() performance, whereas the JFS will delay the
acknowledgement until the timestamp (commit), but will continue to
allow concurrent operations.

Similarly, LFS's are unable to imply state; however, a JFS can imply
state. This allows you to create a transaction, and then create
subtransactions which have been committed, but then abort the
transaction, decommitting the subtransactions at the same time.
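The roll-forward versus roll-backward distinction can be sketched in a
few lines of Python. This is only a toy model under stated assumptions:
the "disk" is a dict, every journal record carries both the old and the
new value, and transactions whose commit was logged are assumed to be
fully on disk already. The names `recover`, `journal` and `committed`
are hypothetical and are not taken from XFS, JFS or any BSD code.

```python
def recover(store, journal, committed, roll_forward):
    """Return a repaired copy of `store` after a simulated crash.

    store        -- dict standing in for on-disk state at crash time
    journal      -- list of (txid, key, old, new) intent records, in order;
                    old is None when the key did not exist before
    committed    -- set of txids whose commit timestamp reached the journal
    roll_forward -- True: redo pending intents; False: undo them
    """
    repaired = dict(store)
    # Only transactions without a logged commit need attention here
    # (assumption: committed ones are already completely on disk).
    pending = [rec for rec in journal if rec[0] not in committed]
    if roll_forward:
        # The journal holds the full intent, so unfinished work can be redone.
        for _txid, key, _old, new in pending:
            repaired[key] = new
    else:
        # Undo in reverse order, restoring the values seen before the intent.
        for _txid, key, old, _new in reversed(pending):
            if old is None:
                repaired.pop(key, None)
            else:
                repaired[key] = old
    return repaired


# A transfer of 30 from account "a" to account "b", interrupted after the
# debit reached the disk but before the credit did.
journal = [(1, "a", 100, 70), (1, "b", 0, 30)]
crashed = {"a": 70, "b": 0}

forward = recover(crashed, journal, committed=set(), roll_forward=True)
backward = recover(crashed, journal, committed=set(), roll_forward=False)
```

In this sketch both outcomes are consistent (either the whole transfer
happened or none of it did); the point is that only a journal of intent
gives the recovery code the choice.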
The LFS in BSD 4.4, and in NTFS, and (as has been described) in
ext3fs, is inferior to a JFS.

Without a JFS, you can't export a transactioning interface to user
space without introducing synchronization points.

Soft updates can be thought of as a logging mechanism, where the log
is in memory, and the stanchion commits are really implicit in the
metadata ordering. You take one hit because you have to impose an
order on the operations, potentially pessimizing them, and you take
another because of the graph order vs. whether you are breadth or
depth first in your operations, if you perform operations in a tree.

In practice, soft updates roll back, just like LFS, and they take the
same hierarchy order hit for not being btree'ed in one of depth vs.
breadth ordering (i.e., the most intentionally pessimal case you can
possibly obtain is the deletion of the /usr/ports tree).

Like logging, soft updates *could* expose a user level transaction
interface (by adding a "user transaction" order dependency) by
introducing additional synchronization points, but such an interface
would be far less efficient than the concurrent one a JFS can offer.

Finally, as to the "fsck time" argument: the fsck of a soft updates
volume following a crash can occur in the background, assuming the
crash was not the result of a disk or controller failure, since the
only thing that is incorrect is that the cylinder group bitmaps
indicate allocations that do not, in fact, exist.

This could easily be taken care of by running a "CG fixup" process
(as opposed to a full fsck) in the background. The algorithm would
be to merely traverse each cylinder group by locking access to it,
correcting the bitmap, unlocking it, and going on to the next group.
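The "CG fixup" traversal described above might look roughly like the
following Python sketch. It is a model only, not FFS code: the
`CylinderGroup` class, its `bitmap` and `referenced` fields, and the
`cg_fixup` function are all hypothetical, and the set of blocks that
are really referenced is assumed to be known already (in a real
filesystem, computing it is the expensive part).

```python
import threading
from dataclasses import dataclass, field


@dataclass
class CylinderGroup:
    bitmap: set        # block numbers the bitmap marks as allocated
    referenced: set    # block numbers actually in use (assumed known)
    lock: threading.Lock = field(default_factory=threading.Lock)


def cg_fixup(groups):
    """Clear bitmap bits for blocks that are marked allocated but unused.

    Groups are handled one at a time, so only the group currently being
    corrected is locked; the rest of the volume stays available.
    Returns the number of blocks reclaimed.
    """
    reclaimed = 0
    for cg in groups:
        with cg.lock:                        # lock access to this group
            leaked = cg.bitmap - cg.referenced
            cg.bitmap -= leaked              # correct the bitmap
            reclaimed += len(leaked)
        # lock released here; go on to the next group
    return reclaimed


# Two groups after a simulated crash: block 2 in the first group was
# marked allocated, but nothing ever came to reference it.
groups = [CylinderGroup({1, 2, 3}, {1, 3}), CylinderGroup({10}, {10})]
freed = cg_fixup(groups)
```

The sketch only ever removes bits from the bitmap, which matches the
claim that the sole post-crash inconsistency is allocations that do not
exist; it would not be a safe repair if blocks could also be in use but
unmarked.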
Thus the "reboot time" argument goes out the window, and we are left
with: (1) additional synchronization points for stanchion events
relative to XFS, (2) the inability to currently support a user level
transactioning interface, and (3) the inability to roll completed
transactions forward instead of backward, and the resulting
synchronization and/or distributed coherency issues arising therefrom.

XFS would be neat technology to integrate, but there is additional
work that could be done on soft updates as it currently stands (e.g.,
the most obvious, which Kirk McKusick and Matt Day, Mark Muhlestien,
and myself independently arrived at, is "soft read-only", where if
there are no pending transactions for two update cycles, a flag can
be set, and the FS superblock could have the clean bit set. Any
dirtying operation thereafter would redirty the superblock, unset the
soft read-only bit in the incore flags, and allow the operation to
complete. The BSDI implementation has this feature, in fact).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message