Date: Wed, 16 Apr 1997 18:22:09 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: aaron@veritas.com (Aaron Smith)
Cc: terry@lambert.org, hackers@freebsd.org
Subject: Re: Feasibility of porting Linux filesystem code?
Message-ID: <199704170122.SAA28654@phaeton.artisoft.com>
In-Reply-To: <m0wHezO-000iX3C@megami.veritas.com> from "Aaron Smith" at Apr 16, 97 05:23:48 pm

> After reading this, Terry, I can only conclude that you have a propensity
> for pretending knowledge of things you incompletely understand. Your
> standard technique appears to be:
>
> - make broad pronouncements
> - bring up graph theory
>
> Writing a journaling filesystem does not require any of the changes you
> mention. You don't know what you're talking about. This might be forgivable
> if your tone was speculative, but you act like you are laying down fact,
> when you could not be more wrong about what is required for a journalling
> filesystem.
>
> VERITAS VxFS (a journalled filesystem) runs on many, many different UNIX
> variants and none have had to modify the VFS layer or its interface.
>
> I would recommend easing up on your need to sound authoritative on topics
> you have not mastered.

I was a systems engineer working for Novell/USG (the former USL) for a
number of years before leaving them for my present job. Among other
things, I worked on the UFS (maintenance), NWFS (lead engineer), NUCFS
(assisting engineer), and VXFS (your) source code.

I have also submitted a number of bug fixes (which were integrated
remarkably fast) for the Linux FS code; I'm rather familiar with it
as well, since my job for the first year I was here had to do with
writing commercial Linux FS code in support of a dedicated file and
print server product.

UnixWare 1.0, which returned soft errors like timeouts to the block
I/O interface consumer. VXFS would occasionally lose parts of the FS
when the area being read was marked bad. The block I/O interface had
to be modified in SVR4 to accommodate VXFS, which did not retry
operations, so your statement is false for SVR4.

> [discussion of SGI xfs, some mentions of JFS and NTFS]
>
> >The Linux VFS interface is not reflexive; it's not cut at the right
> >place for a journalled FS. FreeBSD's is much better, but it's still
> >not there either... I've discussed this in detail with the guy who
> >wrote the read-only NTFS driver for Linux.
> >You need to be able to treat the VFS interface as if it were a
> >transaction interface to do things like event rollback. It helps if
> >the internal treatment is architected as event/responder, too. The
> >same thing would help for Soft Updates, which is basically a
> >transaction order enforcement mechanism, with the dependency graph
> >statically computed (rendering it FFS-specific).

You will note that I did not say the changes were *required*. While
you *can* pound any peg into any hole with a large enough hammer (in
this case, enough extra code), it's not always the best way to solve
the problem.

If you want to implement an XFS or JFS on Linux (each of which is
capable of rolling transactions forward during restart), you would be
required to ensure the atomicity of operations which required
reentering the VFS externally to the VFS encapsulation of operations.
This is problematic, since you must internally roll these transactions
in the mount code, unless you externalize them in fsck code, and that
won't work for the root filesystem. It greatly complicates the code
if the VOP calls themselves don't correspond to atomic operations on
a 1:1 basis.

You *could* do the code anyway. Veritas does it in SVR4.2, using an
intent log to enable it to make the atomicity guarantees across
operations which require multiple VOP calls to implement. This would
be harder on Linux because of the 1:N mapping issues introduced by a
non-reflexive VFS interface. Partial transactions can not be rolled
forward, only back, if the intent log was written at the VOP layer,
and you were unsure of the VOP which was to follow, or the data that
was required.

Any journalling system is really insufficient (by itself) for this
task overall.
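
[Editorial illustration, not part of the original message.] The
roll-forward versus roll-back distinction above can be sketched in a
few lines of Python. This is a toy model under stated assumptions, not
VxFS, XFS, or any kernel code: it assumes a hypothetical in-memory
intent log in which a transaction either declared all of its steps up
front or was logged one VOP at a time, with nothing known about what
was to follow. Every name here (`IntentRecord`, `recover`, the step
strings) is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntentRecord:
    """One logged transaction (hypothetical layout, for illustration only)."""
    txid: int
    # Steps declared before the transaction started. None models a log
    # written per VOP, where the logger never learned what was to follow.
    planned: Optional[List[str]]
    done: List[str] = field(default_factory=list)
    committed: bool = False


def recover(log):
    """Decide, per transaction, what restart-time recovery is able to do."""
    actions = {}
    for rec in log:
        if rec.committed:
            actions[rec.txid] = ("nothing", [])
        elif rec.planned is not None:
            # Full intent is known: replay the steps not yet performed.
            remaining = [s for s in rec.planned if s not in rec.done]
            actions[rec.txid] = ("roll_forward", remaining)
        else:
            # Intent is incomplete: undo the completed steps in reverse.
            actions[rec.txid] = ("roll_back", list(reversed(rec.done)))
    return actions


# A rename-like operation built from two steps, interrupted after the first.
log = [
    IntentRecord(1, ["alloc_inode", "add_dirent"],
                 done=["alloc_inode", "add_dirent"], committed=True),
    IntentRecord(2, ["remove_dirent", "add_dirent"], done=["remove_dirent"]),
    IntentRecord(3, None, done=["remove_dirent"]),
]
result = recover(log)
```

In this model, transaction 2 can be completed on restart because the
whole operation was visible to the logger, while transaction 3 can
only be undone. That is the 1:N mapping problem in miniature.
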
Without direct access to the transaction interface, there is no
guarantee that operations which are not atomic but which must be
linked (for instance, a relational database with changes to a record
file and the index file for that record file) will be treated as a
single transaction. The partial transaction might be rolled forward
in one file but not the other because they are not tagged as a unit.
That's why we have abominations before God, like Tuxedo.

So even if we solve this problem at the VFS level (like I want), it
will remain in user space until there is VFS-level VOP_BEGIN and
VOP_END support, and a system call interface to get at them (luckily,
I want this, too).

I would claim that the VFS interface in Linux does not internally
ensure the atomicity of the operations that result from VFS or VOP
calls. Neither, for that matter, does FreeBSD, though FreeBSD's major
offenses are all localized to its use of VOP_ABORTOP() due to the
failure to properly integrate the NFS lease code. My namei patches
reduce this number by five, and I have every intention of reducing
this number to zero and removing VOP_ABORTOP as a public interface
for everything but (potentially) the soft update code. The work done
by the VOP_LOOKUP code (and the calling code which depends on its
current behaviour) will have to change to do this.

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.