Date:      Wed, 16 Apr 1997 18:22:09 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        aaron@veritas.com (Aaron Smith)
Cc:        terry@lambert.org, hackers@freebsd.org
Subject:   Re: Feasibility of porting Linux filesystem code?
Message-ID:  <199704170122.SAA28654@phaeton.artisoft.com>
In-Reply-To: <m0wHezO-000iX3C@megami.veritas.com> from "Aaron Smith" at Apr 16, 97 05:23:48 pm

> After reading this, Terry, I can only conclude that you have a propensity
> for pretending knowledge of things you incompletely understand. Your
> standard technique appears to be:
> 
> 	- make broad pronouncements
> 	- bring up graph theory
> 
> Writing a journaling filesystem does not require any of the changes you
> mention. You don't know what you're talking about. This might be forgivable
> if your tone was speculative, but you act like you are laying down fact,
> when you could not be more wrong about what is required for a journalling
> filesystem.
> 
> VERITAS VxFS (a journalled filesystem) runs on many, many different UNIX
> variants and none have had to modify the VFS layer or its interface.
> 
> I would recommend easing up on your need to sound authoritative on topics
> you have not mastered.


I was a systems engineer working for Novell/USG (the former USL) for
a number of years before leaving them for my present job.

Among other things, I worked on the UFS (maintenance), NWFS (lead
engineer), NUCFS (assisting engineer), and VXFS (your company's) source code.

I have also submitted a number of bug fixes (which were integrated
remarkably fast) for the Linux FS code; I'm rather familiar with it
as well, since my job for the first year I was here had to do with
writing commercial Linux FS code in support of a dedicated file and
print server product.


Consider UnixWare 1.0, which returned soft errors like timeouts to the block
I/O interface consumer.  VXFS would occasionally lose parts of the
FS when the area being read was marked bad.  The block I/O interface
had to be modified in SVR4 to accommodate VXFS, which did not retry
operations, so your statement is false for SVR4.
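To make the retry point concrete: the change amounts to absorbing transient failures below the filesystem instead of handing them up to it. What follows is a minimal sketch, not the SVR4 block I/O code; the driver callback type, the simulated device, the retry budget, and the assumption that ETIMEDOUT is the only soft error are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_SOFT_RETRIES 3 /* assumption: illustrative retry budget */

/* Hypothetical driver entry point: returns 0 or an errno value. */
typedef int (*dev_read_fn)(void *dev, long blkno, void *buf, size_t len);

/* Treat timeouts as transient ("soft"); anything else, such as EIO for a
 * block marked bad, is a hard error and is not retried. */
static int is_soft_error(int err)
{
    return err == ETIMEDOUT;
}

/* Read one block, absorbing up to MAX_SOFT_RETRIES soft failures so the
 * filesystem above never sees them. Returns 0 or the last error seen. */
int read_block_retry(dev_read_fn rd, void *dev, long blkno,
                     void *buf, size_t len)
{
    int err = 0;

    assert(rd != NULL && buf != NULL);
    for (int attempt = 0; attempt <= MAX_SOFT_RETRIES; attempt++) {
        err = rd(dev, blkno, buf, len);
        if (err == 0 || !is_soft_error(err))
            break;
    }
    return err;
}

/* Simulated device for illustration only: times out `timeouts` times,
 * then succeeds, unless `bad` is set, in which case it fails hard. */
struct sim_dev {
    int timeouts;
    int bad;
    int calls;
};

int sim_read(void *dev, long blkno, void *buf, size_t len)
{
    struct sim_dev *d = dev;

    (void)blkno;
    d->calls++;
    if (d->bad)
        return EIO;
    if (d->timeouts > 0) {
        d->timeouts--;
        return ETIMEDOUT;
    }
    memset(buf, 0, len);
    return 0;
}
```

With two simulated timeouts the read succeeds on the third call; a block marked bad returns EIO immediately, since retrying a hard error would only waste time.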


> [discussion of SGI xfs, some mentions of JFS and NTFS]
> 
> >The Linux VFS interface is not reflexive; it's not cut at the right
> >place for a journalled FS.  FreeBSD's is much better, but it's still
> >not there either... I've discussed this in detail with the guy who
> >wrote the read-only NTFS driver for Linux.  You need to be able to
> >treat the VFS interface as if it were a transaction interface to do
> >things like event rollback.  It helps if the internal treatment is
> >architected as event/responder, too.  The same thing would help for
> >Soft Updates, which is basically a transaction order enforcement
> >mechanism, with the dependency graph statically computed (rendering
> >it FFS-specific).
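The ordering half of the Soft Updates idea can be illustrated on its own: given a set of "buffer A must reach disk before buffer B" constraints (for example, an initialized inode block before the directory block that names it), produce a write order that honors them. Below is a minimal sketch using Kahn's topological sort. It is not the FFS soft updates implementation, which tracks dependencies per buffer and can roll individual updates back; here the constraints are simply supplied by the caller, which is closer to the FS-independent shape the quoted paragraph argues for. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFS 16

/* dep[a][b] == true means buffer a must reach disk before buffer b. */
struct dep_graph {
    int nbufs;
    bool dep[MAX_BUFS][MAX_BUFS];
};

void dep_add(struct dep_graph *g, int before, int after)
{
    assert(before >= 0 && before < g->nbufs);
    assert(after >= 0 && after < g->nbufs);
    g->dep[before][after] = true;
}

/* Kahn's algorithm: fill order[] with a write order satisfying every
 * edge. Returns how many buffers were ordered; fewer than nbufs means the
 * constraints contain a cycle and no safe order exists. */
int write_order(const struct dep_graph *g, int order[])
{
    int indeg[MAX_BUFS] = {0};
    int queue[MAX_BUFS];
    int head = 0, tail = 0, n = 0;

    for (int a = 0; a < g->nbufs; a++)
        for (int b = 0; b < g->nbufs; b++)
            if (g->dep[a][b])
                indeg[b]++;
    /* Buffers with no outstanding predecessors may be written now. */
    for (int b = 0; b < g->nbufs; b++)
        if (indeg[b] == 0)
            queue[tail++] = b;
    while (head < tail) {
        int a = queue[head++];

        order[n++] = a;
        /* Writing a releases every buffer that was waiting only on it. */
        for (int b = 0; b < g->nbufs; b++)
            if (g->dep[a][b] && --indeg[b] == 0)
                queue[tail++] = b;
    }
    return n;
}
```

For three buffers with edges 0->1 and 1->2, write_order() yields 0, 1, 2; adding 2->0 creates a cycle, and it returns fewer than three entries.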

You will note that I did not say the changes were *required*.

While you *can* pound any peg into any hole with a large enough
hammer (in this case, enough extra code), it's not always the best
way to solve the problem.


If you want to implement an XFS or JFS on Linux (each of which is capable
of rolling transactions forward during restart), you would be required
to ensure the atomicity of operations which required reentering the VFS
externally to the VFS encapsulation of operations.  This is problematic,
since you must internally roll these transactions in the mount code,
unless you externalize them in fsck code, and that won't work for
the root filesystem.  It greatly complicates the code if the VOP
calls themselves don't correspond to atomic operations on a 1:1 basis.

You *could* do the code anyway.  Veritas does it in SVR4.2, using an
intent log to enable it to make the atomicity guarantees across
operations which require multiple VOP calls to implement.  This would
be harder on Linux because of the 1:N mapping issues introduced by a
non-reflexive VFS interface.  Partial transactions cannot be rolled
forward, only back, if the intent log was written at the VOP layer,
and you were unsure of the VOP which was to follow, or the data that
was required.
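The difference between rolling a transaction forward and rolling it back can be shown with a textbook undo/redo log. A transaction whose commit record reached the log is replayed from its after-images; one without a commit record can only be undone from its before-images. This is a minimal sketch under that assumption, not the VxFS intent log format; the record layout, the integer "disk", and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TXN 8

enum rec_kind { REC_BEGIN, REC_UPDATE, REC_COMMIT };

/* One log record. UPDATE records carry both before- and after-images so
 * recovery can either redo (roll forward) or undo (roll back). */
struct log_rec {
    enum rec_kind kind;
    int txn;     /* transaction id, 0 <= txn < MAX_TXN */
    int slot;    /* which "disk block" the update touches (caller-checked) */
    int old_val; /* before-image */
    int new_val; /* after-image */
};

/* Recover disk[] from the log after a crash: committed transactions are
 * rolled forward, transactions without a commit record are rolled back. */
void recover(int disk[], const struct log_rec *log, int nrec)
{
    bool committed[MAX_TXN] = {false};

    for (int i = 0; i < nrec; i++) {
        assert(log[i].txn >= 0 && log[i].txn < MAX_TXN);
        if (log[i].kind == REC_COMMIT)
            committed[log[i].txn] = true;
    }
    /* Redo pass, oldest first: reapply after-images of committed work. */
    for (int i = 0; i < nrec; i++)
        if (log[i].kind == REC_UPDATE && committed[log[i].txn])
            disk[log[i].slot] = log[i].new_val;
    /* Undo pass, newest first: restore before-images of partial work. */
    for (int i = nrec - 1; i >= 0; i--)
        if (log[i].kind == REC_UPDATE && !committed[log[i].txn])
            disk[log[i].slot] = log[i].old_val;
}
```

For instance, if transaction 1 (two updates plus a commit) finished logging but only its first block reached disk, while transaction 2 logged one update with no commit yet its block did reach disk, recover() leaves transaction 1 fully applied and transaction 2 fully erased.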

Any journalling system is really insufficient (by itself) for this
task overall.  Without direct access to the transaction interface,
there is no guarantee that operations which are not atomic but which
must be linked (for instance, a relational database with changes to
a record file and the index file for that record file) will be treated
as a single transaction.  The partial transaction might be rolled
forward in one file but not the other because they are not tagged as
a unit.  That's why we have abominations before God, like Tuxedo.  So
even if we solve this problem at the VFS level (like I want), it will
remain in user space until there are VFS level VOP_BEGIN and VOP_END
support, and a system call interface to get at them (luckily, I want
this, too).
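What a VOP_BEGIN/VOP_END pair would buy the database example is that the record-file and index-file updates become visible together or not at all. Here is a minimal sketch of that contract, assuming hypothetical txn_begin/txn_write/txn_end/txn_abort calls over two in-memory "files". It stages writes and applies them at the end, and deliberately leaves out the logging that would make the final apply step itself survive a crash.

```c
#include <assert.h>
#include <stdbool.h>

#define NFILES 2
#define FILE_SLOTS 8
#define MAX_PENDING 16

/* Two toy "files": file 0 holds records, file 1 holds their index. */
static int files[NFILES][FILE_SLOTS];

struct pending {
    int file, slot, val;
};

/* Hypothetical transaction handle: writes are staged, not applied. */
struct txn {
    bool open;
    int n;
    struct pending p[MAX_PENDING];
};

void txn_begin(struct txn *t)
{
    t->open = true;
    t->n = 0;
}

void txn_write(struct txn *t, int file, int slot, int val)
{
    assert(t->open && t->n < MAX_PENDING);
    assert(file >= 0 && file < NFILES && slot >= 0 && slot < FILE_SLOTS);
    t->p[t->n++] = (struct pending){ file, slot, val };
}

/* Commit: apply every staged write, across both files, as one unit.
 * A real implementation would make this step crash-atomic via a log. */
void txn_end(struct txn *t)
{
    assert(t->open);
    for (int i = 0; i < t->n; i++)
        files[t->p[i].file][t->p[i].slot] = t->p[i].val;
    t->open = false;
}

/* Abort: discard staged writes; neither file changes. */
void txn_abort(struct txn *t)
{
    t->open = false;
    t->n = 0;
}
```

Because the record write and the index write share one handle, an abort (or, with logging added, a crash before txn_end()) cannot leave the record updated while the index is stale.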


I would claim that the VFS interface in Linux does not internally
ensure the atomicity of the operations that result from VFS or VOP
calls.  Neither, for that matter, does FreeBSD, though FreeBSD's
major offenses are all localized to its use of VOP_ABORTOP() due to
the failure to properly integrate the NFS lease code.  My namei
patches reduce the number of these VOP_ABORTOP() uses by five, and I
have every intention of reducing it to zero and removing VOP_ABORTOP
as a public interface for everything but (potentially) the soft update code.
The work done by the VOP_LOOKUP code (and the calling code which
depends on its current behaviour) will have to change to do this.
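Roughly, in 4.4BSD-derived kernels VOP_ABORTOP() lets a caller that abandons an operation after a CREATE, DELETE, or RENAME lookup release state the lookup left behind, most notably the saved pathname buffer. The toy below contrasts that two-phase shape with a single call that does the lookup and the insert together, so no intermediate state escapes that would need aborting. It is an illustration only, not FreeBSD's namei()/VOP_LOOKUP() code or the patches mentioned above; every name in it is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 8
#define NAME_LEN 32

/* Toy directory, plus a count of live name buffers so leaks are visible. */
static char dir[MAX_ENTRIES][NAME_LEN];
static int ndir;
static int live_bufs;

static int dir_find(const char *name)
{
    for (int i = 0; i < ndir; i++)
        if (strcmp(dir[i], name) == 0)
            return i;
    return -1;
}

static int dir_insert(const char *name)
{
    if (dir_find(name) >= 0)
        return EEXIST;
    if (ndir == MAX_ENTRIES)
        return ENOSPC;
    strncpy(dir[ndir], name, NAME_LEN - 1);
    dir[ndir][NAME_LEN - 1] = '\0';
    ndir++;
    return 0;
}

/* Style 1 (hypothetical): the lookup step saves a copy of the name for a
 * later create and hands it back to the caller. */
static char *lookup_for_create(const char *name)
{
    char *buf = malloc(NAME_LEN);

    assert(buf != NULL);
    strncpy(buf, name, NAME_LEN - 1);
    buf[NAME_LEN - 1] = '\0';
    live_bufs++;
    return buf;
}

static void name_free(char *buf)
{
    free(buf);
    live_bufs--;
}

/* What every path that abandons the create must remember to call. */
static void abortop(char *buf)
{
    name_free(buf);
}

int create_two_phase(const char *name)
{
    char *buf = lookup_for_create(name);
    int err;

    if (dir_find(buf) >= 0) {
        abortop(buf); /* omit this and the saved name leaks */
        return EEXIST;
    }
    err = dir_insert(buf);
    name_free(buf); /* the create consumed the name; release it */
    return err;
}

/* Style 2 (hypothetical): one call performs lookup and insert, so there
 * is nothing for a caller to abort. */
int create_atomic(const char *name)
{
    return dir_insert(name);
}
```

The design point is that in style 1 correctness depends on every error path in every caller pairing the lookup with an abort, while in style 2 the lookup's work is never exposed outside the operation that needs it.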


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


