FreeBSD Mail Archives

Date:      Thu, 12 Feb 1998 19:47:34 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        inf@nyef.res.cmu.edu (Marca Registrada)
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: Coda FS: FBSD port done!, but development favors Linux
Message-ID:  <199802121947.MAA19676@usr02.primenet.com>
In-Reply-To: <19980212123955.08290@nyef.res.cmu.edu> from "Marca Registrada" at Feb 12, 98 12:39:55 pm


YES!  REAL DEVELOPMENT AT LAST!

>  The current Coda release that I know of for FreeBSD is supposed to be for
> -stable, so my first project may be to port it to -current (although I've
> heard this may be difficult), and it would be easier for me to make light
> contributions from time to time to do whatever is necessary when the
> -stable-patched are unworkable for -current.

This may be difficult.  I will provide advice and information on
Poul-Henning Kamp's interface changes, and other issues, as necessary
(the changes somewhat broke Kirk McKusick's intended design, where
the UFS code was to provide directory facilities, and the FFS code
was to provide a linear (but not necessarily externalized) flat
namespace).

The main differences will be in VOPs and locking, and are pretty
trivial (i.e., 4.4-Lite2 didn't do much, and neither has anyone since,
barring minor cleanups).


> > * Development, particularly in the area of scalability, is focused on
> >   Linux.  Why?  His stated reasons:
> > 
> >    * Linux's ext2fs filesystem is much faster than *BSD's ffs
> >        (How good is FreeBSD's ext2fs support these days?  Is
> >        it in 2.2.6 or must we wait for 3.0?)
> 
>   Would anyone think that softupdates may fix this?  I haven't kept close
> enough track of the discussion to know when softupdates may ever come
> around, though.

Linux's ext2fs is apparently faster because it is, by default, mounted
async.  As a real FS hacker, he should be aware that an fsck can only
undo one state transition.  After ext2fs crashes, the FS after the
fsck is in *a* consistent state, but not *the* consistent state it
would have been had the crash not taken place.  For each async call
that takes place, you have another potential state.  In general:

	For N outstanding operations, there are 2^(N-1) possible
	ground states following a one state change by fsck.

This means for 11 outstanding operations there are 2^10 = 1024
candidate states, so you have less than a 1 in 1000 chance of fsck
guessing the right one.
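
A minimal sketch of the arithmetic above, in C.  The function names
are hypothetical, and treating every candidate state as equally
likely is an assumption made only to turn the count into a
probability:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of candidate ground states for n outstanding async
 * operations, using the 2^(n-1) rule quoted above.
 * Valid for 1 <= n <= 64.
 */
static uint64_t ground_states(unsigned n)
{
	assert(n >= 1 && n <= 64);
	return (uint64_t)1 << (n - 1);
}

/*
 * Chance that a single fsck repair lands on the state the FS would
 * have reached without the crash.  Assumes (for illustration only)
 * that every candidate state is equally likely.
 */
static double fsck_guess_probability(unsigned n)
{
	return 1.0 / (double)ground_states(n);
}
```

For n = 11, ground_states() gives 1024 and the probability is
1/1024, which is just under 1 in 1000.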

Classic implementations have guaranteed ordering using synchronous
writes of metadata.  This is the FFS default mechanism.  Other
approaches to ordering guarantees are:

o	Log structuring (fragmentation is high)
o	Journalling (commits are slow and fragmentation is high)
o	Delayed Ordered Writes ("Banker's Algorithm" for graph
	reduction sacrifices speed for overcautious safety; also
	patent-pending by USL, so not usable)
o	Soft Updates (within 5% of async, faster for some things,
	and with all the safety of synchronous writes).

So the answers are, in order:

A)	There's nothing to fix; ext2fs is being used with a false
	sense of safety.
B)	Yes.  Soft Updates in FreeBSD address the speed issue.


> >    * Current work is being done to develop Linux kernel extensions that
> >        will allow access to files via raw inodes.  This development is
> >        seen as key to allowing Coda to support large filespaces with
> >        reasonable performance.  See this URL for Peter's notes on
> >        these extensions:
> 
>  From the latest I heard on the Coda lists, Linus is very against this
> because he feels it ruins the consistency of the FS interface.

It doesn't, really.  What it *does* do is blow out the inheritance
security model based on directory permissions.  The one way to
save it from this is to change the structure of hard links on
disk, and then keep parent pointers in all inodes.  Then you traverse
to root creating a path vector, and then traverse down the vector
applying permissions.
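
The traversal described above can be sketched in C roughly as
follows.  This is an illustration under stated assumptions, not
FreeBSD or NXFS code: struct toy_inode, PERM_SEARCH, MAX_DEPTH and
may_open_by_inode() are all hypothetical, and each inode is assumed
to have exactly one parent pointer (which is what the changed hard
link structure would have to guarantee):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH	64	/* hypothetical limit on path depth */
#define PERM_SEARCH	0x1	/* hypothetical "may traverse" bit */

/* Hypothetical in-memory inode; not a FreeBSD structure. */
struct toy_inode {
	struct toy_inode *parent;	/* NULL at the root */
	unsigned perms;			/* permission bits for the caller */
	bool is_dir;
};

/*
 * Walk up to the root recording the path vector, then walk back
 * down the vector checking search permission on every directory
 * above the target.
 */
static bool may_open_by_inode(struct toy_inode *target)
{
	struct toy_inode *vec[MAX_DEPTH];
	size_t depth = 0;

	assert(target != NULL);
	for (struct toy_inode *ip = target->parent; ip != NULL;
	    ip = ip->parent) {
		if (depth == MAX_DEPTH)
			return false;	/* too deep (or a cycle): refuse */
		vec[depth++] = ip;
	}
	/* vec[depth - 1] is the root; descend from it to the target. */
	while (depth > 0) {
		struct toy_inode *dir = vec[--depth];

		if (!dir->is_dir || !(dir->perms & PERM_SEARCH))
			return false;
	}
	return true;
}
```

With a check like this, a file sitting under a directory the caller
cannot search is refused even when the file's own permissions would
allow the open.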

FreeBSD doesn't currently support this (about the only FS which does
is NXFS, the NetWare eXtended File System, which I wrote for the
NetWare for UNIX product while I was at Novell).

Without this, if you can get a path on a filesystem, you can open
any inode whose own permissions allow it, regardless of the
permissions of the intermediate path.


> This of
> course can change at any moment.  The current proposal is to make a
> filesystem where inodes can be accessed directly as files.. ie:
> 
> fopen("/mnt/__inode_#12345#","r");
>   or something similar looking to that.  It actually doesn't sound like a
> monster to implement at all.  And as a separate filesystem solves many of
> the fsck problems Coda currently has.

I have implemented this at one time, and I have very recently provided
assistance to Adrian Chadd, who has implemented it in -current.  The
idea is not new.  This is called a "namespace incursion".  It places
a "magic" prefix in the namespace.  My suggested escape, and the one
I believe Adrian used, is the string ^I^N^O^D^E (8th bit set on all
5 characters), followed by decimal digits for the inode number.  You
can use any path onto the FS to get the dev_t.  This works for current
working directory, as well.
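
A minimal sketch of recognizing such a path component, in C.  The
byte values are an assumption: the escape is read here as the ASCII
letters I, N, O, D, E with the 8th bit set on each, and a
control-character reading of the ^X notation is also possible.  The
names inode_escape and parse_inode_component() are hypothetical and
are not taken from Adrian's -current code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed escape: the letters "INODE" with the 8th bit set. */
static const unsigned char inode_escape[5] = {
	'I' | 0x80, 'N' | 0x80, 'O' | 0x80, 'D' | 0x80, 'E' | 0x80
};

/*
 * If the path component name[0..len) is the escape followed by one
 * or more decimal digits, store the inode number in *ino and return
 * true.  Otherwise return false and leave *ino untouched.
 */
static bool parse_inode_component(const char *name, size_t len,
    uint64_t *ino)
{
	uint64_t value = 0;
	size_t i;

	assert(name != NULL && ino != NULL);
	if (len <= sizeof(inode_escape))
		return false;	/* need the prefix plus a digit */
	for (i = 0; i < sizeof(inode_escape); i++)
		if ((unsigned char)name[i] != inode_escape[i])
			return false;
	for (; i < len; i++) {
		unsigned digit = (unsigned char)name[i] - '0';

		if (digit > 9)
			return false;	/* not a decimal digit */
		if (value > (UINT64_MAX - digit) / 10)
			return false;	/* number would overflow */
		value = value * 10 + digit;
	}
	*ino = value;
	return true;
}
```

A lookup routine would call this on each component before the normal
directory search, and fall through to the ordinary lookup whenever
it returns false.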

Probably the correct way to implement this is to use the POSIX
namespace escape, "//".  Unfortunately, the FreeBSD namei() code
is broken, such that an escape can not be inherited on a per
path component basis, and applied solely to the terminal path
component.  I have patches for this which have not been committed.

Practically, for this specific use, the namespace incursion is
just about as good.  You can reach Adrian Chadd at the following
email address:

	<adrian@creative.net.au>


> I'm totally with you on wanting to get Coda going strong on FreeBSD, and
> will lend all the free coding time I have.  

If you need any architectural information, let me know.  I know the
FreeBSD FS code backwards and forwards.


> As an aside, you also mentioned AFS.  Has that been progressing at all on
> the FreeBSD front?  I haven't heard anything but light rustle about AFS.

AFS was ported to NetBSD.  FreeBSD couldn't use the NetBSD implementation
for two reasons:

1)	The kernel interface differences between FreeBSD and NetBSD;
	namely, the interfaces consumed by FS implementations that
	were terminal (bottom end) implementations.  Things like
	local media filesystems, the NFS client, and the AFS client.

	These differences have recently gotten worse.  I've been
	working upstream to try to reduce the number of interfaces
	that are consumed by terminal FS implementations, but it
	is slow going trying to get the code committed.

2)	The VOP interface difference.  Initially, this was just the
	mechanism for use of "cookies" in VOP_READDIR (a particularly
	ugly solution to the search restart problem brought on by
	the underlying FS exposing the wrong view of the struct direct,
	instead of exposing an opaque pointer and a translation VOP).
	NetBSD has a slightly different cookie mechanism, but both
	are fundamentally broken by design.  This leads to a lot of
	NFS code working around the breakage, and occasional NFS
	problems.  The same workarounds are required in most VFS
	consumers.

	These differences have also recently gotten worse, and in
	fact, a number of VOPs necessary to the separation of
	block naming from the imposition of directory hierarchy
	have been removed.  Eventually I expect them to come back.

In any case, if it's already working, then you have these workarounds;
if you don't, there are ways around the problems that I can help you
with, if CODA isn't enough of a reason to get it fixed the right way.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199802121947.MAA19676>
