From owner-freebsd-fs  Sun Aug 22 17:26:18 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from bastard.40ounce.net (bastard.40ounce.net [206.14.239.252])
	by hub.freebsd.org (Postfix) with ESMTP id 95B8314C23
	for <freebsd-fs@freebsd.org>; Sun, 22 Aug 1999 17:26:13 -0700 (PDT)
	(envelope-from jb@syndicate.net)
Received: from syndicate.net (adsl-209-233-31-199.dsl.snfc21.pacbell.net [209.233.31.199])
	by bastard.40ounce.net (8.8.8/8.8.8) with ESMTP id RAA16079
	for <freebsd-fs@freebsd.org>; Sun, 22 Aug 1999 17:26:35 -0700 (PDT)
	(envelope-from jb@syndicate.net)
Message-ID: <37C0951B.837EA807@syndicate.net>
Date: Sun, 22 Aug 1999 17:26:03 -0700
From: James Brown <jb@syndicate.net>
Reply-To: jb@syndicate.net
Organization: SYNDICATE Consulting Group, Inc.
X-Mailer: Mozilla 4.5 [en] (X11; U; FreeBSD 2.2.8-STABLE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: ccd striped array slower than single disk!
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

hello,

bonnie reports 5636 K/s when using a single disk without ccd. 
i've been testing ccd performance, using various interleaves. 
the best performance i've seen striping with ccd has been 5571
K/s using an interleave of 32768.

these are identical disks (maxtor 27.2 gb) and are on two
separate ultra dma/33 busses (one is bus 1 slave, other is bus 2
master but that shouldn't matter, right?).  shouldn't i be seeing
something closer to 9000 or 10000 K/s?

thank you,
james


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Sun Aug 22 17:42:50 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from allegro.lemis.com (allegro.lemis.com [192.109.197.134])
	by hub.freebsd.org (Postfix) with ESMTP id ABC79155FE
	for <freebsd-fs@FreeBSD.ORG>; Sun, 22 Aug 1999 17:42:44 -0700 (PDT)
	(envelope-from grog@freebie.lemis.com)
Received: from freebie.lemis.com (freebie.lemis.com [192.109.197.137])
	by allegro.lemis.com (8.9.1/8.9.0) with ESMTP id KAA15417;
	Mon, 23 Aug 1999 10:12:38 +0930 (CST)
Received: (from grog@localhost)
	by freebie.lemis.com (8.9.3/8.9.0) id KAA83501;
	Mon, 23 Aug 1999 10:12:37 +0930 (CST)
Date: Mon, 23 Aug 1999 10:12:37 +0930
From: Greg Lehey <grog@lemis.com>
To: James Brown <jb@syndicate.net>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: ccd striped array slower than single disk!
Message-ID: <19990823101237.C83273@freebie.lemis.com>
References: <37C0951B.837EA807@syndicate.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <37C0951B.837EA807@syndicate.net>; from James Brown on Sun, Aug 22, 1999 at 05:26:03PM -0700
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sunday, 22 August 1999 at 17:26:03 -0700, James Brown wrote:
> hello,
>
> bonnie reports 5636 K/s when using a single disk without ccd.

Bonnie lies.

> i've been testing ccd performance, using various interleaves.  the
> best performance i've seen striping with ccd has been 5571 K/s using
> an interleave of 32768.

Do you mean 32768 blocks or bytes?  One's rather small, the other is
unnecessarily large.  I've found stripe sizes in the order of 256 kB
to be optimum.
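
To make the units question concrete, here is a minimal sketch of what
an interleave of 32768 means under each reading.  It assumes 512-byte
disk blocks and takes "256 kB" as 256 * 1024 bytes; neither is stated
in the thread, and the function names are made up for illustration.

```python
# Assumption: one disk block is 512 bytes; the thread does not state the unit.
BLOCK_SIZE = 512

def interleave_bytes_if_blocks(interleave):
    """Stripe size in bytes when the interleave is counted in blocks."""
    return interleave * BLOCK_SIZE

def blocks_for_stripe(stripe_bytes):
    """Interleave in blocks needed for a given stripe size in bytes."""
    return stripe_bytes // BLOCK_SIZE

as_blocks = interleave_bytes_if_blocks(32768)   # reading 1: 32768 blocks
as_bytes = 32768                                # reading 2: 32768 bytes
target = blocks_for_stripe(256 * 1024)          # blocks for a 256 kB stripe

print(as_blocks // (1024 * 1024), "MiB per stripe if 32768 is in blocks")
print(as_bytes // 1024, "KiB per stripe if 32768 is in bytes")
print(target, "blocks for a 256 kB stripe")
```

Under these assumptions the two readings differ by a factor of 512,
which is why the unit matters before comparing interleaves.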

> these are identical disks (maxtor 27.2 gb) and are on two
> separate ultra dma/33 busses (one is bus 1 slave, other is bus 2
> master but that shouldn't matter, right?).  shouldn't i be seeing
> something closer to 9000 or 10000 K/s?

That depends on what you're doing.  Remember that bonnie uses buffer
cache, so the results are usually inaccurate and difficult to repeat.
You also don't say which of the bonnie tests you're quoting.  Try
rawio (in the Ports Collection) instead, and remember that it's
usually the random access tests which count.  If you're still unhappy,
I'd be interested to see how you fare with Vinum instead of ccd.

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key



From owner-freebsd-fs  Mon Aug 23 23:28:50 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id C9330150CF; Mon, 23 Aug 1999 23:28:25 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04603;
	Mon, 23 Aug 1999 23:28:10 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:23:56 -0700 (PDT)
Message-Id: <199908240628.XAA04603@gilliam.users.flyingcroc.net>
To: Julian Elischer <julian@whistle.com>
Cc: Bill Studenmund <wrstuden@nas.nasa.gov>,
	Terry Lambert <tlambert@primenet.com>,
	Alton Matthew <Matthew.Alton@anheuser-busch.com>,
	Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite 
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In message <Pine.BSF.3.95.990818105716.12306A-100000@current1.whistle.com>, Julian Elischer writes:
>On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:
>
>> Matt doesn't represent the FreeBSD project, and even if he rewrites
>> the VFS subsystem so he can understand it, his rewrite would face
>> considerable resistance on its way into FreeBSD.  I don't think
>> there is reason to rewrite it, but there certainly are areas
>> that need fixing.
>
>You are misinformed as far as I know.. From discussions I saw, the
>main architect of a VFS rewrite would be Kirk, and Matt would be acting as
>Kirk's right-hand-man.

I bet that Matt and Kirk use "rewrite" for two very different
concepts.  The resulting reviews will be equally different.

>> >> 	The use of the "vfs_default" to make unimplemented VOP's
>> >> 	fall through to code which implements function, while well
>> >> 	intentioned, is misguided.
>> 
>> I beg to differ.  The only difference is that we pass through
>> multiple layers before we hit the bottom of the stack.  There is
>> no loss of functionality but significant gain of clarity and
>> modularity.
>
>Well I believe that Kirk considers them misguided too, but he stated that
>he wasn't going to remove them without serious thought about the alternatives.

I'll be more than ready to discuss this with Kirk.

--
Poul-Henning Kamp             FreeBSD coreteam member
phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



From owner-freebsd-fs  Mon Aug 23 23:28:51 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id 024D615134; Mon, 23 Aug 1999 23:28:26 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04552;
	Mon, 23 Aug 1999 23:28:07 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:07:25 -0700 (PDT)
Message-Id: <199908240628.XAA04552@gilliam.users.flyingcroc.net>
To: Bill Studenmund <wrstuden@nas.nasa.gov>
Cc: Terry Lambert <tlambert@primenet.com>, Hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite 
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In message <Pine.SOL.3.96.990818104932.14430D-100000@marcy.nas.nasa.gov>, Bill Studenmund writes:

>> >I doubt we need more than 64 bit times. 2^63 seconds works out to
>> >292,279,025,208 years, or 292 (american) billion years. Current theories
>> >put the age of the universe at I think 12 to 16 billion years. So 64-bit
>> >signed times in seconds will cover from before the big bang to way past
>> >any time we'll be caring about. :-)
>
>I was unclear. I was referring to the seconds side of things. Sub-second
>resolution would need other bits.

Yes, but we need subsecond in the filesystems.  Think about make(1) on
a blinding fast machine...
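
A minimal sketch of the make(1) problem, assuming a simplified
out-of-date rule (rebuild when the source is strictly newer than the
target) and hypothetical nanosecond timestamps.  Real make
implementations differ in detail; this only shows what whole-second
resolution loses.

```python
NS_PER_SECOND = 1_000_000_000

def needs_rebuild(source_mtime, target_mtime):
    """Simplified make rule: rebuild when the source is newer than the target."""
    return source_mtime > target_mtime

# Hypothetical timestamps in nanoseconds: the target is built, and the
# source is edited 2 ms later, within the same second.
target_ns = 935_000_000 * NS_PER_SECOND + 100_000_000
source_ns = target_ns + 2_000_000

# With whole-second timestamps the edit is invisible to the comparison.
stale_by_seconds = needs_rebuild(source_ns // NS_PER_SECOND,
                                 target_ns // NS_PER_SECOND)
# With sub-second timestamps the edit is detected.
stale_by_nanoseconds = needs_rebuild(source_ns, target_ns)

print("rebuild (seconds):", stale_by_seconds)
print("rebuild (nanoseconds):", stale_by_nanoseconds)
```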

--
Poul-Henning Kamp             FreeBSD coreteam member
phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



From owner-freebsd-fs  Mon Aug 23 23:29: 4 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id BF57C15278; Mon, 23 Aug 1999 23:28:30 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04645;
	Mon, 23 Aug 1999 23:28:12 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:37:03 -0700 (PDT)
Message-Id: <199908240628.XAA04645@gilliam.users.flyingcroc.net>
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Julian Elischer <julian@whistle.com>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
	Bill Studenmund <wrstuden@nas.nasa.gov>,
	Terry Lambert <tlambert@primenet.com>,
	Alton Matthew <Matthew.Alton@anheuser-busch.com>,
	Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite 
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:
:
:> Matt doesn't represent the FreeBSD project, and even if he rewrites
:> the VFS subsystem so he can understand it, his rewrite would face
:> considerable resistance on its way into FreeBSD.  I don't think
:> there is reason to rewrite it, but there certainly are areas
:> that need fixing.
:
:You are misinformed as far as I know.. From discussions I saw, the
:main architect of a VFS rewrite would be Kirk, and Matt would be acting as
:Kirk's right-hand-man.

    Yes, this is correct.  Kirk is going to be the main architect.  I have
    been heavily involved and will continue to be.

:> >> 	The use of the "vfs_default" to make unimplemented VOP's
:
:> I beg to differ.  The only difference is that we pass through
:> multiple layers before we hit the bottom of the stack.  There is
:...
:Well I believe that Kirk considers them misguided too, but he stated that
:he wasn't going to remove them without serious thought about the alternatives.

    The vfs op callout layering has not been on the radar screen.  There
    are far too many other, more serious problems.  I really doubt that any
    changes will be made to this piece any time in the next year or even two,
    if at all.

    The main items on the radar screen are related to buffer management
    (struct buf stuff.  For example, preventing VM blockages due to pages
    being wired by write I/O's), VFS locking and reference count issues 
    (for example, namei lookups, blockages in the pager and syncer due to
    vnode locks held by blocked processes, etc...), and interactions 
    between VFS and VM (for example: moving away from VOP_READ/VOP_WRITE 
    and moving more towards a getpages/putpages model).

    None of the items have been set in stone yet.  We're waiting for Kirk
    to get back from vacation and get back into the groove.

						-Matt



From owner-freebsd-fs  Mon Aug 23 23:29:18 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id DCB41152B1; Mon, 23 Aug 1999 23:28:30 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04556;
	Mon, 23 Aug 1999 23:28:07 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:09:14 -0700 (PDT)
Message-Id: <199908240628.XAA04556@gilliam.users.flyingcroc.net>
From: Julian Elischer <julian@whistle.com>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Bill Studenmund <wrstuden@nas.nasa.gov>,
	Terry Lambert <tlambert@primenet.com>,
	Alton Matthew <Matthew.Alton@anheuser-busch.com>,
	Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite 
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

> Matt doesn't represent the FreeBSD project, and even if he rewrites
> the VFS subsystem so he can understand it, his rewrite would face
> considerable resistance on its way into FreeBSD.  I don't think
> there is reason to rewrite it, but there certainly are areas
> that need fixing.

You are misinformed as far as I know.. From discussions I saw, the
main architect of a VFS rewrite would be Kirk, and Matt would be acting as
Kirk's right-hand-man.

> 
> >> 	The use of the "vfs_default" to make unimplemented VOP's
> >> 	fall through to code which implements function, while well
> >> 	intentioned, is misguided.
> 
> I beg to differ.  The only difference is that we pass through
> multiple layers before we hit the bottom of the stack.  There is
> no loss of functionality but significant gain of clarity and
> modularity.

Well I believe that Kirk considers them misguided too, but he stated that
he wasn't going to remove them without serious thought about the alternatives.
 


From owner-freebsd-fs  Mon Aug 23 23:29:55 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1B31215193; Mon, 23 Aug 1999 23:28:27 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04619;
	Mon, 23 Aug 1999 23:28:11 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:33:43 -0700 (PDT)
Message-Id: <199908240628.XAA04619@gilliam.users.flyingcroc.net>
From: Terry Lambert <tlambert@primenet.com>
Subject: Re: BSD XFS Port & BSD VFS Rewrite
To: wrstuden@nas.nasa.gov
Cc: tlambert@primenet.com, Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > > > > > 2.	Advisory locks are hung off private backing objects.
> > > I'm not sure. The struct lock * is only used by layered filesystems, so
> > > they can keep track both of the underlying vnode lock, and if needed their
> > > own vnode lock. For advisory locks, would we want to keep track both of
> > > locks on our layer and the layer below? Don't we want either one or the
> > > other? i.e. layers bypass to the one below, or deal with it all
> > > themselves.
> > 
> > I think you want the lock on the intermediate layer: basically, on
> > every vnode that has data associated with it that is unique to a
> > layer.  Let's not forget, also, that you can expose a layer into
> > the namespace in one place, and expose it covered under another
> > layer, at another.  If you locked down to the backing object, then
> > the only issue you would be left with is one or more intermediate
> > backing objects.
> 
> Right. That exported struct lock * makes locking down to the lowest-level
> file easy - you just feed it to the lock manager, and you're locking the
> same lock the lowest level fs uses. You then lock all vnodes stacked over
> this one at the same time. Otherwise, you just call VOP_LOCK below and
> then lock yourself.

I think this defeats the purpose of the stacking architecture; I
think that if you look at an unadulterated NULLFS, you'll see what I
mean.

Intermediate FS's should not trap VOP's that are not applicable
to them.

One of the purposes of doing a VOP_LOCK on intermediate vnodes
that aren't backing objects is to deal with the global vnode
pool management.  I'd really like FS's to own their vnode pools,
but even without that, you don't need the locking, since you
only need to flush data on vnodes that are backing objects.

If we look at a stack of FS's with intermediate exposure into the
namespace, then it's clear that the issue is really only applicable
to objects that act as a backing store:


----------------------	----------------------	--------------------
FS			Exposed in hierarchy	Backing object
----------------------	----------------------	--------------------
top			yes			no
intermediate_1		no			no
intermediate_2		no			yes
intermediate_3		yes			no
bottom			no			yes
----------------------	----------------------	--------------------

So when we lock "top", we only lock in intermediate_2 and in bottom.

Then we attempt to lock in intermediate_3, but it fails: not because
there is a lock on the vnode in intermediate_3, but because there is
a lock in bottom.

It's unnecessary to lock the vnodes in the intermediate path, or
even at the exposure level, unless they are vnodes that have an
associated backing store.
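
The table and the two locking attempts above can be sketched as
follows.  This is only an illustration of the argument, not kernel
code: the layer list is copied from the table, while the function
names and the "held" set are assumptions made for the example.

```python
# Layer table copied from the message: (name, exposed in hierarchy, backing object)
STACK = [
    ("top", True, False),
    ("intermediate_1", False, False),
    ("intermediate_2", False, True),
    ("intermediate_3", True, False),
    ("bottom", False, True),
]

def backing_layers_below(entry):
    """Names of the layers at or below `entry` that have a backing object."""
    names = [name for name, _, _ in STACK]
    start = names.index(entry)
    return [name for name, _, backing in STACK[start:] if backing]

def try_lock(entry, held):
    """Lock every backing layer at or below `entry`; fail if any is already held."""
    wanted = backing_layers_below(entry)
    if any(name in held for name in wanted):
        return False
    held.update(wanted)
    return True

held = set()
locked_top = try_lock("top", held)                 # locks intermediate_2 and bottom
locked_inter3 = try_lock("intermediate_3", held)   # fails: bottom is already held

print(locked_top, sorted(held))
print(locked_inter3)
```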

The need to lock in intermediate_2 exists because it is a translation
layer or a namespace escape.  It deals with compression, or it deals
with file-as-a-directory folding, or it deals with file-hiding
(perhaps for a quota file), etc..  If it didn't, it wouldn't need
backing store (and therefore wouldn't need to be locked).


> > For a layer with an intermediate backing object, I'm prepared to
> > declare it "special", and proxy the operation down to any inferior
> > backing object (e.g. a union FS that adds files from two FS's
> > together, rather than just directory entry lists).  I think such
> > layers are the exception, not the rule.
> 
> Actually isn't the only problem when you have vnode fan-in (union FS)? 
> i.e.  a plain compressing layer should not introduce vnode locking
> problems. 

If it's a block compression layer, it will.  Also a translation layer;
consider a pure Unicode system that wants to remotely mount an FS
from a legacy system.  To do this, it needs to expand the pages from
the legacy system [only it can, since the legacy system doesn't know
about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
on a file on such a system.  To propagate the lock, you have to do
an arithmetic conversion at the translation layer.  This gets worse
if the lower end FS is exposed in the namespace as well.
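
A minimal sketch of that arithmetic conversion, assuming every legacy
byte expands to exactly two bytes in the upper layer (the 2:1 ratio
above).  The function name is hypothetical, and rounding the range
outward is a design choice made for the example, not something the
thread specifies.

```python
def upper_to_lower_range(offset, length, ratio=2):
    """Map a byte range on the expanded (upper) file to the legacy file.

    The range is rounded outward so that a lock covering part of an
    expanded character still covers the whole legacy byte beneath it.
    A length of 0 (lock to end of file in fcntl terms) is not handled.
    """
    lower_start = offset // ratio
    lower_end = (offset + length + ratio - 1) // ratio   # ceiling division
    return lower_start, lower_end - lower_start

aligned = upper_to_lower_range(10, 4)     # range starts on a character boundary
unaligned = upper_to_lower_range(11, 4)   # range starts mid-character

print(aligned)
print(unaligned)
```

An unaligned upper range ends up locking more of the lower file than
the caller asked for, which is one way the lower FS being exposed in
the namespace makes things worse.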

You could make the same arguments for other types of translation or
namespace escapes.


> > I think that export policies are the realm of /etc/exports.
> > 
> > The problem with each FS implementing its own policy, is that this
> > is another place that copyinstr() gets called, when it shouldn't.
> 
> Well, my thought was that, like with current code, most every fs would
> just call vfs_export() when it's presented an export operation. But by
> retaining the option of having the fs do its own thing, we can support
> different export semantics if desired.

I think this bears down on whether the NFS server VFS consumer is
allowed access to the VFS stack at the particular intermediate
layer.  I think this is really an administrative policy decision,
and not an option for the VFS.

I think it would be bad if a given VFS could refuse to participate
in a stacking operation because it didn't like who was stacking.

If we insist on the ability for a VFS to refuse stacking, then
we should generalize the idea, such that an intermediate VFS could
refuse exposure into the filesystem namespace accessible to users.

Consider the case of a VFS without quota support, stacked under a
VFS layer that provided quota support by hiding a file in the top
level directory ("quota") and then folding the directory closed by
rerooting in a subdirectory of the top level directory ("root/").

It's reasonable to assume that most admins that want to enforce
quotas would *not* want the possibility of exposing the VFS without
quota support in the user accessible namespace.  Should the VFS
without quotas refuse such exposure?

I think the answer is "no", and that it is an administrative
control issue, not a VFS's preference issue.  Administrators enforce
this by protecting the path to exposure points, or by mounting
stacks over top of exposure points, which results in the exposure
being hidden under another mount.  Using the QUOTAFS example, you
mount the FS to be quota-enforced on /home, and then you mount
the QUOTAFS over top of it, and have it cover "/home" itself,
hiding the underlying FS from exposure.


> > I would resolve this by passing a standard option to the mount code
> > in user space.  For root mounts, a vnode is passed down.  For other
> > mounts, the vnode is parsed and passed if the option is specified.
> 
> Or maybe add a field to vfsops. This info says what the mount call will
> expect (I want a block device, a regular file, a directory, etc), so it
> fits. :-)

This is actually an elegant solution to the problem.  Much of the
time, we don't consider data interfaces when they are appropriate
because of their widespread use in inappropriate ways (e.g. "ps").
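
As a sketch of what such a data interface might look like: each
filesystem declares what kind of object its mount call expects, and
one central routine checks it.  The table contents, the kind names
and the function are all hypothetical, chosen only to illustrate the
idea of a declarative field.

```python
# Hypothetical per-filesystem declaration of the expected mount argument.
# None means the filesystem takes no backing object at all.
MOUNT_ARG_KIND = {
    "ffs": "block_device",
    "nullfs": "directory",
    "procfs": None,
}

def check_mount_arg(fstype, arg_kind):
    """Central check: does the supplied argument match what the fs declared?"""
    if fstype not in MOUNT_ARG_KIND:
        raise ValueError("unknown filesystem type: " + fstype)
    return MOUNT_ARG_KIND[fstype] == arg_kind

print(check_mount_arg("ffs", "block_device"))
print(check_mount_arg("nullfs", "block_device"))
```

Because the check lives in one place, a program that calls mount with
something the fs does not expect is rejected before the fs sees it.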


> Also, if we leave it to userland, what happens if someone writes a
> program which calls sys_mount with something the fs doesn't expect. :-)

Well, that gets to another grail of mine: when a device containing
a filesystem "arrives", I believe it should trigger a mount into
the list of mounted filesystems.

I don't necessarily mean that it should also be exported into the
filesystem hierarchy at that point (but it's an option, using the
"last mounted on" information).


> > I think that you will only be able to find rare examples of FS's
> > that don't take device names as arguments.  But for those, you
> > don't specify the option, and it gets "NULL", and whatever local
> > options you specify.
> 
> I agree I can't see a leaf fs not taking a device node. But layered
> fs's certainly will want something else. :-)

I think they want a vnode of an already mounted FS.  The trick is
to enforce the "already mounted" part of that.  I'm comfortable with
doing this by saying "it's not already mounted until you can look
up a vnode on it".


> > The point is that, for FS's that can be both root and sub-root,
> > the mount code doesn't have to make the decision, it can be punted
> > to higher level code, in one place, where the code can be centrally
> > maintained and kept from getting "stale" when things change out
> > from under it.
> 
> True.
> 
> And with good comments we can catch the times when the centrally located
> code changes & breaks an assumption made by the fs. :-)

8-).


> > > Except for a minor buglet with device nodes, stacking works in NetBSD at
> > > present. :-)
> > 
> > Have you tried Heidemann's student's stacking layers?  There is one
> > encryption, and one per-file compression with namespace hiding, that
> > I think it would be hard pressed to keep up with.  But I'll give it
> > the benefit of the doubt.  8-).
> 
> Nope. The problem is that while stacking (null, umap, and overlay fs's)
> work, we don't have the coherency issues worked out so that upper layers
> can cache data. i.e. so that the lower fs knows it has to ask the upper
> layers to give pages back. :-) But multiple ls -lR's work fine. :-)

With UVM in NetBSD, this is (supposedly) not an issue.

You could actually think of it this way, as well: only FS's that
contain vnodes that provide backing should implement VOP_GETPAGES
and VOP_PUTPAGES, and all I/O should be done through paging.


> > > I agree it's ugly, but it has the advantage that it doesn't grow the
> > > on-disk inode. A lot of folks have designs on the remaining 64 bits free.
> > > :-)
> > 
> > Well, so long as we can resolve the issue for a long, long time;
> > I plan on being around to have to put up with the bugs, if I can
> > wrangle it... 8-).
> 
> :-)
> 
> I bet by then (559447 AD) we won't be using ffs, so the problem will be
> moot. :-)

Unless I'm the curator of a computer museum... 8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



From owner-freebsd-fs  Mon Aug 23 23:30: 3 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id BD446158A0; Mon, 23 Aug 1999 23:28:58 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04712;
	Mon, 23 Aug 1999 23:28:19 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:58:41 -0700 (PDT)
Message-Id: <199908240628.XAA04712@gilliam.users.flyingcroc.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: michaelh@cet.co.jp, wrstuden@nas.nasa.gov,
	Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite 
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In message <199908181848.LAA14960@usr02.primenet.com>, Terry Lambert writes:

>> >You would have to de-collapse several VOP lists that have been
>> >pre-collapsed.
>> 
>> You are talking gibberish here.  Please show code where this is
>> a problem.
>
>When you write a proxy stacking layer, such as John Heidemann's
>network proxy stacking layer (an NFS alternative), VOP's which
>would normally be handled by vfs_default have to be handled on
>the other end of the proxy, instead, in the same way that they
>would be handled by the vfs_default stuff.

And what prevents you from taking over the default op ?
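
A minimal sketch of the mechanism under discussion, assuming a
per-layer operation table with a shared default vector behind it.
All names here are hypothetical stand-ins; this is not FreeBSD's
vfs_default code, and it does not settle whether a proxy layer can
reasonably override every default.

```python
def make_default_op(name):
    """Build a stand-in default implementation for one operation."""
    def op(*args):
        return "default:" + name
    return op

# Hypothetical default vector, standing in for the vfs_default fall-through.
DEFAULT_VOPS = {name: make_default_op(name)
                for name in ("lookup", "lock", "getpages")}

class Layer:
    def __init__(self, name, ops):
        self.name = name
        self.ops = ops   # only the operations this layer implements itself

    def vop(self, op_name, *args):
        # Use the layer's own handler if present, else fall through to the default.
        handler = self.ops.get(op_name, DEFAULT_VOPS[op_name])
        return handler(*args)

def forward_lock_to_remote(*args):
    """Stand-in for a proxy layer shipping the operation to the other end."""
    return "proxied:lock"

plain = Layer("plain", {})
proxy = Layer("proxy", {"lock": forward_lock_to_remote})

print(plain.vop("lock"))     # falls through to the default
print(proxy.vop("lock"))     # the layer has taken over the default op
print(proxy.vop("lookup"))   # anything not overridden still falls through
```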

--
Poul-Henning Kamp             FreeBSD coreteam member
phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



From owner-freebsd-fs  Mon Aug 23 23:30:39 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4D9AE158F0; Mon, 23 Aug 1999 23:29:05 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04749;
	Mon, 23 Aug 1999 23:28:20 -0700 (PDT)
Date: Wed, 18 Aug 1999 12:09:27 -0700 (PDT)
Message-Id: <199908240628.XAA04749@gilliam.users.flyingcroc.net>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
Reply-To: Bill Studenmund <wrstuden@nas.nasa.gov>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Terry Lambert <tlambert@primenet.com>, Hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite 
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

> Yes, but we need subsecond in the filesystems.  Think about make(1) on
> a blinding fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point
suggesting having 128 bits to the left of the decimal point (128 bits
worth of seconds). I was trying to say that'd be a bit much. :-)
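
For reference, the arithmetic behind the earlier 292 billion year
figure can be sketched like this.  The year length of 365.24 days is
an assumption; it happens to reproduce the number quoted earlier in
the thread, and other year lengths shift the low-order digits.

```python
# Assumption: a year of 365.24 days, written as integers to avoid float rounding.
SECONDS_PER_YEAR = 36524 * 864          # 365.24 days * 86400 s/day

def years_covered(bits):
    """Whole years representable by a signed counter of seconds with `bits` bits."""
    return 2 ** (bits - 1) // SECONDS_PER_YEAR

years_64 = years_covered(64)
years_128 = years_covered(128)

print("64-bit signed seconds: ", format(years_64, ","), "years")
print("128-bit signed seconds:", format(years_128, ","), "years")
```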

Take care,

Bill



From owner-freebsd-fs  Mon Aug 23 23:30:46 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6B545158B1; Mon, 23 Aug 1999 23:28:59 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04725;
	Mon, 23 Aug 1999 23:28:19 -0700 (PDT)
Date: Wed, 18 Aug 1999 12:01:18 -0700 (PDT)
Message-Id: <199908240628.XAA04725@gilliam.users.flyingcroc.net>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
Reply-To: Bill Studenmund <wrstuden@nas.nasa.gov>
To: Terry Lambert <tlambert@primenet.com>
Cc: Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: BSD XFS Port & BSD VFS Rewrite
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, 18 Aug 1999, Terry Lambert wrote:

> > Right. That exported struct lock * makes locking down to the lowest-level
> > file easy - you just feed it to the lock manager, and you're locking the
> > same lock the lowest level fs uses. You then lock all vnodes stacked over
> > this one at the same time. Otherwise, you just call VOP_LOCK below and
> > then lock yourself.
> 
> I think this defeats the purpose of the stacking architecture; I
> think that if you look at an unadulterated NULLFS, you'll see what I
> mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

> Intermediate FS's should not trap VOP's that are not applicable
> to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

> One of the purposes of doing a VOP_LOCK on intermediate vnodes
> that aren't backing objects is to deal with the global vnode
> pool management.  I'd really like FS's to own their vnode pools,
> but even without that, you don't need the locking, since you
> only need to flush data on vnodes that are backing objects.
> 
> If we look at a stack of FS's with intermediate exposure into the
> namespace, then it's clear that the issue is really only applicable
> to objects that act as a backing store:
> 
> 
> ----------------------	----------------------	--------------------
> FS			Exposed in hierarchy	Backing object
> ----------------------	----------------------	--------------------
> top			yes			no
> intermediate_1		no			no
> intermediate_2		no			yes
> intermediate_3		yes			no
> bottom			no			yes
> ----------------------	----------------------	--------------------
> 
> So when we lock "top", we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.
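
A minimal sketch of "one lock for the whole thing", assuming each
layered vnode simply reuses the lock object of the vnode beneath it.
The Vnode class and its fields are hypothetical, and a userland
threading.Lock stands in for the kernel lock manager.

```python
import threading

class Vnode:
    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        # A layered vnode reuses the lock of the vnode below it; only the
        # lowest vnode in the stack allocates a lock of its own.
        self.lock = lower.lock if lower is not None else threading.Lock()

bottom = Vnode("bottom")
middle = Vnode("middle", lower=bottom)
top = Vnode("top", lower=middle)

# Locking through the top vnode locks the whole stack in one step.
got_top = top.lock.acquire(blocking=False)
# A second attempt through the bottom vnode hits the same lock and fails.
got_bottom = bottom.lock.acquire(blocking=False)
top.lock.release()

print(got_top, got_bottom)
```

Since there is only one lock, there is no ordering between layers to
get wrong, which is the deadlock argument in a nutshell.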

> > Actually isn't the only problem when you have vnode fan-in (union FS)? 
> > i.e.  a plain compressing layer should not introduce vnode locking
> > problems. 
> 
> If it's a block compression layer, it will.  Also a translation layer;
> consider a pure Unicode system that wants to remotely mount an FS
> from a legacy system.  To do this, it needs to expand the pages from
> the legacy system [only it can, since the legacy system doesn't know
> about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
> on a file on such a system.  To propagate the lock, you have to do
> an arithmetic conversion at the translation layer.  This gets worse
> if the lower end FS is exposed in the namespace as well.

Wait. byte-range locking is different from vnode locking. I've been
talking about vnode locking, which is different from the byte-range
locking you're discussing above.
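
For reference, the arithmetic conversion mentioned in the quoted text
could look roughly like the sketch below.  It assumes the upper layer
shows exactly two bytes for every lower byte, that a length of 0 means
"to end of file" (as with fcntl(2) locks), and that offsets are small
enough not to overflow.  The struct and function names are invented:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range, loosely modelled on struct flock. */
struct byte_range {
	uint64_t off;
	uint64_t len;	/* 0 is assumed to mean "to end of file" */
};

/*
 * Map a range in the upper (2 bytes per character) view onto the
 * smallest lower-layer range that covers it.
 */
static struct byte_range
upper_to_lower(struct byte_range up)
{
	struct byte_range lo;
	uint64_t end;

	lo.off = up.off / 2;			/* round start down */
	if (up.len == 0) {
		lo.len = 0;			/* keep "to EOF" meaning */
		return (lo);
	}
	end = (up.off + up.len + 1) / 2;	/* round end up */
	lo.len = end - lo.off;
	return (lo);
}
```

Because the start is rounded down and the end rounded up, the lower
range may cover a little more than was asked for at the upper layer.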

> > Nope. The problem is that while stacking (null, umap, and overlay fs's)
> > works, we don't have the coherency issues worked out so that upper layers
> > can cache data. i.e. so that the lower fs knows it has to ask the upper
> > layers to give pages back. :-) But multiple ls -lR's work fine. :-)
> 
> With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

> You could actually think of it this way, as well: only FS's that
> contain vnodes that provide backing should implement VOP_GETPAGES
> and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill




From owner-freebsd-fs  Mon Aug 23 23:31: 7 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from gilliam.users.flyingcroc.net (gilliam.users.flyingcroc.net [207.246.128.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id CDCC915901; Mon, 23 Aug 1999 23:29:03 -0700 (PDT)
	(envelope-from ross@gilliam.users.flyingcroc.net)
Received: (from ross@localhost)
	by gilliam.users.flyingcroc.net (8.9.3/8.9.3) id XAA04695;
	Mon, 23 Aug 1999 23:28:18 -0700 (PDT)
Date: Wed, 18 Aug 1999 11:49:25 -0700 (PDT)
Message-Id: <199908240628.XAA04695@gilliam.users.flyingcroc.net>
From: Terry Lambert <tlambert@primenet.com>
Subject: Re: BSD XFS Port & BSD VFS Rewrite
To: phk@critter.freebsd.dk (Poul-Henning Kamp)
Cc: tlambert@primenet.com, michaelh@cet.co.jp, wrstuden@nas.nasa.gov,
	Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >> > > I'm not familiar with the VFS_default stuff. All the vop_default_desc
> >> > > routines in NetBSD point to error routines.
> >> > 
> >> > In FreeBSD, they now point to default routines that are *not* error
> >> > routines.  This is the problem.  I admit the change was very well
> >> > intentioned, since it made the code a hell of a lot more readable,
> >> > but choosing between readable and additional function, I take function
> >> > over form (I think the way I would have "fixed" the readability is by
> >> > making the operations that result in the descriptor set for a mounted
> >> > FS instance be both discrete, and named for their specific function).
> >> 
> >> As I recall most of FBSD's default routines are also error routines, if
> >> the exceptions were a problem it would be trivial to fix.
> >
> >You would have to de-collapse several VOP lists that have been
> >pre-collapsed.
> 
> You are talking gibberish here.  Please show code where this is
> a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

Some VOP's, like advisory locking, need both local assertion and
remote proxy of the VOP to avoid introducing race windows.

The result of this is that, if you rely on the vfs_default stuff,
then you can't proxy those VOP's into a different address space,
either on another machine, or to a user space VFS stacking layer
development environment.
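
A toy model of the problem, not FreeBSD's actual dispatch code: assume
an operations vector where an empty slot silently falls through to a
local default.  All names here (dispatch, default_op, proxy_forward,
the OP_* constants) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

enum { OP_READ, OP_ADVLOCK, OP_COUNT };
enum { HANDLED_LOCAL_DEFAULT = 1, HANDLED_REMOTE = 2 };

typedef int (*vop_fn)(void);

/* Stand-in for a local default routine. */
static int
default_op(void)
{
	return (HANDLED_LOCAL_DEFAULT);
}

/* Stand-in for "ship this operation to the other end of the proxy". */
static int
proxy_forward(void)
{
	return (HANDLED_REMOTE);
}

/* Empty slots fall through to the local default. */
static int
dispatch(const vop_fn *vec, int op)
{
	vop_fn fn = vec[op] != NULL ? vec[op] : default_op;

	return (fn());
}

/* A proxy that relies on the default for OP_ADVLOCK. */
static const vop_fn collapsed_proxy[OP_COUNT] = {
	[OP_READ] = proxy_forward,
};

/* A proxy that names every operation explicitly. */
static const vop_fn full_proxy[OP_COUNT] = {
	[OP_READ] = proxy_forward,
	[OP_ADVLOCK] = proxy_forward,
};
```

With collapsed_proxy, OP_ADVLOCK is answered locally and never reaches
the remote side; only full_proxy forwards it.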

This is the same problem that embedding VM references directly
into any FS causes, and that vm_object_t aliases would exacerbate.

John has, in the past, sent me a number of stacking layers done
by various people, with the requirement that I not redistribute
them, as they are not what he would consider to be properly
representative of finished work.

Since John himself did the network proxy, you could perhaps get
him to send you a copy, so you could have direct access to code
where this was a problem.

Make sure that the system you are talking to over the proxy is
not assumed to be a FreeBSD system (e.g. don't assume that the
vfs_default stuff exists on the other end of the proxy, or that
it would be functional).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Tue Aug 24 17:11:23 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 0D82314DEF
	for <fs@freebsd.org>; Tue, 24 Aug 1999 17:11:20 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Received: from localhost (bright@localhost)
	by fw.wintelcom.net (8.8.8/8.8.8) with ESMTP id RAA21060
	for <fs@freebsd.org>; Tue, 24 Aug 1999 17:24:42 GMT
	(envelope-from bright@wintelcom.net)
Date: Tue, 24 Aug 1999 17:24:42 +0000 (GMT)
From: Alfred Perlstein <bright@wintelcom.net>
To: fs@freebsd.org
Subject: VFS cleanup and fh*() syscalls almost complete.
Message-ID: <Pine.BSF.4.05.9908241711110.6392-100000@fw.wintelcom.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I have tested this code lightly; it compiles cleanly and the
NFS server code still seems to work.

http://big.endian.org/~bright/freebsd/in_progress/vfs-fhsyscall.diff

Several things have been done, mostly inspired by NetBSD:

1) VFS_FHTOVP has been split into 2 VFS ops:
   VFS_FHTOVP now only takes a mountpoint, filehandle and **vnode
   VFS_CHECKEXP now takes the export checking arguments that
     VFS_FHTOVP used to take.

2) The casting of VFS ops to eopnotsupp() has been removed and
     vfs_nop*() functions have been put into kern/vfs_default.c

   This makes it more clear that certain VFS-ops are giving default
   behavior, either returning automatic success or returning EOPNOTSUPP.

   Someone mentioned that EOPNOTSUPP should be replaced by EINVAL, can
   that person please speak up?

   Also, some filesystems that actually implemented VFS-ops that did
   nothing or returned EOPNOTSUPP have had those functions deleted
   except when they are essential to understanding the VFS, or
   if it looked like they may eventually be filled in.

3) fh(open|stat|statfs) have been haphazardly implemented, meaning I 
   won't be able to test them until tonight.
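
To illustrate points 1 and 2 without reading the diff, here is a
simplified userspace model.  It is not the code in the patch: the
struct layout, the argument lists, and every name (vfsops_model,
toy_*, default_*) are assumptions made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mount;				/* opaque in this model */
struct vnode { int id; };
struct fid { int ino; };

/* Hypothetical ops table with the lookup and export check split. */
struct vfsops_model {
	int (*fhtovp)(struct mount *, struct fid *, struct vnode **);
	int (*checkexp)(struct vnode *, int client_id);
	int (*sync)(struct mount *);
};

static struct vnode toy_vnodes[2] = { { 10 }, { 11 } };

/* Translate a file handle to a vnode; no export checking here. */
static int
toy_fhtovp(struct mount *mp, struct fid *fhp, struct vnode **vpp)
{
	(void)mp;
	if (fhp->ino < 0 || fhp->ino >= 2)
		return (ESTALE);
	*vpp = &toy_vnodes[fhp->ino];
	return (0);
}

/* Export check on a single vnode: only client 1 may see vnode 10. */
static int
toy_checkexp(struct vnode *vp, int client_id)
{
	return (vp->id == 10 && client_id == 1 ? 0 : EACCES);
}

/* Typed defaults instead of casting one eopnotsupp() everywhere. */
static int
default_sync_ok(struct mount *mp)
{
	(void)mp;
	return (0);
}

static int
default_fhtovp_nosupp(struct mount *mp, struct fid *fhp, struct vnode **vpp)
{
	(void)mp; (void)fhp; (void)vpp;
	return (EOPNOTSUPP);
}

static int
default_checkexp_nosupp(struct vnode *vp, int client_id)
{
	(void)vp; (void)client_id;
	return (EOPNOTSUPP);
}

static const struct vfsops_model toy_ops = {
	toy_fhtovp, toy_checkexp, default_sync_ok
};
static const struct vfsops_model noexport_ops = {
	default_fhtovp_nosupp, default_checkexp_nosupp, default_sync_ok
};
```

Each default has the same prototype as the slot it fills, so the table
needs no casts and a reader can see which behavior a filesystem gets.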

Testers? Critics? Comments? please?

If you're wondering why/what I'm doing, it's the kernel support
for a lockd that I'm working on, as well as a cleanup I thought
would make it easier to understand our filesystem code.

I'm sure some people will be wondering: 
Why does VFS_CHECKEXP take a vnode and not a mount point? 
Hopefully in the future a filesystem will be able to more 
restrictively export its files, this will help facilitate that.

Lastly, I'm jumping the gun here and posting a lightly tested
patch because I've had to re-merge this stuff 3 times already and
would like comments on not only functionality, but style as well.
This way I can either commit it, (after more testing) or scrap the
whole project.

What I have to do next is make the fh*() syscalls solid and
investigate the veto locking ideas that have been brought up
in the past.

thanks for your time,
-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
   - http://www.wintelcom.net/ [bright@wintelcom.net]




From owner-freebsd-fs  Wed Aug 25 10:34:38 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8176515351; Wed, 25 Aug 1999 10:34:30 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Received: from localhost (bright@localhost)
	by fw.wintelcom.net (8.8.8/8.8.8) with ESMTP id KAA16565;
	Wed, 25 Aug 1999 10:49:12 GMT
	(envelope-from bright@wintelcom.net)
Date: Wed, 25 Aug 1999 10:49:12 +0000 (GMT)
From: Alfred Perlstein <bright@wintelcom.net>
To: hackers@freebsd.org
Cc: fs@freebsd.org
Subject: second round, vfs and fh* calls.
Message-ID: <Pine.BSF.4.05.9908251046160.6392-100000@fw.wintelcom.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I've done a bit more work on the VFS cleanup part of my diffs.

Unfortunately I had overlooked some of the filesystems (ext2fs) and
they were not compiling cleanly.

http://big.endian.org/~bright/freebsd/in_progress/vfs-fhsyscall.diff

The fh*() syscalls are still being worked on, but I'd really like
to get this into the tree so my patches don't get stale.

thanks,
-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
   - http://www.wintelcom.net/ [bright@wintelcom.net]




From owner-freebsd-fs  Thu Aug 26  8:13: 5 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 47FEC14D83; Thu, 26 Aug 1999 08:13:00 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Received: from localhost (bright@localhost)
	by fw.wintelcom.net (8.8.8/8.8.8) with ESMTP id IAA18732;
	Thu, 26 Aug 1999 08:28:29 GMT
	(envelope-from bright@wintelcom.net)
Date: Thu, 26 Aug 1999 08:28:29 +0000 (GMT)
From: Alfred Perlstein <bright@wintelcom.net>
To: hackers@freebsd.org
Cc: fs@freebsd.org, Michael Hancock <michaelh@cet.co.jp>,
	David Greenman <dg@root.com>
Subject: HEADS UP Reviewers. VFS changes to be committed.
Message-ID: <Pine.BSF.4.05.9908260220590.6392-100000@fw.wintelcom.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I've posted 2 times asking for someone to review these diffs:

http://big.endian.org/~bright/freebsd/in_progress/vfs-fhsyscall.diff

Am I to take it that silence is acceptance?  I'll be committing this
to -current tonight or tomorrow unless I get feedback.

See attached email for details.

thank you,
-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
   - http://www.wintelcom.net/ [bright@wintelcom.net]


---------- Forwarded message ----------
Date: Tue, 24 Aug 1999 17:24:42 +0000 (GMT)
From: Alfred Perlstein <bright@wintelcom.net>
To: fs@freebsd.org
Subject: VFS cleanup and fh*() syscalls almost complete.


I have tested this code lightly, it compiles cleanly and the
NFS server code still seems to work.

http://big.endian.org/~bright/freebsd/in_progress/vfs-fhsyscall.diff

Several things have been done, mostly inspired by NetBSD:

1) VFS_FHTOVP has been split into 2 VFS ops:
   VFS_FHTOVP now only takes a mountpoint, filehandle and **vnode
   VFS_CHECKEXP now takes the export checking arguments that
     VFS_FHTOVP used to take.

2) The casting of VFS ops to eopnotsupp() has been removed and
     vfs_nop*() functions have been put into kern/vfs_default.c

   This makes it more clear that certain VFS-ops are giving default
   behavior, either returning automatic success or returning EOPNOTSUPP.

   Someone mentioned that EOPNOTSUPP should be replaced by EINVAL, can
   that person please speak up?

   Also, some filesystems that actually implemented VFS-ops that did
   nothing or returned EOPNOTSUPP have had those functions deleted
   except when they are essential to understanding the VFS, or
   if it looked like they may eventually be filled in.

3) fh(open|stat|statfs) have been haphazardly implemented, meaning I 
   won't be able to test them until tonight.

Testers? Critics? Comments? please?

If you're wondering why/what I'm doing, it's the kernel support
for a lockd that I'm working on, as well as a cleanup I thought
would make it easier to understand our filesystem code.

I'm sure some people will be wondering: 
Why does VFS_CHECKEXP take a vnode and not a mount point? 
Hopefully in the future a filesystem will be able to more 
restrictively export its files, this will help facilitate that.

Lastly, I'm jumping the gun here and posting a lightly tested
patch because I've had to re-merge this stuff 3 times already and
would like comments on not only functionality, but style as well.
This way I can either commit it, (after more testing) or scrap the
whole project.

What I have to do next is make the fh*() syscalls solid and
investigate the veto locking ideas that have been brought up
in the past.

thanks for your time,
-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
   - http://www.wintelcom.net/ [bright@wintelcom.net]




From owner-freebsd-fs  Thu Aug 26  8:26:42 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from axl.noc.iafrica.com (axl.noc.iafrica.com [196.31.1.175])
	by hub.freebsd.org (Postfix) with ESMTP
	id CEEEA15DD9; Thu, 26 Aug 1999 08:26:24 -0700 (PDT)
	(envelope-from sheldonh@axl.noc.iafrica.com)
Received: from sheldonh (helo=axl.noc.iafrica.com)
	by axl.noc.iafrica.com with local-esmtp (Exim 3.02 #1)
	id 11K1Pc-0001ng-00; Thu, 26 Aug 1999 17:26:00 +0200
From: Sheldon Hearn <sheldonh@uunet.co.za>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: hackers@freebsd.org, fs@freebsd.org,
	Michael Hancock <michaelh@cet.co.jp>, David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-reply-to: Your message of "Thu, 26 Aug 1999 08:28:29 GMT."
             <Pine.BSF.4.05.9908260220590.6392-100000@fw.wintelcom.net> 
Date: Thu, 26 Aug 1999 17:26:00 +0200
Message-ID: <6923.935681160@axl.noc.iafrica.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Thu, 26 Aug 1999 08:28:29 GMT, Alfred Perlstein wrote:

> Am I to take it that silence is acceptance?  I'll be committing this
> to -current tonight or tomorrow unless I get feedback.

Recent discussions with bde and eivind indicate that at least some of
the code you're about to touch has one or more maintainers. Kirk
McKusick is probably one of them.

Make sure you contact the maintainers directly before smacking their
code.

Also, I'd suggest that it's a bad idea to say "if I get no feedback
before tonight, I'm committing". I think this applies even if it's not
the first time you've asked for review. Basically, timezones and stuff
make for a situation where such an e-mail is useless for many of your
readers.

Ciao,
Sheldon.




From owner-freebsd-fs  Thu Aug 26  9:27:55 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 1629615D46
	for <fs@FreeBSD.ORG>; Thu, 26 Aug 1999 09:27:38 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Received: from localhost (bright@localhost)
	by fw.wintelcom.net (8.8.8/8.8.8) with ESMTP id CAA20719;
	Thu, 26 Aug 1999 02:43:21 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Date: Thu, 26 Aug 1999 09:43:21 +0000 (GMT)
From: Alfred Perlstein <bright@wintelcom.net>
To: Sheldon Hearn <sheldonh@uunet.co.za>
Cc: fs@FreeBSD.ORG
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-Reply-To: <6923.935681160@axl.noc.iafrica.com>
Message-ID: <Pine.BSF.4.05.9908260919180.6392-100000@fw.wintelcom.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Thu, 26 Aug 1999, Sheldon Hearn wrote:

> 
> 
> On Thu, 26 Aug 1999 08:28:29 GMT, Alfred Perlstein wrote:
> 
> > Am I to take it that silence is acceptance?  I'll be committing this
> > to -current tonight or tomorrow unless I get feedback.
> 
> Recent discussions with bde and eivind indicate that at least some of
> the code you're about to touch has one or more maintainers. Kirk
> McKusick is probably one of them.

I will attempt to contact him.  Afaik he is on vacation and won't be
available for quite some time.  David is the principal architect,
and Mike Smith referred me to Michael Hancock.  I should probably
try phk too, since it seems he's been messing around with vfs lately.

> Make sure you contact the maintainers directly before smacking their
> code.

I have done so in the past (uthread) and will continue to respect the
maintainers.

> Also, I'd suggest that it's a bad idea to say "if I get no feedback
> before tonight, I'm committing". I think this applies even if it's not
> the first time you've asked for review. Basically, timezones and stuff
> make for a situation where such an e-mail is useless for many of your
> readers.

I agree with you on this, I just wanted to get the ball rolling on
these changes.  I've seen that in the past silence often means
acceptance, however I wanted to make sure, hence the email.  I
especially don't want to re-merge my code, I'm sure you know how
frustrating that gets after the 3rd or 4th time. :)

Just fyi, the changes are just a cleanup imo with the addition
of 3 new syscalls where suser(p) == 0 must be true.

-Alfred

> 
> Ciao,
> Sheldon.




From owner-freebsd-fs  Thu Aug 26  9:44: 8 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from jade.chc-chimes.com (jade.chc-chimes.com [216.28.46.6])
	by hub.freebsd.org (Postfix) with ESMTP
	id D03BB14A2E; Thu, 26 Aug 1999 09:44:04 -0700 (PDT)
	(envelope-from billf@jade.chc-chimes.com)
Received: by jade.chc-chimes.com (Postfix, from userid 1001)
	id 871FD1C2E; Thu, 26 Aug 1999 11:45:47 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])
	by jade.chc-chimes.com (Postfix) with ESMTP
	id 8293C3815; Thu, 26 Aug 1999 11:45:47 -0400 (EDT)
Date: Thu, 26 Aug 1999 11:45:47 -0400 (EDT)
From: Bill Fumerola <billf@jade.chc-chimes.com>
To: Sheldon Hearn <sheldonh@uunet.co.za>
Cc: Alfred Perlstein <bright@wintelcom.net>, hackers@freebsd.org,
	fs@freebsd.org, Michael Hancock <michaelh@cet.co.jp>,
	David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-Reply-To: <6923.935681160@axl.noc.iafrica.com>
Message-ID: <Pine.BSF.4.10.9908261145210.9365-100000@jade.chc-chimes.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Thu, 26 Aug 1999, Sheldon Hearn wrote:

> Also, I'd suggest that it's a bad idea to say "if I get no feedback
> before tonight, I'm committing". I think this applies even if it's not
> the first time you've asked for review. Basically, timezones and stuff
> make for a situation where such an e-mail is useless for many of your
> readers.

This would be post #3 of the same code and changes that no-one has
responded to.

-- 
- bill fumerola - billf@chc-chimes.com - BF1560 - computer horizons corp -
- ph:(800) 252-2421 - bfumerol@computerhorizons.com - billf@FreeBSD.org  -




From owner-freebsd-fs  Thu Aug 26  9:44:42 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from arc.hq.cti.ru (arc.hq.cti.ru [195.34.40.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id A973614DAE; Thu, 26 Aug 1999 09:44:36 -0700 (PDT)
	(envelope-from tejblum@arc.hq.cti.ru)
Received: from arc.hq.cti.ru (localhost [127.0.0.1])
	by arc.hq.cti.ru (8.9.3/8.9.3) with ESMTP id UAA07357;
	Thu, 26 Aug 1999 20:42:17 +0400 (MSD)
	(envelope-from tejblum@arc.hq.cti.ru)
Message-Id: <199908261642.UAA07357@arc.hq.cti.ru>
X-Mailer: exmh version 2.0zeta 7/24/97
To: Alfred Perlstein <bright@wintelcom.net>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-reply-to: Your message of "Thu, 26 Aug 1999 08:28:29 -0000."
             <Pine.BSF.4.05.9908260220590.6392-100000@fw.wintelcom.net> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 26 Aug 1999 20:42:16 +0400
From: Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Just a few comments...

> 2) The casting of VFS ops to eopnotsupp() has been removed and
>      vfs_nop*() functions have been put into kern/vfs_default.c
> 
>    This makes it more clear that certain VFS-ops are giving default
>    behavior, either returning automatic success or returning EOPNOTSUPP.

I like the idea. (However, I think that the functions returning failure 
should not be called NOPs.)

> Why does VFS_CHECKEXP take a vnode and not a mount point? 
> Hopefully in the future a filesystem will be able to more 
> restrictively export its files, this will help facilitate that.

IMO, if it takes a vnode, it should be VOP_CHECKEXP, not VFS_CHECKEXP.

Dima




From owner-freebsd-fs  Thu Aug 26  9:46:10 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from axl.noc.iafrica.com (axl.noc.iafrica.com [196.31.1.175])
	by hub.freebsd.org (Postfix) with ESMTP
	id DB53B15D95; Thu, 26 Aug 1999 09:46:01 -0700 (PDT)
	(envelope-from sheldonh@axl.noc.iafrica.com)
Received: from sheldonh (helo=axl.noc.iafrica.com)
	by axl.noc.iafrica.com with local-esmtp (Exim 3.02 #1)
	id 11K2eD-000CbI-00; Thu, 26 Aug 1999 18:45:09 +0200
From: Sheldon Hearn <sheldonh@uunet.co.za>
To: Bill Fumerola <billf@jade.chc-chimes.com>
Cc: Alfred Perlstein <bright@wintelcom.net>, hackers@freebsd.org,
	fs@freebsd.org, Michael Hancock <michaelh@cet.co.jp>,
	David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-reply-to: Your message of "Thu, 26 Aug 1999 11:45:47 -0400."
             <Pine.BSF.4.10.9908261145210.9365-100000@jade.chc-chimes.com> 
Date: Thu, 26 Aug 1999 18:45:09 +0200
Message-ID: <48439.935685909@axl.noc.iafrica.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Thu, 26 Aug 1999 11:45:47 -0400, Bill Fumerola wrote:

> This would be post #3 of the same code and changes that no-one has
> responded to.

I hear you, and I was aware of that when I made my comments. Basically,
it's a waste of time saying such a thing, so either be prepared to wait
longer, or don't say it. :-)

Feelings, nothing more than feelings...

Ciao,
Sheldon.




From owner-freebsd-fs  Thu Aug 26 10:30:29 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id D312B14DAE; Thu, 26 Aug 1999 10:30:23 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA23308;
	Thu, 26 Aug 1999 10:27:47 -0700 (PDT)
	(envelope-from dillon)
Date: Thu, 26 Aug 1999 10:27:47 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199908261727.KAA23308@apollo.backplane.com>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG,
	Michael Hancock <michaelh@cet.co.jp>, David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed.
References:  <Pine.BSF.4.05.9908260220590.6392-100000@fw.wintelcom.net>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:I've posted 2 times asking for someone to review these diffs:
:
:http://big.endian.org/~bright/freebsd/in_progress/vfs-fhsyscall.diff
:
:Am I to take it that silence is acceptance?  I'll be committing this
:to -current tonight or tomorrow unless I get feedback.
:
:See attached email for details.
:
:thank you,
:-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
:Wintelcom systems administrator and programmer
:   - http://www.wintelcom.net/ [bright@wintelcom.net]
:
:3) fh(open|stat|statfs) have been haphazardly implemented, meaning I 
:   won't be able to test them until tonight.
:
:Testers? Critics? Comments? please?
:
:If you're wondering why/what I'm doing, it's the kernel support
:for a lockd that I'm working on, as well as a cleanup I thought
:would make it easier to understand our filesystem code.
:
:I'm sure some people will be wondering: 
:Why does VFS_CHECKEXP take a vnode and not a mount point? 
:..
:-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]

    I've done a quick once-over of your patch.  From the point of view of
    the work I'm doing and the work Kirk will be doing later on, I do
    not think the patch will cause any problems since you are adding new
    VOPs for the most part rather than modifying (too many) existing VOPs.

    Most of the work that Kirk and I will be doing will concentrate on
    namei, locking, and I/O, which you mostly avoid in your patch.

    In general I like the idea of implementing reasonable defaults.

    I would ask two things though:

	* First, please add comprehensive /* */ comments in front of each 
	  vfsnop_*() procedure explaining what it does, why it returns what
	  it returns, locking requirements (if any) on entry, and side effects
	  on return.  This is just for readability.

	  Do the same for all the procedures you are adding, in fact.

	* I think you can safely commit any elements that are not used by
	  existing builds since they are not likely to impact existing
	  builds operationally.

	  Then see what you have left over.  If it is not significant, commit
	  that too.  If it is significant, do more comprehensive testing on
	  what you have left over (i.e. that impacts existing builds) and
	  ask for another review after testing, before committing it.

    A working lock daemon would be totally awesome!  The fh*() routines
    you are adding are roughly what you (and we) need to make progress in 
    this area!

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>




From owner-freebsd-fs  Fri Aug 27 13:21:57 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 836D81557C; Fri, 27 Aug 1999 13:21:35 -0700 (PDT)
	(envelope-from ezk@shekel.mcl.cs.columbia.edu)
Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15])
	by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id QAA02879;
	Fri, 27 Aug 1999 16:18:47 -0400 (EDT)
Received: (from ezk@localhost)
	by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id QAA22373;
	Fri, 27 Aug 1999 16:18:45 -0400 (EDT)
Date: Fri, 27 Aug 1999 16:18:45 -0400 (EDT)
Message-Id: <199908272018.QAA22373@shekel.mcl.cs.columbia.edu>
X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f
From: Erez Zadok <ezk@cs.columbia.edu>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Alfred Perlstein <bright@wintelcom.net>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG, Michael Hancock <michaelh@cet.co.jp>,
	David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-reply-to: Your message of "Thu, 26 Aug 1999 10:27:47 PDT."
             <199908261727.KAA23308@apollo.backplane.com> 
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In message <199908261727.KAA23308@apollo.backplane.com>, Matthew Dillon writes:
[...]
>     I would ask two things though:
> 
> 	* First, please add comprehensive /* */ comments in front of each 
> 	  vfsnop_*() procedure explaining what it does, why it returns what
> 	  it returns, locking requirements (if any) on entry, and side effects
> 	  on return.  This is just for readability.
> 
> 	  Do the same for all the procedures you are adding, in fact.

Moreover, I would strongly recommend explicitly documenting the following:

- which function args are in-args and which are out-args?

- does the function take any allocated memory that it is expected to free?

- is the function expected to allocate any memory objects that have to be
  freed elsewhere?

- does the function increase or decrease any reference counts of any objects?
  Is it expected to?

These and other requirements are essentially the "interface" between the VFS
and lower-level file systems.  Figuring out this stuff on every OS and OS
revision (esp. when the VFS changes so often---linux) was the longest, most
frustrating task I faced when writing my Wrapfs stackable f/s module for
solaris, freebsd, and linux.  I wish documentation had been in place.
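
As a sketch of what such a comment block might look like, here is a
small userspace function documented along those lines.  The function,
the struct, and the comment layout are invented for this example and
are not an existing FreeBSD convention:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted object, for illustration only. */
struct obj {
	int refs;
};

/* Drop one reference previously gained on op. */
static void
obj_rele(struct obj *op)
{
	op->refs--;
}

/*
 * obj_getname: return a printable name for an object.
 *
 * Arguments:
 *   op    (in)   object to name; must be non-NULL.
 *   namep (out)  on success, set to a newly allocated string.
 *
 * Memory:   allocates *namep with malloc(); the caller must free() it.
 *           Does not take ownership of anything passed in.
 * Refs:     gains one reference on op on success; the caller drops it
 *           with obj_rele().
 * Locking:  none required on entry, none held on return.
 * Returns:  0 on success; ENOMEM if the allocation fails, in which
 *           case *namep is untouched and no reference is gained.
 */
static int
obj_getname(struct obj *op, char **namep)
{
	char *name;

	name = malloc(sizeof("obj"));
	if (name == NULL)
		return (ENOMEM);
	memcpy(name, "obj", sizeof("obj"));
	op->refs++;
	*namep = name;
	return (0);
}
```

The comment answers each question in the list above, so a caller does
not have to read the body to learn who frees what.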

> 	* I think you can safely commit any elements that are not used by
> 	  existing builds since they are not likely to impact existing
> 	  builds operationally.
> 
> 	  Then see what you have left over.  If it is not significant, commit
> 	  that too.  If it is significant, do more comprehensive testing on
> 	  what you have left over (i.e. that impacts existing builds) and
> 	  ask for another review after testing, before committing it.

Erez.




From owner-freebsd-fs  Fri Aug 27 13:35: 0 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.40.131])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7DAB816064; Fri, 27 Aug 1999 13:34:48 -0700 (PDT)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.9.3/8.9.2) with ESMTP id WAA06795;
	Fri, 27 Aug 1999 22:32:27 +0200 (CEST)
	(envelope-from phk@critter.freebsd.dk)
To: Erez Zadok <ezk@cs.columbia.edu>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	Alfred Perlstein <bright@wintelcom.net>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG, Michael Hancock <michaelh@cet.co.jp>,
	David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-reply-to: Your message of "Fri, 27 Aug 1999 16:18:45 EDT."
             <199908272018.QAA22373@shekel.mcl.cs.columbia.edu> 
Date: Fri, 27 Aug 1999 22:32:27 +0200
Message-ID: <6793.935785947@critter.freebsd.dk>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


Uhm, have any of you actually ever looked at src/sys/kern/vnode_if.src ?

Poul-Henning

In message <199908272018.QAA22373@shekel.mcl.cs.columbia.edu>, Erez Zadok writes:
>In message <199908261727.KAA23308@apollo.backplane.com>, Matthew Dillon writes:
>[...]
>>     I would ask two things though:
>> 
>> 	* First, please add comprehensive /* */ comments in front of each 
>> 	  vfsnop_*() procedure explaining what it does, why it returns what
>> 	  it returns, locking requirements (if any) on entry, and side effects
>> 	  on return.  This is just for readability.
>> 
>> 	  Do the same for all the procedures you are adding, in fact.
>
>Moreover, I would strongly recommend explicitly documenting the following:
>
>- which function args are in-args and which are out-args?
>
>- does the function take any allocated memory that it is expected to free?
>
>- is the function expected to allocate any memory objects that have to be
>  freed elsewhere?
>
>- does the function increase or decrease any reference counts of any objects?
>  Is it expected to?
>
>These and other requirements are essentially the "interface" between the VFS
>and lower-level file systems.  Figuring out this stuff on every OS and OS
>revision (esp. when the VFS changes so often---linux) was the longest, most
>frustrating task I faced when writing my Wrapfs stackable f/s module for
>solaris, freebsd, and linux.  I wish documentation had been in place.
>
>> 	* I think you can safely commit any elements that are not used by
>> 	  existing builds since they are not likely to impact existing
>> 	  builds operationally.
>> 
>> 	  Then see what you have left over.  If it is not significant, commit
>> 	  that too.  If it is significant, do more comprehensive testing on
>> 	  what you have left over (i.e. that impacts existing builds) and
>> 	  ask for another review after testing, before committing it.
>
>Erez.
>

--
Poul-Henning Kamp             FreeBSD coreteam member
phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!



From owner-freebsd-fs  Sat Aug 28 11:20:43 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57])
	by hub.freebsd.org (Postfix) with ESMTP id 1867F14EE0
	for <freebsd-fs@freebsd.org>; Sat, 28 Aug 1999 11:20:39 -0700 (PDT)
	(envelope-from ticso@cicely8.cicely.de)
Received: from mail.cicely.de (cicely.de [194.231.9.142])
	by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id UAA20751
	for <freebsd-fs@freebsd.org>; Sat, 28 Aug 1999 20:19:30 +0200 (MET DST)
Received: from cicely8.cicely.de (cicely8.cicely.de [10.1.2.10])
	by mail.cicely.de (8.9.0/8.9.0) with ESMTP id UAA00825
	for <freebsd-fs@freebsd.org>; Sat, 28 Aug 1999 20:16:03 +0200 (CEST)
Received: (from ticso@localhost)
	by cicely8.cicely.de (8.9.3/8.9.2) id UAA27736
	for freebsd-fs@freebsd.org; Sat, 28 Aug 1999 20:17:34 +0200 (CEST)
	(envelope-from ticso)
Date: Sat, 28 Aug 1999 20:17:23 +0200
From: Bernd Walter <ticso@cicely.de>
To: freebsd-fs@freebsd.org
Subject: fs-locking and fs memory copies questions
Message-ID: <19990828201723.A27704@cicely8.cicely.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.3i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

I would like to grow a UFS filesystem without having to unmount it first.
The growing itself will be possible for me in the near future without any
problems.

Before doing it while the fs is mounted, I need to flush all blocks back to
the partition and lock the fs until the fs structures are all updated.
All programs should block during the lock.
Is it possible to flush reliably with softupdates?

After the structures are up to date, I need to reread the superblock and
all in-memory copies of the previously last cg and all derived variables.

In some cases I need to reallocate some frags/blocks.
That means that cg 0 has changed, and maybe some other cgs that I moved
the frags to.
That also means that at least one inode or indirect block has been modified,
and of course that in-memory references to the disk blocks of indirect and/or
data blocks need to be updated.

Finally the locks need to be removed so that all waiting operations can continue.


I would be happy if someone could explain, or point me to good documentation
on, the internal operations relevant to these points.

-- 
B.Walter                  COSMO-Project              http://www.cosmo-project.de
ticso@cicely.de             Usergroup                info@cosmo-project.de




From owner-freebsd-fs  Sat Aug 28 15:50:16 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP id 5A57E14C3C
	for <freebsd-fs@FreeBSD.ORG>; Sat, 28 Aug 1999 15:50:08 -0700 (PDT)
	(envelope-from tlambert@usr01.primenet.com)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id PAA10531;
	Sat, 28 Aug 1999 15:49:48 -0700 (MST)
Received: from usr01.primenet.com(206.165.6.201)
 via SMTP by smtp03.primenet.com, id smtpdAAADEaaJu; Sat Aug 28 15:49:39 1999
Received: (from tlambert@localhost)
	by usr01.primenet.com (8.8.5/8.8.5) id PAA06168;
	Sat, 28 Aug 1999 15:49:51 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199908282249.PAA06168@usr01.primenet.com>
Subject: Re: fs-locking and fs memory copies questions
To: ticso@cicely.de (Bernd Walter)
Date: Sat, 28 Aug 1999 22:49:51 +0000 (GMT)
Cc: freebsd-fs@FreeBSD.ORG
In-Reply-To: <19990828201723.A27704@cicely8.cicely.de> from "Bernd Walter" at Aug 28, 99 08:17:23 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I would like to grow a UFS filesystem without having to unmount it first.
> The growing itself will be possible for me in the near future without any
> problems.
> 
> Before doing it while the fs is mounted, I need to flush all blocks back to
> the partition and lock the fs until the fs structures are all updated.
> All programs should block during the lock.
> Is it possible to flush reliably with softupdates?

Running "mount -u" on the FS in question does this.


> After the structures are up to date, I need to reread the superblock and
> all in-memory copies of the previously last cg and all derived variables.

You will need to modify code to do this.  The "mount -u" code doesn't
cause the in core data to be re-read, and doesn't sync out data to a
"clean" state.

The code you need to modify is in ffs_mount in ffs_vfsops.c.


> In some cases I need to reallocate some frags/blocks.

This is the problem with growing FS's: the hash fill on the
preexisting cylinder groups will be higher than on the new
cylinder groups, leading to fragmentation.

There are two ways around this, one good, one bad.  I expect you'll
want to use the "bad" way:

Good:	Defrag the drive by either writing and then running a
	defragger, or by backing up and restoring files.  You
	could copy the files in place, if you had enough disk,
	and used a holey copy program (e.g. GNU tar).

Bad:	Give preference to the new CG's for all allocations;
	the problem with this is, you never know when to stop
	doing this.  8-(.
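
A rough model of the search order behind the "hash fill" remark above, as
far as I recall the 4.4BSD-derived allocator: try the preferred cylinder
group, then a quadratic rehash over the other groups, then a linear scan.
This is a simplified userland sketch, not the kernel code; pick_cg() and
the free_blocks[] array are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Simplified model of the cylinder-group overflow search.
 * free_blocks[] is a hypothetical per-group free-block count; the real
 * kernel consults the cylinder group maps through an allocator callback.
 * Returns the index of the chosen group, or -1 if every group is full.
 */
static int
pick_cg(const int *free_blocks, int ncg, int pref)
{
	int cg, i;

	assert(ncg > 0 && pref >= 0 && pref < ncg);

	/* 1: the preferred cylinder group */
	if (free_blocks[pref] > 0)
		return (pref);

	/* 2: quadratic rehash, at offsets 1, 3, 7, 15, ... from pref */
	cg = pref;
	for (i = 1; i < ncg; i *= 2) {
		cg += i;
		if (cg >= ncg)
			cg -= ncg;
		if (free_blocks[cg] > 0)
			return (cg);
	}

	/* 3: brute-force scan, starting two groups past pref */
	cg = (pref + 2) % ncg;
	for (i = 2; i < ncg; i++) {
		if (free_blocks[cg] > 0)
			return (cg);
		cg = (cg + 1) % ncg;
	}
	return (-1);
}
```

With groups 0-3 full and newly added groups 4-7 empty, a request that
prefers group 1 lands in group 4, while a request whose preferred group
still has room never looks at the new groups at all.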

> That means that cg 0 has changed, and maybe some other cgs that I
> moved the frags to.  That also means that at least one inode
> or indirect block has been modified, and of course that
> in-memory references to the disk blocks of indirect and/or
> data blocks need to be updated.
> Finally the locks need to be removed and all waiting operations can continue.

I think you will find an "on the fly" defragger a difficult thing
to write.

> I would be happy if someone could explain, or point me to good documentation
> on, the internal operations relevant to these points.


The above will get you started.

A generic defragger would be a good thing to have, if you wanted
to allow shrinking partitions, too.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Sat Aug 28 16:40:27 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57])
	by hub.freebsd.org (Postfix) with ESMTP id 033B214D8C
	for <freebsd-fs@FreeBSD.ORG>; Sat, 28 Aug 1999 16:40:24 -0700 (PDT)
	(envelope-from ticso@cicely8.cicely.de)
Received: from mail.cicely.de (cicely.de [194.231.9.142])
	by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id BAA04443;
	Sun, 29 Aug 1999 01:37:21 +0200 (MET DST)
Received: from cicely8.cicely.de (cicely8.cicely.de [10.1.2.10])
	by mail.cicely.de (8.9.0/8.9.0) with ESMTP id BAA01607;
	Sun, 29 Aug 1999 01:37:18 +0200 (CEST)
Received: (from ticso@localhost)
	by cicely8.cicely.de (8.9.3/8.9.2) id BAA28142;
	Sun, 29 Aug 1999 01:36:57 +0200 (CEST)
	(envelope-from ticso)
Date: Sun, 29 Aug 1999 01:36:56 +0200
From: Bernd Walter <ticso@cicely.de>
To: Terry Lambert <tlambert@primenet.com>
Cc: Bernd Walter <ticso@cicely.de>, freebsd-fs@FreeBSD.ORG
Subject: Re: fs-locking and fs memory copies questions
Message-ID: <19990829013655.E27811@cicely8.cicely.de>
References: <19990828201723.A27704@cicely8.cicely.de> <199908282249.PAA06168@usr01.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.3i
In-Reply-To: <199908282249.PAA06168@usr01.primenet.com>; from Terry Lambert on Sat, Aug 28, 1999 at 10:49:51PM +0000
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sat, Aug 28, 1999 at 10:49:51PM +0000, Terry Lambert wrote:
> This is the problem with growing FS's: the hash fill on the
> preexisting cylinder groups will be higher than on the new
> cylinder groups, leading to fragmentation.

Mmmmm - I can't follow you on this.
The existing cg's are prefilled with old files.
The new ones are empty after growing.
I believed ffs would prefer the new ones automatically because of
the superblock's summary information.
Guess I need to look more deeply at the block-searching routines.
At least the problems created by that should ease with usage.
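
For new directories, at least, the classic 4.4BSD placement policy does
consult the summary counts, if I remember it correctly: it picks the group
with the fewest directories among those with at least the average number
of free inodes.  A minimal sketch under that assumption; struct cg_summary
and pick_dir_cg() are made-up names for illustration, not the kernel's:

```c
#include <assert.h>

/* Hypothetical per-group summary, modelled on the superblock's counts. */
struct cg_summary {
	int ndir;	/* directories in this group */
	int nifree;	/* free inodes in this group */
};

/*
 * Choose a cylinder group for a new directory: the group with the
 * fewest directories among those holding at least the average number
 * of free inodes.  ipg (inodes per group) is the upper bound on ndir.
 * Falls back to group 0 if no group qualifies.
 */
static int
pick_dir_cg(const struct cg_summary *cs, int ncg, int ipg)
{
	int cg, mincg = 0, minndir = ipg;
	int avgifree;
	long total = 0;

	assert(ncg > 0);

	/* Average free inodes per group, from the summary counts. */
	for (cg = 0; cg < ncg; cg++)
		total += cs[cg].nifree;
	avgifree = (int)(total / ncg);

	for (cg = 0; cg < ncg; cg++) {
		if (cs[cg].ndir < minndir && cs[cg].nifree >= avgifree) {
			mincg = cg;
			minndir = cs[cg].ndir;
		}
	}
	return (mincg);
}
```

Freshly added groups have no directories and a full set of free inodes,
so under this rule they win for new directories.  Files created in
existing directories, as far as I understand it, still start their
search in the old groups.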

> 
> A generic defragger would be a good thing to have, if you wanted
> to allow shrinking partitions, too.
> 
I already need to move blocks around in case the superblock summary
information needs another frag.
It was one of the more difficult things to do, and there are still some errors
left in that part - the rest was quite easy.
Shrinking is something I don't believe I can get working properly, because
that would mean losing cg's with all their inodes.
Moving files to different inodes is generally a mess for NFS servers.
Another point is that finding the reference for a frag is a really expensive
thing to do :(

-- 
B.Walter                  COSMO-Project              http://www.cosmo-project.de
ticso@cicely.de             Usergroup                info@cosmo-project.de




From owner-freebsd-fs  Sat Aug 28 20: 9:25 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id 55C4914F7A; Sat, 28 Aug 1999 20:09:21 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Received: from localhost (bright@localhost)
	by fw.wintelcom.net (8.8.8/8.8.8) with ESMTP id NAA17700;
	Sat, 28 Aug 1999 13:25:34 -0700 (PDT)
	(envelope-from bright@wintelcom.net)
Date: Sat, 28 Aug 1999 20:25:34 +0000 (GMT)
From: Alfred Perlstein <bright@wintelcom.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Erez Zadok <ezk@cs.columbia.edu>,
	Matthew Dillon <dillon@apollo.backplane.com>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG, Michael Hancock <michaelh@cet.co.jp>,
	David Greenman <dg@root.com>
Subject: Re: HEADS UP Reviewers. VFS changes to be committed. 
In-Reply-To: <6793.935785947@critter.freebsd.dk>
Message-ID: <Pine.BSF.4.05.9908282023450.6392-100000@fw.wintelcom.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Fri, 27 Aug 1999, Poul-Henning Kamp wrote:

> 
> Uhm, have any of you actually ever looked at src/sys/kern/vnode_if.src ?

I can't really tell if you are commenting on the diffs I provided or
if you are commenting on the comments I have received, or both.

Either way, could you elaborate a bit?  I was hoping for your input on
this issue.

thank you,
-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
   - http://www.wintelcom.net/ [bright@wintelcom.net]


> 
> Poul-Henning
> 
> In message <199908272018.QAA22373@shekel.mcl.cs.columbia.edu>, Erez Zadok writes:
> >In message <199908261727.KAA23308@apollo.backplane.com>, Matthew Dillon writes:
> >[...]
> >>     I would ask two things though:
> >> 
> >> 	* First, please add comprehensive /* */ comments in front of each 
> >> 	  vfsnop_*() procedure explaining what it does, why it returns what
> >> 	  it returns, locking requirements (if any) on entry, and side effects
> >> 	  on return.  This is just for readability.
> >> 
> >> 	  Do the same for all the procedures you are adding, in fact.
> >
> >Moreover, I would strongly recommend explicitly documenting the following:
> >
> >- which function args are in-args and which are out-args?
> >
> >- does the function take any allocated memory that it is expected to free?
> >
> >- is the function expected to allocate any memory objects that have to be
> >  freed elsewhere?
> >
> >- does the function increase or decrease any reference counts of any objects?
> >  Is it expected to?
> >
> >These and other requirements are essentially the "interface" between the VFS
> >and lower-level file systems.  Figuring out this stuff on every OS and OS
> >revision (esp. when the VFS changes so often---linux) was the longest, most
> >frustrating task I faced when writing my Wrapfs stackable f/s module for
> >solaris, freebsd, and linux.  I wish documentation had been in place.
> >
> >> 	* I think you can safely commit any elements that are not used by
> >> 	  existing builds since they are not likely to impact existing
> >> 	  builds operationally.
> >> 
> >> 	  Then see what you have left over.  If it is not significant, commit
> >> 	  that too.  If it is significant, do more comprehensive testing on
> >> 	  what you have left over (i.e. that impacts existing builds) and
> >> 	  ask for another review after testing, before committing it.
> >

