From owner-freebsd-current  Mon Jan 26 21:41:21 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id VAA11601
          for current-outgoing; Mon, 26 Jan 1998 21:41:21 -0800 (PST)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id VAA11596;
          Mon, 26 Jan 1998 21:41:17 -0800 (PST)
          (envelope-from tlambert@usr01.primenet.com)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.8.8/8.8.8) id WAA06946;
	Mon, 26 Jan 1998 22:41:16 -0700 (MST)
Received: from usr01.primenet.com(206.165.6.201)
 via SMTP by smtp04.primenet.com, id smtpd006916; Mon Jan 26 22:41:12 1998
Received: (from tlambert@localhost)
	by usr01.primenet.com (8.8.5/8.8.5) id WAA00264;
	Mon, 26 Jan 1998 22:41:08 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199801270541.WAA00264@usr01.primenet.com>
Subject: Re: stable current?
To: dyson@FreeBSD.ORG
Date: Tue, 27 Jan 1998 05:41:08 +0000 (GMT)
Cc: tlambert@primenet.com, scottm@cs.ucla.edu, freebsd-current@FreeBSD.ORG
In-Reply-To: <199801260248.VAA00479@dyson.iquest.net> from "John S. Dyson" at Jan 25, 98 09:48:29 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk

> > Well, you are right, and the code supports it.
>
> Following up my own comments:

I'll follow up the other comments tomorrow, when I have time to grovel
through the code while I am typing.  This message will, in brief, point
to the specific issue I'm referring to regarding stacking.


> The pager code has supported whatever the filesystems do.  This becomes
> more of an issue with the layered filesystems, and the major impediment
> that we have had is that the VM object could not be shared across vnodes,
> and now it can.  (Theoretically, we could layer UFS onto other filesystems,
> but I am not worried about that right now.)

The existence of aliases (i.e., the act of sharing the object across
vnodes) is fraught with peril.  You may, in fact, have corrected this
in the most recent code; I haven't updated my experimental system
for some time.

The aliasing of objects is precisely what was causing the "nullfs is
not very null" series of changes to be a *bad* idea.


The problem here is that if an FS does not support a VOP_{GET|PUT}PAGES,
you can't tell whether it's because the FS truly does not understand it,
or whether you should use the bypass.  Until all physical media FS's
natively support it (I'm willing to hack on this if you are willing to
commit the code), you can't know whether a NULL descriptor value in, say,
nullfs or unionfs means that the FS itself can't handle it, in which case
you need to back off to the algorithm with promiscuous knowledge of where
the pages are hung off the vnode.  That algorithm assumes the top level
vnode; for any stack, this is guaranteed to be an alias.  Aliases are bad
because they are hard to get right.  Again, you might have gotten them
right; I don't know.  Really, you want to be able to ask "does the vnode
backing this object support these interfaces?".  You can't really ask
that right now.
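
To make the question concrete, here is a minimal sketch of such a
query in a toy model.  Every name in it (struct vnode as laid out
here, backing_vnode, backing_supports_getpages, ufs_like_getpages)
is hypothetical and is not the kernel's actual interface; it only
assumes that each stacked vnode can name the vnode below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of this is FreeBSD kernel API. */
struct vnode;
typedef int (*getpages_fn)(struct vnode *vp);

struct vnode {
    const char   *name;      /* label, for illustration only */
    getpages_fn   getpages;  /* NULL: no native implementation here */
    struct vnode *lower;     /* NULL on a physical media FS vnode */
};

/* Walk down the stack to the vnode that actually backs the object. */
static struct vnode *
backing_vnode(struct vnode *vp)
{
    assert(vp != NULL);
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp;
}

/* "Does the vnode backing this object support these interfaces?" */
static bool
backing_supports_getpages(struct vnode *vp)
{
    return backing_vnode(vp)->getpages != NULL;
}

/* Stand-in for a physical media FS's native implementation. */
static int
ufs_like_getpages(struct vnode *vp)
{
    (void)vp;
    return 0;
}
```

In this model a nullfs-style vnode has a NULL getpages entry whether
it sits on top of an FS with a native implementation or on one
without, so the entry alone says nothing; only the walk to the
backing vnode distinguishes the two cases.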

Once you can ask that, maybe I screwed up in recommending the use
of VOP_{GET|PUT}PAGES; if you can ask for the backing object, you
can then run the old vnode_pager on the backing object instead of
on the top level vnode.  You can also access your lock list through
the backing vnode's lock list (if the advisory locks were hung there),
so maybe this is a better approach.  I don't know yet.
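
A rough sketch of that alternative, under the assumption that the
object and the advisory lock list are hung only on the backing vnode
and that upper layers hold nothing of their own.  The structures and
function names below are hypothetical stand-ins, not the real kernel
ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these are not the real FreeBSD structures. */
struct vm_object { int resident_pages; };
struct lock_list { int nlocks; };

struct vnode {
    struct vm_object *object;  /* set only on the backing vnode */
    struct lock_list *locks;   /* advisory locks, on the backing vnode */
    struct vnode     *lower;   /* NULL on the backing vnode */
};

static struct vnode *
backing_vnode(struct vnode *vp)
{
    assert(vp != NULL);
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp;
}

/* Upper layers keep no reference of their own, so nothing is aliased. */
static struct vm_object *
vnode_object(struct vnode *vp)
{
    return backing_vnode(vp)->object;
}

/* The lock list is reached the same way as the object. */
static struct lock_list *
vnode_locks(struct vnode *vp)
{
    return backing_vnode(vp)->locks;
}
```

Whatever height you enter the stack at, you end up at the same
object and the same lock list, so there is no second copy to keep
synchronized downward.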

But aliases are a problem because of downward synchronization; even if
you've taken care of it, it's a problem because of unnecessary code
bloat, such as was contemplated in nullfs and unionfs.


> Sure there are some layering issues as you have suggested, but I have
> also known about them.  There has been a lot more wrong than even you
> have ever talked about (at least to me.)

Well, come back to town and let me buy you a pizza and talk your ear off...
;-).

I never said that I knew all of the answers; only some of the questions,
and what I thought were some (a smaller number) of the answers.


> If this was an easy thing to fix, I would have a long time ago.  There
> are few, if any revelations in what you have said, but damn it, it doesn't
> hurt to work in parallel, until I hear complaints that don't solve them,
> or someone who discounts the complexity of problems.
> 
> Almost none of the problems that we have had are due to the conventional
> filesystem implementations.  I can write a native getpages or putpages in
> a few hours.  It is the layering, and the vnode pager is pretty much not
> an issue, and hasn't been for the last 2yrs.  If the vnode pager implements
> a default getpages or putpages interface, who cares?  The layered filesystems
> will supply their own.

The intermediate layers will *not* necessarily supply their own.  This
is the purpose of the bypass.  But to do a bypass, you have to be able
to be sure that there's something at the end of the detour before you
go down the road, and find nothing but a "severe tire damage will result!"
sign.  8-).
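
As an illustration of the detour, here is a small sketch of a bypass
in a toy model.  The names (struct layer, layer_putpages,
native_putpages) are invented for the example and are not the nullfs
bypass code; the only assumption is that a layer without its own
implementation passes the call to the layer below it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a stacked vnode; not the FreeBSD kernel API. */
struct layer;
typedef int (*putpages_fn)(struct layer *lp);

struct layer {
    putpages_fn   putpages;  /* NULL: this layer has no implementation */
    struct layer *lower;     /* NULL at the bottom of the stack */
};

/*
 * Bypass: a layer without its own implementation hands the call to
 * the layer below.  Reaching the bottom without finding one is
 * reported as EOPNOTSUPP rather than calling through a NULL pointer.
 */
static int
layer_putpages(struct layer *lp)
{
    assert(lp != NULL);
    for (; lp != NULL; lp = lp->lower) {
        if (lp->putpages != NULL)
            return lp->putpages(lp);
    }
    return EOPNOTSUPP;
}

/* Stand-in for a bottom FS that does implement the operation. */
static int
native_putpages(struct layer *lp)
{
    (void)lp;
    return 0;
}
```

Note that this only finds the dead end after driving the whole
detour; it does not let the caller know in advance that something is
there, which is the part that is missing.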


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.