From owner-freebsd-hackers  Wed Apr 18 18:58:40 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id B5A3A37B43C; Wed, 18 Apr 2001 18:58:30 -0700 (PDT)
	(envelope-from bp@butya.kz)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id 1E447287A6; Thu, 19 Apr 2001 08:38:16 +0700 (ALMST)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id BD03028769; Thu, 19 Apr 2001 08:38:16 +0700 (ALMST)
Date: Thu, 19 Apr 2001 08:38:16 +0700 (ALMST)
From: Boris Popov <bp@butya.kz>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Matt Dillon <dillon@earth.backplane.com>,
	Robert Watson <rwatson@FreeBSD.ORG>,
	Kirk McKusick <mckusick@mckusick.com>,
	Julian Elischer <julian@elischer.org>,
	Rik van Riel <riel@conectiva.com.br>, freebsd-hackers@FreeBSD.ORG,
	David Xu <bsddiy@21cn.com>
Subject: Re: vm balance 
In-Reply-To: <40677.987614127@critter>
Message-ID: <Pine.BSF.4.21.0104190755440.27694-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 18 Apr 2001, Poul-Henning Kamp wrote:

> In message <200104181702.f3IH24s23282@earth.backplane.com>, Matt Dillon writes:
> >    If this will get rid of or clean up the specfs garbage, then I'm all
> >    for it.  I would love to see a 'clean' fileops based device interface.
> 
> specfs, aliased vnodes, you name it...
> 
> I think the aliased vnodes is the single most strong argument of them
> all for doing this...

	I think this can be (and already is) solved another way. Here is
how I did it on my test system (quoted from my mail to Bruce Evans):

--quote-start--
        I'm working on this problem too, and the vop_lock/unlock calls in
the spec_open/read/write vnops are a real pain. Using a generic vnode
stacking/layering mechanism (diffs will be published soon), I've
reorganized the way device vnodes are handled. Each device gets its own
vnode of type VT_SPEC, which belongs to a hidden specfs mount. When any
real filesystem tries to look up the vnode for a specific device via
addaliasu(), addalias() just stacks the filesystem vnode over the specfs
vnode:

        fs1/vnode1     fs1/vnode8           fs2/vnode1
                |       |                       |
                +-------+-----------------------+
                        |
                        V
                   specfs vnode

        The specfs vnode can also be used directly as the root vnode for any
mounted filesystem. Obviously, there is no need for device aliases,
because a device can be controlled only via a single vnode. The v_rdev
field also goes away from the vnode structure, and vn_todev() is the right
way to get a pointer to the underlying device.

        But there is a real problem with the locking/unlocking used by
specfs. E.g., if the specfs vnode's lock is used as the lock for an entire
layer tree, then things will be totally broken, because a blocked
spec_read() operation may unlock a different vnode which should stay
locked, and the fact that the read lock is shared causes even more
problems... Using a separate lock for each vnode partially solves the
problem, but does not completely emulate the old behavior of the exclusive
lock on the open operation. For example, if we call open(vn1) and it
blocks, a second open(vn1) will get stuck waiting for the lock on vn1,
while open(vn8) will work just fine.

        This problem is common to stacked filesystems, and many papers
avoid talking about it. The "right" solution is to have a "call stack", so
that an unlock operation unlocks only a single chain of the vnodes above,
but I don't see a simple way to implement it for stacks containing more
than two layers :(
--quote-end--
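
A minimal user-space sketch of the stacking and per-vnode locking described
in the quoted mail. Everything prefixed sk_ is hypothetical and exists only
for illustration; it is not FreeBSD's struct vnode, lock manager, or the
real vn_todev(). It assumes each upper vnode holds a pointer to the layer
below it and that only the bottom (specfs) vnode points at the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not a real kernel type. */
struct sk_dev {
    int unit;
};

/* Hypothetical, heavily simplified vnode. */
struct sk_vnode {
    const char      *name;
    struct sk_vnode *lower;   /* next layer down, NULL at the bottom */
    struct sk_dev   *dev;     /* set only on the specfs vnode */
    bool             locked;  /* per-vnode exclusive lock */
};

/* Analogue of vn_todev(): walk to the bottom layer, return its device. */
static struct sk_dev *sk_vn_todev(struct sk_vnode *vp)
{
    assert(vp != NULL);
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp->dev;
}

/* Try to take the exclusive lock on this vnode only, not on the stack. */
static bool sk_trylock(struct sk_vnode *vp)
{
    if (vp->locked)
        return false;
    vp->locked = true;
    return true;
}

/* Release the lock on this vnode only. */
static void sk_unlock(struct sk_vnode *vp)
{
    vp->locked = false;
}
```

With fs1/vnode1 and fs1/vnode8 both stacked over one specfs vnode,
sk_vn_todev() returns the same device for both, while a held lock on vnode1
does not stop sk_trylock() on vnode8. That is the partial emulation of the
old exclusive-open behavior mentioned in the quote.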

	Now, regarding the new file operations structure: it is pretty
obvious that most of the operations will resemble vnode operations.
However, it is a misdesign of VFS not to let a filesystem track
per-file-descriptor state for at least the OPEN/CLOSE operations. It is
also pretty obvious that file operations (FOP) are just a layer above
VOP operations.

	So why not do things right and add to the existing VFS the
capability to handle per-file operations properly? Of course, this will
require more brain work, but the results will definitely be better.
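
One way to picture FOP as a thin layer over VOP, with the filesystem seeing
each open and close, is the user-space sketch below. All fv_ names, the
structure layouts, and the idea of handing the file pointer down to the
vnode operation are assumptions made for illustration; none of this is the
existing VFS interface or a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

struct fv_file;

/* Hypothetical vnode with OPEN/CLOSE operations that receive the file. */
struct fv_vnode {
    int v_opens;        /* opens currently known to the filesystem */
    int v_next_cookie;  /* source of per-open identifiers */
    int (*vop_open)(struct fv_vnode *vp, struct fv_file *fp);
    int (*vop_close)(struct fv_vnode *vp, struct fv_file *fp);
};

/* Hypothetical per-open file object sitting above the vnode. */
struct fv_file {
    struct fv_vnode *f_vnode;
    int              f_fscookie;  /* per-open state owned by the filesystem */
};

/* File-level open: bind the file to the vnode, then call down to VOP. */
static int fv_fop_open(struct fv_file *fp, struct fv_vnode *vp)
{
    assert(fp != NULL && vp != NULL);
    fp->f_vnode = vp;
    fp->f_fscookie = 0;
    return vp->vop_open(vp, fp);
}

/* File-level close: call down to VOP, then detach from the vnode. */
static int fv_fop_close(struct fv_file *fp)
{
    int error = fp->f_vnode->vop_close(fp->f_vnode, fp);

    fp->f_vnode = NULL;
    return error;
}

/* Example filesystem: counts opens and tags each one with a cookie. */
static int fv_example_open(struct fv_vnode *vp, struct fv_file *fp)
{
    vp->v_opens++;
    fp->f_fscookie = ++vp->v_next_cookie;
    return 0;
}

static int fv_example_close(struct fv_vnode *vp, struct fv_file *fp)
{
    vp->v_opens--;
    fp->f_fscookie = 0;
    return 0;
}
```

Two files opened on the same vnode each get their own cookie, so the
filesystem can tell the opens apart, and each close is reported for the
specific file being closed.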

	Let's get back to vnode/vm/file/devices: I think it is a mistake to
rip vnodes out of devices. But I agree that the vnode structure is too fat
to be used in a more general way. If it is possible to clean it up, then we
can easily build any hierarchies we want:

	file1	file2	file3
	|	|	|
	+-------+	|
	|		|
	vnode1		vnode2
	|		|
	+---------------+
	|
	device1

--
Boris Popov
http://www.butya.kz/~bp/

