Date: Thu, 19 Apr 2001 08:38:16 +0700 (ALMST)
From: Boris Popov
To: Poul-Henning Kamp
Cc: Matt Dillon, Robert Watson, Kirk McKusick, Julian Elischer,
    Rik van Riel, freebsd-hackers@FreeBSD.ORG, David Xu
Subject: Re: vm balance
In-Reply-To: <40677.987614127@critter>

On Wed, 18 Apr 2001, Poul-Henning Kamp wrote:

> In message <200104181702.f3IH24s23282@earth.backplane.com>, Matt Dillon writes:
> >    If this will get rid of or clean up the specfs garbage, then I'm all
> >    for it.  I would love to see a 'clean' fileops based device interface.
>
> specfs, aliased vnodes, you name it...
>
> I think the aliased vnodes is the single most strong argument of them
> all for doing this...

	I think this can be (and already is) solved another way. Here is
how I did it on my test system (quoted from my mail to Bruce Evans):

--quote-start--
	I'm working on this problem too, and the vop_lock/unlock calls in
the spec_open/read/write vnops are a real pain. Using a generic vnode
stacking/layering mechanism (diffs will be published soon), I've
reorganized the way device vnodes are handled. Each device gets its own
vnode of type VT_SPEC, which belongs to a hidden specfs mount.
When any real filesystem tries to look up the vnode for a specific device
via addaliasu(), addalias() just stacks the filesystem vnode over the
specfs vnode:

    fs1/vnode1    fs1/vnode8    fs2/vnode1
        |             |             |
        +-------------+-------------+
                      |
                      V
                specfs vnode

	The specfs vnode can also be used directly as the root vnode for
any mounted filesystem. Obviously, there is no need for device aliases,
because a device can be controlled only via a single vnode. The v_rdev
field also goes away from the vnode structure, and vn_todev() is the
right way to get a pointer to the underlying device.

	But there is a real problem with the locking/unlocking used by
specfs. E.g., if the specfs vnode's lock is used as the lock for an
entire layer tree, then things will be totally broken, because a blocked
spec_read() operation may unlock a different vnode which should stay
locked, and the fact that the read lock is shared causes even more
problems... Using a separate lock for each vnode partially solves the
problem, but does not completely emulate the old behavior of the
exclusive lock on the open operation. For example, if we call open(vn1)
and it blocks, a second open(vn1) will get stuck waiting for the lock on
vn1, while open(vn8) will work just fine.

	This problem is common to stacked filesystems, and many papers
avoid talking about it. The "right" solution is to have a "call stack",
so that an unlock operation can unlock only a single chain of the vnodes
above, but I don't see a simple way to implement it for stacks
containing more than two layers :(
--quote-end--

	Now, regarding the new file operations structure: it is pretty
obvious that most of the operations will resemble vnode operations.
However, it is a misdesign of VFS not to allow a filesystem to do
per-file-descriptor tracking for at least the OPEN/CLOSE operations. It
is also pretty obvious that file operations (FOP) are just a layer above
VOP operations. So why not do things right and add to the existing VFS
the capability to handle per-file operations properly?
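	To make the "FOP is just a layer above VOP" point concrete, here
is a minimal user-space sketch in C. Every name in it (toy_file,
toy_vnode, toy_fop_open and so on) is made up for illustration; it is
not the FreeBSD VFS interface and not the code from my test system. It
only shows per-descriptor OPEN/CLOSE state kept in the upper layer while
the vnode operation underneath stays shared.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of this is the real FreeBSD VFS API. */

struct toy_vnode;

/* Vnode-level operations: shared by every open of the same vnode. */
struct toy_vnodeops {
	int (*vop_open)(struct toy_vnode *vp);
	int (*vop_close)(struct toy_vnode *vp);
};

struct toy_vnode {
	const struct toy_vnodeops *v_ops;
	int v_opens;		/* opens currently outstanding on this vnode */
};

/* One per open file descriptor: the state a plain VOP cannot see. */
struct toy_file {
	struct toy_vnode *f_vnode;
	int f_isopen;		/* per-descriptor state kept by the FOP layer */
};

static int
toy_vop_open(struct toy_vnode *vp)
{
	vp->v_opens++;
	return (0);
}

static int
toy_vop_close(struct toy_vnode *vp)
{
	vp->v_opens--;
	return (0);
}

static const struct toy_vnodeops toy_ops = { toy_vop_open, toy_vop_close };

/* FOP layer: call down into the VOP, then record per-descriptor state. */
static int
toy_fop_open(struct toy_file *fp, struct toy_vnode *vp)
{
	int error;

	error = vp->v_ops->vop_open(vp);
	if (error != 0)
		return (error);
	fp->f_vnode = vp;
	fp->f_isopen = 1;
	return (0);
}

/* FOP layer: refuse a close on a descriptor that is not open. */
static int
toy_fop_close(struct toy_file *fp)
{
	if (!fp->f_isopen)
		return (-1);
	fp->f_isopen = 0;
	return (fp->f_vnode->v_ops->vop_close(fp->f_vnode));
}
```

	With two toy_file structures opened on one toy_vnode, each keeps
its own f_isopen flag while v_opens counts both, which is the kind of
per-descriptor tracking a filesystem cannot do with vnode operations
alone.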
Of course, this will require more brain work, but the results will
definitely be better.

	Let's get back to vnode/vm/file/devices: I think it is a mistake
to rip vnodes out of devices. But I agree that the vnode structure is
too fat to be used in a more general way. If it is possible to clean it
up, then we can easily build any hierarchies we want:

    file1   file2       file3
      |       |           |
      +-------+           |
          |               |
       vnode1          vnode2
          |               |
          +---------------+
                  |
               device1

--
Boris Popov
http://www.butya.kz/~bp/