From owner-freebsd-arch  Sun Oct  7 22:43:49 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by hub.freebsd.org (Postfix) with ESMTP id 00D0237B405
	for <freebsd-arch@FreeBSD.ORG>; Sun,  7 Oct 2001 22:43:44 -0700 (PDT)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.6/8.11.6) with ESMTP id f985hBO95928;
	Mon, 8 Oct 2001 07:43:37 +0200 (CEST)
	(envelope-from phk@critter.freebsd.dk)
To: Peter Wemm <peter@wemm.org>
Cc: Brian Somers <brian@Awfulhak.org>, freebsd-arch@FreeBSD.ORG
Subject: Re: Cloned open support 
In-Reply-To: Your message of "Sun, 07 Oct 2001 15:17:05 PDT."
             <20011007221706.1ABAE3809@overcee.netplex.com.au> 
Date: Mon, 08 Oct 2001 07:43:11 +0200
Message-ID: <95926.1002519791@critter.freebsd.dk>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk

In message <20011007221706.1ABAE3809@overcee.netplex.com.au>, Peter Wemm writes:
>Poul-Henning Kamp wrote:
>> 
>> Uhm, Brian...
>> 
>> We have cloned devices already...
>> 
>> What exactly is it that you want to implement ?
>> 
>> Poul-Henning
>
>Devfs name cloning feels hackish to me.  Having a separate EVENTHANDLER()
>for doing it feels .. just nasty.  I'd much rather that we had a d_clone
>devsw entry and/or a D_CLONE d_flags entry.

You call it hackish; I call it elegant:

    * I didn't have to modify all the device drivers making them
      incompatible with anything anybody ever learned.

    * I didn't have to do long-haired vnode operations in cloning
      drivers, thus preserving the ability to do systematic SMP
      lock-boundaries at the cdevsw-> level.

    * It supports parameterized clone opens (i.e., not just "/dev/pty",
      but also "/dev/ad0s1g" and, if somebody implemented it, even
      "/dev/ccd,mirror,ad0s1f,ad1s1f") :-)

    * It is a *LOT* simpler than doing it by vnodes...
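
A minimal userland model of the namei-time approach, assuming a flat name table stands in for devfs: a lookup that misses hands the wished-for name to clone handlers, which may create the node before anything is opened. All identifiers here (struct model_dev, make_dev_model, disk_clone, model_lookup) are hypothetical; the sketch only illustrates the shape of the idea, not the actual devfs code.

```c
/* Hypothetical userland model of namei-time cloning; not kernel code. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXDEVS 16
#define NAMELEN 32

struct model_dev {
	char name[NAMELEN];
};

static struct model_dev devtab[MAXDEVS];	/* nodes that exist right now */
static int ndevs;

/* Create a node on demand; NULL if the table is full. */
static struct model_dev *
make_dev_model(const char *name)
{
	if (ndevs == MAXDEVS)
		return (NULL);
	snprintf(devtab[ndevs].name, NAMELEN, "%s", name);
	return (&devtab[ndevs++]);
}

/* A clone handler looks at the name and may create a node for it. */
typedef void clone_fn(const char *name, struct model_dev **devp);

/* Parameterized handler: accepts "ad<unit>s<slice><a-h>", e.g. "ad0s1g". */
static void
disk_clone(const char *name, struct model_dev **devp)
{
	int unit, slice, used = 0;
	char part;

	if (*devp != NULL)
		return;		/* an earlier handler already answered */
	if (sscanf(name, "ad%ds%d%c%n", &unit, &slice, &part, &used) != 3)
		return;
	if (name[used] != '\0' || unit < 0 || slice < 1 ||
	    part < 'a' || part > 'h')
		return;
	*devp = make_dev_model(name);
}

static clone_fn *handlers[] = { disk_clone };

/* Lookup: an existing node wins; otherwise each handler gets a try. */
static struct model_dev *
model_lookup(const char *name)
{
	struct model_dev *dev = NULL;
	size_t h;
	int i;

	for (i = 0; i < ndevs; i++)
		if (strcmp(devtab[i].name, name) == 0)
			return (&devtab[i]);
	for (h = 0; h < sizeof(handlers) / sizeof(handlers[0]); h++)
		handlers[h](name, &dev);
	return (dev);	/* NULL here corresponds to ENOENT */
}
```

Looking up "ad0s1g" creates the node on first use and finds the same node afterwards, while "ad0s1z" or "pty" fall through to NULL. No driver open routine is involved at any point, which is the property the list above argues for.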

The story is that by the time you reach devsw->open() you have
committed the vnode and if you change the device at that time you
need to unwind all the way back up to the association of the vnode
with the dev_t, and wind all the way back down before the open can
progress.

(The fact that it is an EVENTHANDLER is just a matter of implementation;
I didn't see the point of reimplementing the same functionality when
EVENTHANDLERs were already available.)
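
The plumbing itself is simple: a priority-ordered list of callbacks, each invoked in turn with the same request. Below is a hypothetical userland sketch of that general idea (handler_register, handler_invoke and prefix_handler are invented names); it is not the kernel's EVENTHANDLER(9) implementation, only a model in the same spirit, where a handler that sees the request already satisfied stays out of the way.

```c
/* Hypothetical model of a priority-ordered handler list; not EVENTHANDLER(9). */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void handler_fn(void *arg, const char *name, const char **claimed);

struct handler_entry {
	handler_fn *fn;
	void *arg;
	int priority;		/* lower value runs earlier */
};

#define MAXHANDLERS 8
static struct handler_entry handlers[MAXHANDLERS];
static int nhandlers;

/* Register fn, keeping the list sorted by priority (ties keep order). */
static int
handler_register(handler_fn *fn, void *arg, int priority)
{
	int i;

	if (nhandlers == MAXHANDLERS)
		return (-1);
	for (i = nhandlers; i > 0 && handlers[i - 1].priority > priority; i--)
		handlers[i] = handlers[i - 1];
	handlers[i].fn = fn;
	handlers[i].arg = arg;
	handlers[i].priority = priority;
	nhandlers++;
	return (0);
}

/* Call every registered handler, in priority order, with one request. */
static void
handler_invoke(const char *name, const char **claimed)
{
	int i;

	for (i = 0; i < nhandlers; i++)
		handlers[i].fn(handlers[i].arg, name, claimed);
}

/* Example clone handler: claims names that start with its prefix (arg). */
static void
prefix_handler(void *arg, const char *name, const char **claimed)
{
	const char *prefix = arg;

	if (*claimed != NULL)
		return;		/* an earlier handler already made the device */
	if (strncmp(name, prefix, strlen(prefix)) == 0)
		*claimed = prefix;
}
```

A driver would register once at load time; every lookup miss then becomes one handler_invoke call. The `*claimed != NULL` test is what makes priority meaningful: whoever runs first and answers wins.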

>There are two types of cloning. One is to map some name "/dev/fd0135ds18h"
>into a device node without having to flood /dev with all possible
>permutations.  The other is to support per-device "select next unit" style
>opens.  Presently these are both kludged into the EVENTHANDLER interface.

Those two are actually the same kind of open, Peter; semantically
they both say "make me a device according to this wish: ``...'' and let
me open it".
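
To show that "select next unit" and name-to-device cloning can be one request, here is a hypothetical userland sketch serving both through a single function. pty_wish, the pty_busy map and the "ptyN" naming are invented for illustration (real pty nodes are named differently): a bare "pty" asks for the lowest free unit, "pty5" asks for a specific one, and either way the caller hands over a wish and gets back a concrete device name or a refusal.

```c
/* Hypothetical sketch: both cloning styles as one "wish" interface. */
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NPTY 8
static int pty_busy[NPTY];	/* hypothetical unit allocation map */

/*
 * Turn a wish into a concrete device name in out.
 *   "pty"  -> lowest free unit   ("select next unit" cloning)
 *   "ptyN" -> exactly unit N     (name cloning)
 * Returns 0 on success, -1 if the wish cannot be granted.
 */
static int
pty_wish(const char *wish, char *out, size_t outlen)
{
	int unit, used = 0;

	if (strcmp(wish, "pty") == 0) {
		for (unit = 0; unit < NPTY && pty_busy[unit]; unit++)
			continue;
		if (unit == NPTY)
			return (-1);	/* every unit is in use */
	} else if (strncmp(wish, "pty", 3) != 0 ||
	    !isdigit((unsigned char)wish[3]) ||
	    sscanf(wish + 3, "%d%n", &unit, &used) != 1 ||
	    wish[3 + used] != '\0' || unit >= NPTY) {
		return (-1);		/* not a name this driver understands */
	}
	pty_busy[unit] = 1;
	snprintf(out, outlen, "pty%d", unit);
	return (0);
}
```

After "pty", "pty" and "pty5", a further bare "pty" yields "pty2": the next-unit case is just a wish whose parameter the driver fills in itself.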

>I think Brian wants to move the second part directly into the open handler
>like it is done on most other OSes that support cloning.  Personally,
>I would be quite happy if we could do that.

Most other systems have made a mess of their vnodes and drivers by
doing so :-(

The FreeBSD implementation completely sidesteps all the vnode hair
by doing the cloning at namei() time instead of open time; this
makes it much simpler and much more capable.

If you do a vnode based cloning, it will not support your
"/dev/fd0135ds18h" example above, unless you flood /dev with all
possible entries.

>I realize why it is done the way it is done now, though.  VOP_LOOKUP()
>having to return a unique vnode for the device is a pain (which is why
>the clone is done during lookup, so that the correct vnode is found and
>available).  But understanding why doesn't mean that I don't wish it
>could be different. :-)

Well, if devices lived at the file descriptor level instead of at the vnode
level, things would be different (but I haven't tried to implement that,
so I can't say for sure whether it would actually be "better"...)

>Doug mentions the hack in dev/streams/streams.c:
>        td->td_dupfd = fd;
>        return ENXIO;
>.. this is nasty. :-)

This is abuse; it should be rewritten.
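
For readers who haven't seen the trick, here is a hypothetical userland model of it. The descriptor table, fd_alloc, hack_open and model_open are invented; only td_dupfd and ENXIO come from the quoted code. It assumes the generic open path reacts to ENXIO plus a set td_dupfd by moving the object out of that descriptor into the one being opened, which is how an error return ends up acting as a redirect, and why it reads as an abuse of an error code.

```c
/* Hypothetical model of the "td_dupfd + ENXIO" redirect; not kernel code. */
#include <assert.h>
#include <errno.h>

#define NFILES 8
static int fdtab[NFILES];	/* fdtab[fd] = object id, 0 = free slot */

struct model_thread {
	int td_dupfd;		/* -1: no substitution requested */
};

static int
fd_alloc(int object)
{
	int fd;

	for (fd = 0; fd < NFILES; fd++)
		if (fdtab[fd] == 0) {
			fdtab[fd] = object;
			return (fd);
		}
	return (-1);
}

/* "Driver" open: build the real object behind a second fd, then fail. */
static int
hack_open(struct model_thread *td, int real_object)
{
	int fd = fd_alloc(real_object);

	if (fd < 0)
		return (EMFILE);
	td->td_dupfd = fd;
	return (ENXIO);		/* deliberate "failure" requesting the swap */
}

/* Generic open: on ENXIO with td_dupfd set, steal the real object. */
static int
model_open(struct model_thread *td, int device_object, int real_object,
    int *fdp)
{
	int fd, error;

	fd = fd_alloc(device_object);	/* slot for the "common" node */
	if (fd < 0)
		return (EMFILE);
	td->td_dupfd = -1;
	error = hack_open(td, real_object);
	if (error == ENXIO && td->td_dupfd >= 0) {
		fdtab[fd] = fdtab[td->td_dupfd];	/* move real object in */
		fdtab[td->td_dupfd] = 0;		/* donor slot is closed */
		td->td_dupfd = -1;
		error = 0;
	}
	if (error != 0) {
		fdtab[fd] = 0;
		return (error);
	}
	*fdp = fd;
	return (0);
}
```

The caller ends up holding the real object in the descriptor it asked for, but only because the generic code knows to treat one specific error as "not an error".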

>I think the SVR4 clone driver uses something like this.  It causes the
>original namei / open attempt to fail (thus releasing the "common" vnode)
>and then switches over to the *real* file/vnode at the last minute.

We would have to do that as well in order to unwind the committed vnode
and select another.


I would like to request that nobody starts to commit a vnode based cloning
(or API changes for it) until they actually have a working prototype.
I've been there, done that and threw it away.

The only reason I can see for adding vnode-based cloning would be if
somebody can point out something they cannot do with namei-based
cloning...

<SHAMELESS PLUG>
My BSDCONey and BSDCON talks would be very good places to ask questions
about this :-)
</SHAMELESS PLUG>

Poul-Henning
-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
