From owner-freebsd-arch  Mon Oct  8  1:33:53 2001
Message-Id: <200110080830.f988UxT40831@hak.lan.Awfulhak.org>
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Peter Wemm <peter@wemm.org>, Brian Somers <brian@Awfulhak.org>,
	freebsd-arch@FreeBSD.ORG, brian@freebsd-services.com
Subject: Re: Cloned open support 
In-Reply-To: Message from Poul-Henning Kamp <phk@critter.freebsd.dk> 
   of "Mon, 08 Oct 2001 07:43:11 +0200." <95926.1002519791@critter.freebsd.dk> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 08 Oct 2001 09:30:59 +0100
From: Brian Somers <brian@freebsd-services.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk

Hi,

First, let me mention that I think the timing of your response is a 
bit odd.  I posted the message that you've responded to on 
January 29 and you responded on October 7.  I implemented cloning on 
if_tun in May... :*P

Maybe my mail server is underperforming!

> In message <20011007221706.1ABAE3809@overcee.netplex.com.au>, Peter Wemm writes:
> >Poul-Henning Kamp wrote:
> >> 
> >> Uhm, Brian...
> >> 
> >> We have cloned devices already...
> >> 
> >> What exactly is it that you want to implement ?
> >> 
> >> Poul-Henning
> >
> >Devfs name cloning feels hackish to me.  Having a separate EVENTHANDLER()
> >for doing it feels .. just nasty.  I'd much rather that we had a d_clone
> >devsw entry and/or a D_CLONE d_flags entry.
> 
> Call it hackish, I call it elegant:  
> 
>     * I didn't have to modify all the device drivers making them
>       incompatible with anything anybody ever learned.
> 
>     * I didn't have to do long-haired vnode operations in cloning
>       drivers, thus preserving the ability to do systematic SMP
>       lock-boundaries at the cdevsw-> level.
> 
>     * It supports parameterized clone opens (ie: not just "/dev/pty",
>       but also "/dev/ad0s1g" and even if somebody implemented it: 
>       "/dev/ccd,mirror,ad0s1f,ad1s1f" :-)
> 
>     * It is a *LOT* simpler than doing it by vnodes...
> 
> The story is that by the time you reach devsw->open() you have
> committed the vnode and if you change the device at that time you
> need to unwind all the way back up to the association of the vnode
> with the dev_t, and wind all the way back down before the open can
> progress.
> 
> (The fact that it is an EVENTHANDLER is just a matter of implementation,
> I didn't see a point to reimplementing the same functionality when
> EVENTHANDLERS already were available).
> 
> >There are two types of cloning. One is to map some name "/dev/fd0135ds18h"
> >into a device node without having to flood /dev with all possible
> >permutations.  The other is to support per-device "select next unit" style
> >opens.  Presently these are both kludged into the EVENTHANDLER interface.
> 
> Those two are actually the same kind of open Peter, semantically
> they both say "make me a device according to this wish: ``...'' and let
> me open it".
> 
> >I think Brian wants to move the second part directly into the open handler
> >like it is done on most other OS's that support cloning.  Personally,
> >I would be quite happy if we could do that.
> 
> Most other devices have made a mess of their vnodes and drivers by
> doing so :-(
> 
> The FreeBSD implementation completely sidesteps all the vnode hair
> by doing the cloning at namei() time instead of open time, this
> makes it much simpler and much more capable.
> 
> If you do a vnode based cloning, it will not support your
> "/dev/fd0135ds18h" example above, unless you flood /dev with all
> possible entries.
> 
> >I realize why it is done the way it is done now though.  VOP_LOOKUP()
> >having to return a unique vnode for the device is a pain.  (which is why
> >the clone is done during lookup, so that the correct vnode is found and
> >available).  But understanding why doesn't mean that I don't wish that it
> >could be different. :-)
> 
> Well, if devices lived at the filedescriptor level instead of at the vnode
> level, things would be different (but I haven't tried to implement that
> so I can't say for sure if it would actually be "better"...)
> 
> >Doug mentions the hack in dev/streams/streams.c:
> >        td->td_dupfd = fd;
> >        return ENXIO;
> >.. this is nasty. :-)
> 
> This is abuse, it should be rewritten.
> 
> >I think the SVR4 clone driver uses something like this.  It causes the
> >original namei / open attempt to fail (thus releasing the "common" vnode)
> >and then switching over to the *real* file/vnode at the last minute.
> 
> We would have to do that as well in order to unwind the committed vnode
> and select another.
> 
> 
> I would like to request that nobody starts to commit a vnode based cloning
> (or API changes for it) until they actually have a working prototype.
> I've been there, done that and threw it away.
> 
> The only reason I can see for adding vnode-based cloning would be if
> somebody can point out something they cannot do with namei-based
> cloning...
> 
> <SHAMELESS PLUG>
> My BSDCONey and BSDCON talks would be very good places to ask questions
> about this :-)
> </SHAMELESS PLUG>
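
For anyone following the quoted discussion, the "make me a device 
according to this wish" idea can be modelled roughly in userland as 
below.  This is only a sketch: model_clone_match() and its return 
convention are made up for illustration and are not the devfs 
interface.  It just shows how one name check can cover both "give me 
the next unit" and "give me this specific unit".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userland model of a name-based clone check.
 * Returns 0 if `name' does not belong to the driver `stem',
 * 1 if it names an explicit unit (stored in *unit), and
 * 2 if it is the bare stem, meaning "pick the next free unit".
 */
static int
model_clone_match(const char *name, const char *stem, int *unit)
{
	size_t len = strlen(stem);
	const char *p;
	char *end;
	long u;

	if (strncmp(name, stem, len) != 0)
		return (0);
	p = name + len;
	if (*p == '\0')
		return (2);		/* "tun": caller chooses a unit */
	if (!isdigit((unsigned char)*p))
		return (0);		/* "tunnel" is not ours */
	u = strtol(p, &end, 10);
	if (*end != '\0' || u > 32767)
		return (0);		/* trailing junk or absurd unit */
	*unit = (int)u;
	return (1);
}
```

A lookup of "tun" would then ask the driver for a fresh unit, while 
"tun3" asks for unit 3, and anything else is left for other handlers.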

My feeling on the whole topic is that we now have a very workable 
system with two drawbacks:

  o The ``clone device'' doesn't turn up in /dev.  This means that an 
    administrator cannot treat it as a filesystem object WRT 
    permissions - in fact, he can't even see it on the filesystem.  
    IMHO this causes namespace problems, but this is also quite 
    fixable.  I'd like to talk about this at BSDConEurope.

  o The SI_CHEAPCLONE stuff is easy to get wrong, and getting it 
    wrong opens up a bad DoS.  Maybe the answer is that specinfos 
    that are returned from make_dev() during clone() have the 
    SI_CHEAPCLONE flag already set and a successful call to the 
    driver's d_open() clears SI_CHEAPCLONE ?

    But this doesn't quite work with the tun device.  The tun device 
    abuses this flag so that it can use dev_depends() to blow away 
    all of its make_dev()s at module unload time....  It doesn't 
    want to destroy_dev() them at d_close() time because I'd prefer 
    that the administrator is able to ``touch /dev/tunX'' then 
    ``chmod /dev/tunX'' at boot time.
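
The flag handling suggested in the second item might look something 
like the following userland model.  It is only a sketch under my own 
assumptions: struct model_dev, MODEL_CHEAPCLONE and the three 
functions are stand-ins invented for illustration, not the real 
specinfo or devfs code.

```c
#include <stddef.h>

#define MODEL_CHEAPCLONE 0x1	/* stands in for SI_CHEAPCLONE */

/* Hypothetical stand-in for a cloned device entry. */
struct model_dev {
	int unit;
	int flags;
	int alive;
};

/* Clone path: every device made here starts out reclaimable. */
static void
model_clone_make(struct model_dev *d, int unit)
{
	d->unit = unit;
	d->flags = MODEL_CHEAPCLONE;
	d->alive = 1;
}

/* Open path: only a successful driver open pins the device. */
static int
model_open(struct model_dev *d, int driver_error)
{
	if (driver_error != 0)
		return (driver_error);
	d->flags &= ~MODEL_CHEAPCLONE;
	return (0);
}

/* Reaper: destroy whatever is still flagged; return the count. */
static int
model_reap(struct model_dev *devs, size_t n)
{
	size_t i;
	int reaped = 0;

	for (i = 0; i < n; i++) {
		if (devs[i].alive && (devs[i].flags & MODEL_CHEAPCLONE)) {
			devs[i].alive = 0;
			reaped++;
		}
	}
	return (reaped);
}
```

With this arrangement a driver cannot forget to set the flag, so 
lookups that never reach a successful open cannot pile up devices.  
It does nothing for the tun case described above, where the device 
should survive after close.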

A partially unrelated problem is that of tracking open devices from 
inside a driver.  I'm only mentioning this because the SI_CHEAPCLONE 
flag makes this more difficult - it allows devfs to destroy_dev() 
things when the driver isn't looking....  I don't think my SI_CHEAPCLONE 
abuse in if_tun is correct.  Maybe the right answer is to have devfs 
notify the driver when it destroy_dev()s something ?
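
As a rough illustration of what such a notification could look like, 
here is a userland model.  Everything in it is hypothetical: the 
on_destroy hook, struct note_dev and struct note_softc are names I 
have made up for the sketch, and nothing like this hook exists in the 
tree as far as this message is concerned.

```c
#include <stddef.h>

struct note_dev;
typedef void note_destroy_cb(struct note_dev *);

/* Hypothetical device entry carrying a driver-supplied hook. */
struct note_dev {
	int unit;
	note_destroy_cb *on_destroy;	/* hypothetical driver hook */
	void *softc;			/* driver's private state */
};

/* Driver side: count of devices the driver believes exist. */
struct note_softc {
	int ndevs;
};

/* Driver's hook: drop its bookkeeping for the dying device. */
static void
note_driver_gone(struct note_dev *d)
{
	struct note_softc *sc = d->softc;

	sc->ndevs--;
	d->softc = NULL;
}

/* Filesystem side: tell the driver first, then tear down. */
static void
note_destroy(struct note_dev *d)
{
	if (d->on_destroy != NULL)
		d->on_destroy(d);
	d->on_destroy = NULL;	/* never notify twice */
}
```

The point is only that the driver's own count stays in step with what 
the filesystem has destroyed, so it never holds a stale reference.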


> Poul-Henning
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe    
> Never attribute to malice what can adequately be explained by incompetence.

-- 
Brian <brian@freebsd-services.com>                <brian@Awfulhak.org>
      http://www.freebsd-services.com/        <brian@[uk.]FreeBSD.org>
Don't _EVER_ lose your sense of humour !      <brian@[uk.]OpenBSD.org>

