From: Brian Somers
To: Poul-Henning Kamp
Cc: Peter Wemm, Brian Somers, freebsd-arch@FreeBSD.ORG, brian@freebsd-services.com
Subject: Re: Cloned open support
Date: Mon, 08 Oct 2001 09:30:59 +0100
Message-Id: <200110080830.f988UxT40831@hak.lan.Awfulhak.org>
In-Reply-To: Message from Poul-Henning Kamp of "Mon, 08 Oct 2001 07:43:11 +0200." <95926.1002519791@critter.freebsd.dk>

Hi,

First, let me mention that I think your response is a bit odd....
timing-wise.  I posted the message that you've responded to on
January 29 and you responded on October 7.  I implemented cloning
on if_tun in May... :*P

Maybe my mail server is underperforming !

> In message <20011007221706.1ABAE3809@overcee.netplex.com.au>, Peter Wemm writes:
> >Poul-Henning Kamp wrote:
> >>
> >> Uhm, Brian...
> >>
> >> We have cloned devices already...
> >>
> >> What exactly is it that you want to implement ?
> >>
> >> Poul-Henning
> >
> >Devfs name cloning feels hackish to me.  Having a separate EVENTHANDLER()
> >for doing it feels .. just nasty.
> >I'd much rather that we had a d_clone
> >devsw entry and/or a D_CLONE d_flags entry.
>
> Call it hackish, I call it elegant:
>
> * I didn't have to modify all the device drivers making them
>   incompatible with anything anybody ever learned.
>
> * I didn't have to do long-haired vnode operations in cloning
>   drivers, thus preserving the ability to do systematic SMP
>   lock-boundaries at the cdevsw-> level.
>
> * It supports parameterized clone opens (ie: not just "/dev/pty",
>   but also "/dev/ad0s1g" and even if somebody implemented it:
>   "/dev/ccd,mirror,ad0s1f,ad1s1f" :-)
>
> * It is a *LOT* simpler than doing it by vnodes...
>
> The story is that by the time you reach devsw->open() you have
> committed the vnode and if you change the device at that time you
> need to unwind all the way back up to the association of the vnode
> with the dev_t, and wind all the way back down before the open can
> progress.
>
> (The fact that it is an EVENTHANDLER is just a matter of implementation,
> I didn't see a point to reimplementing the same functionality when
> EVENTHANDLERS already were available).
>
> >There are two types of cloning.  One is to map some name "/dev/fd0135ds18h"
> >into a device node without having to flood /dev with all possible
> >permutations.  The other is to support per-device "select next unit" style
> >opens.  Presently these are both kludged into the EVENTHANDLER interface.
>
> Those two are actually the same kind of open Peter, semantically
> they both say "make me a device according to this wish: ``...'' and let
> me open it".
>
> >I think Brian wants to move the second part directly into the open handler
> >like it is done on most other OS's that support cloning.  Personally,
> >I would be quite happy if we could do that.
>
> Most other devices have made a mess of their vnodes and drivers by
> doing so :-(
>
> The FreeBSD implementation completely sidesteps all the vnode hair
> by doing the cloning at namei() time instead of open time, this
> makes it much simpler and much more capable.
>
> If you do a vnode based cloning, it will not support your
> "/dev/fd0135ds18h" example above, unless you flood /dev with all
> possible entries.
>
> >I realize why it is done the way it is done now though.  VOP_LOOKUP()
> >having to return a unique vnode for the device is a pain.  (which is why
> >the clone is done during lookup, so that the correct vnode is found and
> >available).  But understanding why doesn't mean that I don't wish that it
> >could be different. :-)
>
> Well, if devices lived at the filedescriptor level instead of at the vnode
> level, things would be different (but I haven't tried to implement that
> so I can't say for sure if it would actually be "better"...)
>
> >Doug mentions the hack in dev/streams/streams.c:
> >   td->td_dupfd = fd;
> >   return ENXIO;
> >.. this is nasty. :-)
>
> This is abuse, it should be rewritten.
>
> >I think the SVR4 clone driver uses something like this.  It causes the
> >original namei / open attempt to fail (thus releasing the "common" vnode)
> >and then switching over to the *real* file/vnode at the last minute.
>
> We would have to do that as well in order to unwind the committed vnode
> and select another.
>
>
> I would like to request that nobody starts to commit a vnode based cloning
> (or API changes for it) until they actually have a working prototype.
> I've been there, done that and threw it away.
>
> The only reason I can see for adding vnode-based cloning would be if
> somebody can point out something they cannot do with namei-based
> cloning...
>
>
> My BSDCONey and BSDCON talks would be very good places to ask questions
> about this :-)
>

My feeling on the whole topic is that we now have a very workable
system with two drawbacks:

o The ``clone device'' doesn't turn up in /dev.  This means that an
  administrator cannot treat it as a filesystem object WRT
  permissions - in fact, he can't even see it on the filesystem.

  IMHO this causes namespace problems, but this is also quite fixable.
  I'd like to talk about this at BSDConEurope.

o The SI_CHEAPCLONE stuff is easy to get wrong, and getting it wrong
  opens up a bad DoS.

  Maybe the answer is that specinfos that are returned from make_dev()
  during clone() have the SI_CHEAPCLONE flag already set and a
  successful call to the driver's d_open() clears SI_CHEAPCLONE ?

  But this doesn't quite work with the tun device.  The tun device
  abuses this flag so that it can use dev_depends() to blow away all
  of its make_dev()s at module unload time....  It doesn't want to
  destroy_dev() them at d_close() time because I'd prefer that the
  administrator is able to ``touch /dev/tunX'' then ``chmod /dev/tunX''
  at boot time.

A partially unrelated problem is that of tracking open devices from
inside a driver.  I'm only mentioning this because the SI_CHEAPCLONE
flag makes this more difficult - it allows devfs to destroy_dev()
things when the driver isn't looking....  I don't think my
SI_CHEAPCLONE abuse in if_tun is correct.  Maybe the right answer is
to have devfs notify the driver when it destroy_dev()s something ?

> Poul-Henning
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

--
Brian

http://www.freebsd-services.com/
Don't _EVER_ lose your sense of humour !
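
[Archive note] The SI_CHEAPCLONE suggestion in the message above (nodes
created during clone() are born with the flag set, and only a successful
d_open() clears it) can be modelled in a few lines of userspace C.  This
is only a sketch under stated assumptions, not kernel code: the struct,
the MODEL_* constants and the model_* functions are hypothetical stand-ins
for make_dev(), d_open() and whatever devfs does to reclaim unused nodes.
It also deliberately ignores the if_tun case described in the message,
where nodes should survive so that they can be touched and chmod'ed.

```c
/*
 * Userspace model only, NOT FreeBSD kernel code.  All names here are
 * hypothetical; they only mimic the roles discussed in the message.
 */
#include <assert.h>
#include <stdbool.h>

#define MODEL_SI_CHEAPCLONE 0x1  /* stand-in for the real SI_CHEAPCLONE bit */
#define MODEL_MAX_NODES     8

struct model_node {
    bool     in_use;  /* slot holds a live device node */
    unsigned flags;   /* MODEL_SI_CHEAPCLONE until a successful open */
};

static struct model_node model_nodes[MODEL_MAX_NODES];

/* Clone path: the new node is created with the cheap-clone flag set. */
static int model_clone_make_dev(void)
{
    for (int i = 0; i < MODEL_MAX_NODES; i++) {
        if (!model_nodes[i].in_use) {
            model_nodes[i].in_use = true;
            model_nodes[i].flags = MODEL_SI_CHEAPCLONE;
            return i;
        }
    }
    return -1;  /* table exhausted: the DoS that reaping is meant to avoid */
}

/* Open path: only a successful driver open clears the flag. */
static int model_open(int unit, bool driver_open_ok)
{
    if (unit < 0 || unit >= MODEL_MAX_NODES || !model_nodes[unit].in_use)
        return -1;
    if (!driver_open_ok)
        return -1;  /* flag stays set, so the node remains reapable */
    model_nodes[unit].flags &= ~MODEL_SI_CHEAPCLONE;
    return 0;
}

/* Reaper: discard every node that was cloned but never opened successfully. */
static int model_reap(void)
{
    int reaped = 0;
    for (int i = 0; i < MODEL_MAX_NODES; i++) {
        if (model_nodes[i].in_use &&
            (model_nodes[i].flags & MODEL_SI_CHEAPCLONE)) {
            model_nodes[i].in_use = false;
            model_nodes[i].flags = 0;
            reaped++;
        }
    }
    assert(reaped <= MODEL_MAX_NODES);
    return reaped;
}
```

In this model a driver cannot forget to set the flag, because the clone
path sets it unconditionally; the cost is that a node which was created
but never opened successfully is always reclaimable, which is exactly
what the tun device does not want.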