From owner-freebsd-drivers@freebsd.org Fri Aug 28 20:52:55 2015 Return-Path: Delivered-To: freebsd-drivers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8DEBB9C48DC for ; Fri, 28 Aug 2015 20:52:55 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4E99B88A; Fri, 28 Aug 2015 20:52:55 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 64B06B915; Fri, 28 Aug 2015 16:52:54 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Cc: freebsd-drivers@freebsd.org, Leonardo Fogel , 'Konstantin Belousov' Subject: Re: Race conditions Date: Fri, 28 Aug 2015 13:34:58 -0700 Message-ID: <2785418.Nryjt2Jbzi@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20150819145239.GS2072@kib.kiev.ua> References: <1439923294.98963.YahooMailBasic@web120801.mail.ne1.yahoo.com> <6889344.0OebVsM7Q3@ralph.baldwin.cx> <20150819145239.GS2072@kib.kiev.ua> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 28 Aug 2015 16:52:54 -0400 (EDT) X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 20:52:55 -0000 On Wednesday, August 19, 2015 05:52:39 PM Konstantin Belousov wrote: > On Wed, Aug 19, 2015 at 07:29:56AM -0700, John Baldwin wrote: > > A somewhat related race I ran into while trying to make cloning on /dev/tap > > more useful is that there isn't any way to "reserve" a cdev during clone such > > that only the current thread can try to open it (at least as far as I can > > tell), unless it is true that the cdev's d_open() method is guaranteed to be > > called once a cdev is returned from the clone handler (which it is not for > > the stat() case). It's a shame we can't pass down an 'ISOPEN' flag similar > > to that in namei to the clone handlers. > > > > (See https://reviews.freebsd.org/D2797 and related comments) > When the clone handler run, there is no even the cdev' associated vnode > known in the thread called the dev_clone event handler. So regardless > of whether you pass ISOPEN to the dev_clone, there is a possibility that > a parallel open would happens before the current lookup consumer has a > chance to proceed. Perhaps we could force cloning to serialize with opens? That is, use some sort of global lock in devfs such that any non-cloning opens use a shared lock but an exclusive lock is taken before running clone event handlers (and held until after d_open returns)? To really close this sort of race, the exclusive lock acquired when a clone is created in lookup() would have to be held until devfs_open() is called. That's rather gross. I suppose you could always aquire the lock in devfs_lookup() when ISOPEN is set (exclusive if you have to clone, otherwise shared) and then drop it in devfs_open() after d_open returns. > > > > I think I would probably prefer we add some sort of wrapper for make_dev > > > > that accepts the si_drv1 value (and possibly other values) as an argument to > > > > close this. I'm cc'ing kib@ to see if he has any suggestions or better ideas. > > > > > > Yes, this is a known issue in the cdev KPI, but of very low importance. > > > I agree that a change to cdev KPI is due. One of the existing issues is > > > that it is already bloated with > > > make_dev_credf > > > make_dev_cred > > > make_dev_p > > > make_dev > > > all grown organically to plug this and that uglyness in the KPI. I wanted > > > to combine all non-naming parameters to make_dev* into some structure, > > > so that the final function to create cdev is like > > > int make_dev_uber(struct cdev **res, struct make_dev_args *args, > > > const char *fmt, ...); > > > struct make_dev_args { > > > struct cdevsw *csw; > > > int flags; > > > struct ucred *cred; > > > ... > > > int si_drv0; > > > void *si_drv1, *si_drv2; <- eventually > > > }; > > > and a helper to do initialization of the structure. > > > > > > But as I said above, it is very low priority and I want to gather more > > > outstanding issues with the KPI before making any decisions there. > > > > This sounds like a good approach to me. You could version the args structure > > if you wanted (I think Glebius has done this for his ifnet work which uses a > > similar pattern) so you can extend it in the future. > > I thought about the following use: > struct make_dev_args args; > struct cdev *cdev; > > make_dev_args_prep(&args); > args.si_drv1 = softc; > error = make_dev_uber(&cdev, &args, "exhauster"); > So that the _prep() ensures that the _args structure extension is > invisible to the KPI consumers. I have no intent of keeping KBI stable. Ok. > Question is, do we need this now ? It is yet another churn in the > make_dev(9) area, where we already have four interfaces, and we must > keep them around for source compatibility. IMO si_drv1 issue does not > justify yet another bloat. If there is any other reason to modify > make_dev(9) group of functions, then I would agree with taking the _args > step. Well, we've had this race in most cdev drivers in the tree for a long time. It's a narrow one that doesn't get hit often (if at all) in practice, but if I were to do a sweep to patch all the open routines to handle it, I'd rather we do it this way instead. OTOH, I don't have a burning desire to patch all the open routines. -- John Baldwin