From owner-freebsd-drivers@freebsd.org Wed Aug 19 14:30:25 2015 Return-Path: Delivered-To: freebsd-drivers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B79C09BEF3F for ; Wed, 19 Aug 2015 14:30:25 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 925791E58; Wed, 19 Aug 2015 14:30:25 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (75-48-78-19.lightspeed.cncrca.sbcglobal.net [75.48.78.19]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7AB69B98F; Wed, 19 Aug 2015 10:30:24 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Cc: freebsd-drivers@freebsd.org, Leonardo Fogel , 'Konstantin Belousov' Subject: Re: Race conditions Date: Wed, 19 Aug 2015 07:29:56 -0700 Message-ID: <6889344.0OebVsM7Q3@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20150819084834.GM2072@kib.kiev.ua> References: <1439923294.98963.YahooMailBasic@web120801.mail.ne1.yahoo.com> <10064388.a9lbzVPoX7@ralph.baldwin.cx> <20150819084834.GM2072@kib.kiev.ua> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 19 Aug 2015 10:30:24 -0400 (EDT) X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Aug 2015 14:30:25 -0000 On Wednesday, August 19, 2015 11:48:34 AM Konstantin Belousov wrote: > On Tue, Aug 18, 2015 at 07:30:48PM -0700, John Baldwin wrote: > > On Tuesday, August 18, 2015 11:41:34 AM Leonardo Fogel wrote: > > > Hi. > > > The following code is an exerpt from the FreeBSD Architecture Handbook, chapter 11.1.1. Sample Driver Source. I've included labels i1, i2, i3. > > > > > > int > > > mypci_open(struct cdev *dev, int oflags, int devtype, struct thread *td) > > > { > > > struct mypci_softc *sc; > > > > > > /* Look up our softc. */ > > > i1: sc = dev->si_drv1; > if (sc == NULL) > return (ENXIO); > The new cdev was allocated with M_ZERO flag, so you can rely on the fact > that uninitalized fields are zeroed. > > > > device_printf(sc->my_dev, "Opened successfully.\n"); > > > return (0); > > > } > > > > > > static int > > > mypci_attach(device_t dev) > > > { > > > struct mypci_softc *sc; > > > ... > > > i2: sc->my_cdev = make_dev(&mypci_cdevsw, device_get_unit(dev), > > > UID_ROOT, GID_WHEEL, 0600, "mypci%u", device_get_unit(dev)); > > > i3: sc->my_cdev->si_drv1 = sc; > > > printf("Mypci device loaded.\n"); > > > return (0); > > > } > > > > > > > > > As I understand it, as soon as instruction at label i2 completes, there is a rare possibility that another thread opens the device file and executes the instruction at i1, before the instruction at i3 is executed. Is it correct? How could one fix it? > > > > > It is a race, yes. I know there is a MAKEDEV_REF flag that can be passed to > > make_dev_p() and make_dev_credf() that can hold a reference on the returned > > cdev (so it can't be immediately deleted), but I don't know that that helps > > close the race you reference. > No, MAKEDEV_REF is about calling dev_ref() early enough so that the > dev_clone handlers could safely access cdev that was just created > (otherwise other thread might enter devfs_populate_loop() in parallel > and unref :( ). MAKEDEV_REF has nothing to do with driver-managed > fields initialization. Well, it does allow a clone handler to safely set si_drv1 (which is typically what they do)? However, I didn't think it would help with Leonardo's question. A somewhat related race I ran into while trying to make cloning on /dev/tap more useful is that there isn't any way to "reserve" a cdev during clone such that only the current thread can try to open it (at least as far as I can tell), unless it is true that the cdev's d_open() method is guaranteed to be called once a cdev is returned from the clone handler (which it is not for the stat() case). It's a shame we can't pass down an 'ISOPEN' flag similar to that in namei to the clone handlers. (See https://reviews.freebsd.org/D2797 and related comments) > > I think I would probably prefer we add some sort of wrapper for make_dev > > that accepts the si_drv1 value (and possibly other values) as an argument to > > close this. I'm cc'ing kib@ to see if he has any suggestions or better ideas. > > Yes, this is a known issue in the cdev KPI, but of very low importance. > I agree that a change to cdev KPI is due. One of the existing issues is > that it is already bloated with > make_dev_credf > make_dev_cred > make_dev_p > make_dev > all grown organically to plug this and that uglyness in the KPI. I wanted > to combine all non-naming parameters to make_dev* into some structure, > so that the final function to create cdev is like > int make_dev_uber(struct cdev **res, struct make_dev_args *args, > const char *fmt, ...); > struct make_dev_args { > struct cdevsw *csw; > int flags; > struct ucred *cred; > ... > int si_drv0; > void *si_drv1, *si_drv2; <- eventually > }; > and a helper to do initialization of the structure. > > But as I said above, it is very low priority and I want to gather more > outstanding issues with the KPI before making any decisions there. This sounds like a good approach to me. You could version the args structure if you wanted (I think Glebius has done this for his ifnet work which uses a similar pattern) so you can extend it in the future. -- John Baldwin