Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 19 Aug 2015 17:52:39 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-drivers@freebsd.org, Leonardo Fogel <leonardofogel@yahoo.com.br>,  "'Konstantin Belousov'" <kib@freebsd.org>
Subject:   Re: Race conditions
Message-ID:  <20150819145239.GS2072@kib.kiev.ua>
In-Reply-To: <6889344.0OebVsM7Q3@ralph.baldwin.cx>
References:  <1439923294.98963.YahooMailBasic@web120801.mail.ne1.yahoo.com> <10064388.a9lbzVPoX7@ralph.baldwin.cx> <20150819084834.GM2072@kib.kiev.ua> <6889344.0OebVsM7Q3@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Aug 19, 2015 at 07:29:56AM -0700, John Baldwin wrote:
> On Wednesday, August 19, 2015 11:48:34 AM Konstantin Belousov wrote:
> > On Tue, Aug 18, 2015 at 07:30:48PM -0700, John Baldwin wrote:
> > > On Tuesday, August 18, 2015 11:41:34 AM Leonardo Fogel wrote:
> > > > Hi.
> > > > The following code is an exerpt from the FreeBSD Architecture Handbook, chapter 11.1.1. Sample Driver Source. I've included labels i1, i2, i3.
> > > > 
> > > >    int
> > > >    mypci_open(struct cdev *dev, int oflags, int devtype, struct thread *td)
> > > >    {
> > > >            struct mypci_softc *sc;
> > > > 
> > > >            /* Look up our softc. */
> > > >    i1:     sc = dev->si_drv1;
> > 		if (sc == NULL)
> > 			return (ENXIO);
> > The new cdev was allocated with M_ZERO flag, so you can rely on the fact
> > that uninitalized fields are zeroed.
> > 
> > > >            device_printf(sc->my_dev, "Opened successfully.\n");
> > > >            return (0);
> > > >    }
> > > > 
> > > >    static int
> > > >    mypci_attach(device_t dev)
> > > >    {
> > > >            struct mypci_softc *sc;
> > > >    ...
> > > >    i2:     sc->my_cdev = make_dev(&mypci_cdevsw, device_get_unit(dev),
> > > >                UID_ROOT, GID_WHEEL, 0600, "mypci%u", device_get_unit(dev));
> > > >    i3:     sc->my_cdev->si_drv1 = sc;
> > > >            printf("Mypci device loaded.\n");
> > > >            return (0);
> > > >    }
> > > > 
> > > > 
> > > > As I understand it, as soon as instruction at label i2 completes, there is a rare possibility that another thread opens the device file and executes the instruction at i1, before the instruction at i3 is executed. Is it correct? How could one fix it?
> > 
> > > 
> > > It is a race, yes.  I know there is a MAKEDEV_REF flag that can be passed to
> > > make_dev_p() and make_dev_credf() that can hold a reference on the returned
> > > cdev (so it can't be immediately deleted), but I don't know that that helps
> > > close the race you reference.
> > No, MAKEDEV_REF is about calling dev_ref() early enough so that the
> > dev_clone handlers could safely access cdev that was just created
> > (otherwise other thread might enter devfs_populate_loop() in parallel
> > and unref :( ).  MAKEDEV_REF has nothing to do with driver-managed
> > fields initialization.
> 
> Well, it does allow a clone handler to safely set si_drv1 (which is typically
> what they do)?  However, I didn't think it would help with Leonardo's
> question.
> 
> A somewhat related race I ran into while trying to make cloning on /dev/tap
> more useful is that there isn't any way to "reserve" a cdev during clone such
> that only the current thread can try to open it (at least as far as I can
> tell), unless it is true that the cdev's d_open() method is guaranteed to be
> called once a cdev is returned from the clone handler (which it is not for
> the stat() case).  It's a shame we can't pass down an 'ISOPEN' flag similar
> to that in namei to the clone handlers.
> 
> (See https://reviews.freebsd.org/D2797 and related comments)
When the clone handler run, there is no even the cdev' associated vnode
known in the thread called the dev_clone event handler.  So regardless
of whether you pass ISOPEN to the dev_clone, there is a possibility that
a parallel open would happens before the current lookup consumer has a
chance to proceed.

> 
> > > I think I would probably prefer we add some sort of wrapper for make_dev
> > > that accepts the si_drv1 value (and possibly other values) as an argument to
> > > close this.  I'm cc'ing kib@ to see if he has any suggestions or better ideas.
> > 
> > Yes, this is a known issue in the cdev KPI, but of very low importance.
> > I agree that a change to cdev KPI is due.  One of the existing issues is
> > that it is already bloated with 
> > 	make_dev_credf
> > 	make_dev_cred
> > 	make_dev_p
> > 	make_dev
> > all grown organically to plug this and that uglyness in the KPI.  I wanted
> > to combine all non-naming parameters to make_dev* into some structure,
> > so that the final function to create cdev is like
> > 	int make_dev_uber(struct cdev **res, struct make_dev_args *args,
> > 	    const char *fmt, ...);
> > 	struct make_dev_args {
> > 		struct cdevsw *csw;
> > 		int flags;
> > 		struct ucred *cred;
> > 		...
> > 		int si_drv0;
> > 		void *si_drv1, *si_drv2; <- eventually
> > 	};
> > and a helper to do initialization of the structure.
> > 
> > But as I said above, it is very low priority and I want to gather more
> > outstanding issues with the KPI before making any decisions there.
> 
> This sounds like a good approach to me.  You could version the args structure
> if you wanted (I think Glebius has done this for his ifnet work which uses a
> similar pattern) so you can extend it in the future.

I thought about the following use:
	struct make_dev_args args;
	struct cdev *cdev;

	make_dev_args_prep(&args);
	args.si_drv1 = softc;
	error = make_dev_uber(&cdev, &args, "exhauster");
So that the _prep() ensures that the _args structure extension is
invisible to the KPI consumers. I have no intent of keeping KBI stable.

Question is, do we need this now ? It is yet another churn in the
make_dev(9) area, where we already have four interfaces, and we must
keep them around for source compatibility. IMO si_drv1 issue does not
justify yet another bloat. If there is any other reason to modify
make_dev(9) group of functions, then I would agree with taking the _args
step.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150819145239.GS2072>