Date: Mon, 09 Apr 2012 08:41:15 -0600 From: Ian Lepore <freebsd@damnhippie.dyndns.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: current@freebsd.org, Warner Losh <imp@bsdimp.com>, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization Message-ID: <1333982475.1082.61.camel@revolution.hippie.lan> In-Reply-To: <20120408035838.GY2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> <CCBC36A4-CC13-465B-A151-A41DEDF7E55B@bsdimp.com> <20120408035838.GY2358@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > >>> Hello, > > >>> there seems to be a problem with device attach sequence offered by newbus. > > >>> Basically, when device attach method is executing, device is not fully > > >>> initialized yet. Also the device state in the newbus part of the world > > >>> is DS_ALIVE. There is definitely no shattering news in the statements, > > >>> but drivers that e.g. create devfs node to communicate with consumers > > >>> are prone to a race. > > >>> > > >>> If /dev node is created inside device attach method, then usermode > > >>> can start calling cdevsw methods before device fully initialized itself. > > >>> Even more, if device tries to use newbus helpers in cdevsw methods, > > >>> like device_busy(9), then panic occurs "called for unatteched device". > > >>> I get reports from users about this issues, to it is not something > > >>> that only could happen. > > >>> > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > >>> from newbus right after device attach finished and newbus considers > > >>> the device fully initialized. Driver then could create devfs node > > >>> in the after_attach method instead of attach. Please see the patch below. > > >>> > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > >>> index eb720eb..9db74e2 100644 > > >>> --- a/sys/kern/device_if.m > > >>> +++ b/sys/kern/device_if.m > > >>> @@ -43,6 +43,10 @@ INTERFACE device; > > >>> # Default implementations of some methods. > > >>> # > > >>> CODE { > > >>> + static void null_after_attach(device_t dev) > > >>> + { > > >>> + } > > >>> + > > >>> static int null_shutdown(device_t dev) > > >>> { > > >>> return 0; > > >>> @@ -199,6 +203,21 @@ METHOD int attach { > > >>> }; > > >>> > > >>> /** > > >>> + * @brief Notify the driver that device is in attached state > > >>> + * > > >>> + * Called after driver is successfully attached to the device and > > >>> + * corresponding device_t is fully operational. Driver now may expose > > >>> + * the device to the consumers, e.g. create devfs nodes. > > >>> + * > > >>> + * @param dev the device to probe > > >>> + * > > >>> + * @see DEVICE_ATTACH() > > >>> + */ > > >>> +METHOD void after_attach { > > >>> + device_t dev; > > >>> +} DEFAULT null_after_attach; > > >>> + > > >>> +/** > > >>> * @brief Detach a driver from a device. > > >>> * > > >>> * This can be called if the user is replacing the > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > >>> index d485b9f..6d849cb 100644 > > >>> --- a/sys/kern/subr_bus.c > > >>> +++ b/sys/kern/subr_bus.c > > >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > >>> dev->state = DS_ATTACHED; > > >>> dev->flags &= ~DF_DONENOMATCH; > > >>> devadded(dev); > > >>> + DEVICE_AFTER_ATTACH(dev); > > >>> return (0); > > >>> } > > >>> > > >> > > >> Does device_get_softc() work before attach is completed? (I don't have > > >> time to go look in the code right now). If so, then a mutex initialized > > >> and acquired early in the driver's attach routine, and also acquired in > > >> the driver's cdev implementation routines before using any newbus > > >> functions other than device_get_softc(), would solve the problem without > > >> a driver api change that would make it harder to backport/MFC driver > > >> changes. > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > on the driver developers, because you cannot sleep under mutex. Changing > > > the mutex to the sleepable lock also does not byy you much, since > > > you need to somehow solve the issues with some cdevsw call waking up > > > thread sleeping into another cdevsw call, just for example. > > > > > > Singlethreading a driver due to this race is just silly. > > > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > > > driver_attach() > > { > > ... > > softc->flags = 0; // redundant, since softc is initialized to 0. > > softc->cdev = device_create...(); > > ... > > softc->flags |= READY; > > } > > > > driver_open(...) > > { > > if (!(softc->flags & READY)) > > return ENXIO; > > ... > > } > > > > What's the big burden here? > The burden is that your proposal does not work. As I described above, > device_busy() calls from cdevsw method panic if open() is called before > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > attach() method returned, so no workarounds from attach() could solve > this. One thing that keeps floating to the front of my brain is that all the proposals so far (including my not-well-thought-out mutex suggestion) requires changing every existing driver to get the new safe behavior. Hmmm. Looking at the code, not very many drivers call device_busy(). Why is that? I agree that calling device_create() should be deferred until the driver is ready to handle requests. That's only part of the fix if the newbus support routines are still going to have a window where they can panic because the internal state variables haven't yet transitioned to the correct state. Also, the implementation of device_busy() looks to be unsafe unless it's being implicitly protected by some locking in a call chain that isn't jumping out at me with simple grepping of the code. For example, concurrent callers in a device's open() and close() methods for a driver that calls busy/unbusy from cdev open/close could leave the parent device's busy count in an indeterminate state. Could fixing this by enforcing single threading through busy/unbusy provide an opportunity to fix the original problem by having the attach code acquire the same lock, so that an early call to a cdev method that invokes device_busy() ends up sleeping until the attach routine returns? That way you don't single-thread the whole cdev open/close handling, just the part that's currently causing a problem. -- Ian
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1333982475.1082.61.camel>