Date: Mon, 9 Apr 2012 17:59:13 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Ian Lepore <freebsd@damnhippie.dyndns.org> Cc: current@freebsd.org, Warner Losh <imp@bsdimp.com>, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization Message-ID: <20120409145913.GR2358@deviant.kiev.zoral.com.ua> In-Reply-To: <1333982475.1082.61.camel@revolution.hippie.lan> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> <CCBC36A4-CC13-465B-A151-A41DEDF7E55B@bsdimp.com> <20120408035838.GY2358@deviant.kiev.zoral.com.ua> <1333982475.1082.61.camel@revolution.hippie.lan>
next in thread | previous in thread | raw e-mail | index | archive | help
--aLNv5lbzIJDYDUXB Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 09, 2012 at 08:41:15AM -0600, Ian Lepore wrote: > On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > >=20 > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > >=20 > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > >>> Hello, > > > >>> there seems to be a problem with device attach sequence offered b= y newbus. > > > >>> Basically, when device attach method is executing, device is not = fully > > > >>> initialized yet. Also the device state in the newbus part of the = world > > > >>> is DS_ALIVE. There is definitely no shattering news in the statem= ents, > > > >>> but drivers that e.g. create devfs node to communicate with consu= mers > > > >>> are prone to a race. > > > >>>=20 > > > >>> If /dev node is created inside device attach method, then usermode > > > >>> can start calling cdevsw methods before device fully initialized = itself. > > > >>> Even more, if device tries to use newbus helpers in cdevsw method= s, > > > >>> like device_busy(9), then panic occurs "called for unatteched dev= ice". > > > >>> I get reports from users about this issues, to it is not something > > > >>> that only could happen. > > > >>>=20 > > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > >>> from newbus right after device attach finished and newbus conside= rs > > > >>> the device fully initialized. Driver then could create devfs node > > > >>> in the after_attach method instead of attach. Please see the patc= h below. > > > >>>=20 > > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > >>> index eb720eb..9db74e2 100644 > > > >>> --- a/sys/kern/device_if.m > > > >>> +++ b/sys/kern/device_if.m > > > >>> @@ -43,6 +43,10 @@ INTERFACE device; > > > >>> # Default implementations of some methods. > > > >>> # > > > >>> CODE { > > > >>> + static void null_after_attach(device_t dev) > > > >>> + { > > > >>> + } > > > >>> + > > > >>> static int null_shutdown(device_t dev) > > > >>> { > > > >>> return 0; > > > >>> @@ -199,6 +203,21 @@ METHOD int attach { > > > >>> }; > > > >>>=20 > > > >>> /** > > > >>> + * @brief Notify the driver that device is in attached state > > > >>> + * > > > >>> + * Called after driver is successfully attached to the device and > > > >>> + * corresponding device_t is fully operational. Driver now may e= xpose > > > >>> + * the device to the consumers, e.g. create devfs nodes. > > > >>> + * > > > >>> + * @param dev the device to probe > > > >>> + * > > > >>> + * @see DEVICE_ATTACH() > > > >>> + */ > > > >>> +METHOD void after_attach { > > > >>> + device_t dev; > > > >>> +} DEFAULT null_after_attach; > > > >>> + > > > >>> +/** > > > >>> * @brief Detach a driver from a device. > > > >>> * > > > >>> * This can be called if the user is replacing the > > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > >>> index d485b9f..6d849cb 100644 > > > >>> --- a/sys/kern/subr_bus.c > > > >>> +++ b/sys/kern/subr_bus.c > > > >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > >>> dev->state =3D DS_ATTACHED; > > > >>> dev->flags &=3D ~DF_DONENOMATCH; > > > >>> devadded(dev); > > > >>> + DEVICE_AFTER_ATTACH(dev); > > > >>> return (0); > > > >>> } > > > >>>=20 > > > >>=20 > > > >> Does device_get_softc() work before attach is completed? (I don't= have > > > >> time to go look in the code right now). If so, then a mutex initi= alized > > > >> and acquired early in the driver's attach routine, and also acquir= ed in > > > >> the driver's cdev implementation routines before using any newbus > > > >> functions other than device_get_softc(), would solve the problem w= ithout > > > >> a driver api change that would make it harder to backport/MFC driv= er > > > >> changes. > > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > > on the driver developers, because you cannot sleep under mutex. Cha= nging > > > > the mutex to the sleepable lock also does not byy you much, since > > > > you need to somehow solve the issues with some cdevsw call waking up > > > > thread sleeping into another cdevsw call, just for example. > > > >=20 > > > > Singlethreading a driver due to this race is just silly. > > > >=20 > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > >=20 > > > driver_attach() > > > { > > > ... > > > softc->flags =3D 0; // redundant, since softc is initialized to 0. > > > softc->cdev =3D device_create...(); > > > ... > > > softc->flags |=3D READY; > > > } > > >=20 > > > driver_open(...) > > > { > > > if (!(softc->flags & READY)) > > > return ENXIO; > > > ... > > > } > > >=20 > > > What's the big burden here? > > The burden is that your proposal does not work. As I described above, > > device_busy() calls from cdevsw method panic if open() is called before > > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > > attach() method returned, so no workarounds from attach() could solve > > this. >=20 > One thing that keeps floating to the front of my brain is that all the > proposals so far (including my not-well-thought-out mutex suggestion) > requires changing every existing driver to get the new safe behavior. >=20 > Hmmm. Looking at the code, not very many drivers call device_busy(). > Why is that? =20 >=20 > I agree that calling device_create() should be deferred until the driver You mean make_dev(9), I assume. > is ready to handle requests. That's only part of the fix if the newbus > support routines are still going to have a window where they can panic > because the internal state variables haven't yet transitioned to the > correct state. =20 >=20 > Also, the implementation of device_busy() looks to be unsafe unless it's > being implicitly protected by some locking in a call chain that isn't > jumping out at me with simple grepping of the code. For example, > concurrent callers in a device's open() and close() methods for a driver > that calls busy/unbusy from cdev open/close could leave the parent > device's busy count in an indeterminate state. Could fixing this by > enforcing single threading through busy/unbusy provide an opportunity to > fix the original problem by having the attach code acquire the same > lock, so that an early call to a cdev method that invokes device_busy() > ends up sleeping until the attach routine returns? That way you don't > single-thread the whole cdev open/close handling, just the part that's > currently causing a problem. I think device_busy() calls require Giant, the same as the whole newbus. The absence of GIANT_REQUIRED assertions in device_busy() and device_unbusy= () looks like overlook. --aLNv5lbzIJDYDUXB Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+C+UEACgkQC3+MBN1Mb4jyyQCfVT7lHgGeiLMsjUFMCQIgzM9e 6lMAniYIyja/0aayMj78aYDJgAfpU6WC =HEjq -----END PGP SIGNATURE----- --aLNv5lbzIJDYDUXB--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120409145913.GR2358>