From owner-freebsd-drivers@FreeBSD.ORG Sun Apr 8 03:08:55 2012 Return-Path: Delivered-To: drivers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C9E3C106564A; Sun, 8 Apr 2012 03:08:55 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 720C58FC0A; Sun, 8 Apr 2012 03:08:55 +0000 (UTC) Received: from 63.imp.bsdimp.com (63.imp.bsdimp.com [10.0.0.63]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q3838eQS076876 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Sat, 7 Apr 2012 21:08:41 -0600 (MDT) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <1333810001.1082.36.camel@revolution.hippie.lan> Date: Sat, 7 Apr 2012 21:08:41 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> To: Ian Lepore X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Sat, 07 Apr 2012 21:08:42 -0600 (MDT) Cc: current@FreeBSD.org, drivers@FreeBSD.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Apr 2012 03:08:55 -0000 On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: >> Hello, >> there seems to be a problem with device attach sequence offered by = newbus. >> Basically, when device attach method is executing, device is not = fully >> initialized yet. Also the device state in the newbus part of the = world >> is DS_ALIVE. There is definitely no shattering news in the = statements, >> but drivers that e.g. create devfs node to communicate with consumers >> are prone to a race. >>=20 >> If /dev node is created inside device attach method, then usermode >> can start calling cdevsw methods before device fully initialized = itself. >> Even more, if device tries to use newbus helpers in cdevsw methods, >> like device_busy(9), then panic occurs "called for unatteched = device". >> I get reports from users about this issues, to it is not something >> that only could happen. >>=20 >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called >> from newbus right after device attach finished and newbus considers >> the device fully initialized. Driver then could create devfs node >> in the after_attach method instead of attach. Please see the patch = below. >>=20 >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m >> index eb720eb..9db74e2 100644 >> --- a/sys/kern/device_if.m >> +++ b/sys/kern/device_if.m >> @@ -43,6 +43,10 @@ INTERFACE device; >> # Default implementations of some methods. >> # >> CODE { >> + static void null_after_attach(device_t dev) >> + { >> + } >> + >> static int null_shutdown(device_t dev) >> { >> return 0; >> @@ -199,6 +203,21 @@ METHOD int attach { >> }; >>=20 >> /** >> + * @brief Notify the driver that device is in attached state >> + * >> + * Called after driver is successfully attached to the device and >> + * corresponding device_t is fully operational. Driver now may = expose >> + * the device to the consumers, e.g. create devfs nodes. >> + * >> + * @param dev the device to probe >> + * >> + * @see DEVICE_ATTACH() >> + */ >> +METHOD void after_attach { >> + device_t dev; >> +} DEFAULT null_after_attach; >> + >> +/** >> * @brief Detach a driver from a device. >> * >> * This can be called if the user is replacing the >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c >> index d485b9f..6d849cb 100644 >> --- a/sys/kern/subr_bus.c >> +++ b/sys/kern/subr_bus.c >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) >> dev->state =3D DS_ATTACHED; >> dev->flags &=3D ~DF_DONENOMATCH; >> devadded(dev); >> + DEVICE_AFTER_ATTACH(dev); >> return (0); >> } >>=20 >=20 > Does device_get_softc() work before attach is completed? (I don't = have > time to go look in the code right now). If so, then a mutex = initialized > and acquired early in the driver's attach routine, and also acquired = in > the driver's cdev implementation routines before using any newbus > functions other than device_get_softc(), would solve the problem = without > a driver api change that would make it harder to backport/MFC driver > changes. Also, more generally, don't create the dev nodes before you are ready to = deal with requests. Why do we need to uglify everything here? If you = can't do that, you can check a bit in the softc and return EBUSY or = ENXIO on open if that bit says that your driver isn't ready to accept = requests. Warner= From owner-freebsd-drivers@FreeBSD.ORG Sun Apr 8 03:14:43 2012 Return-Path: Delivered-To: drivers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5AE18106564A; Sun, 8 Apr 2012 03:14:43 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id EB7AD8FC08; Sun, 8 Apr 2012 03:14:42 +0000 (UTC) Received: from 63.imp.bsdimp.com (63.imp.bsdimp.com [10.0.0.63]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q383D76o076900 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Sat, 7 Apr 2012 21:13:07 -0600 (MDT) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <4F80579B.4040205@freebsd.org> Date: Sat, 7 Apr 2012 21:13:06 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <4F80579B.4040205@freebsd.org> To: Nathan Whitehorn X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Sat, 07 Apr 2012 21:13:07 -0600 (MDT) Cc: current@FreeBSD.org, drivers@FreeBSD.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Apr 2012 03:14:43 -0000 On Apr 7, 2012, at 9:04 AM, Nathan Whitehorn wrote: > On 04/07/12 07:50, Konstantin Belousov wrote: >> Hello, >> there seems to be a problem with device attach sequence offered by = newbus. >> Basically, when device attach method is executing, device is not = fully >> initialized yet. Also the device state in the newbus part of the = world >> is DS_ALIVE. There is definitely no shattering news in the = statements, >> but drivers that e.g. create devfs node to communicate with consumers >> are prone to a race. >>=20 >> If /dev node is created inside device attach method, then usermode >> can start calling cdevsw methods before device fully initialized = itself. >> Even more, if device tries to use newbus helpers in cdevsw methods, >> like device_busy(9), then panic occurs "called for unatteched = device". >> I get reports from users about this issues, to it is not something >> that only could happen. >>=20 >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called >> from newbus right after device attach finished and newbus considers >> the device fully initialized. Driver then could create devfs node >> in the after_attach method instead of attach. Please see the patch = below. >>=20 >=20 > Something like this would also be very useful for drivers that need to = interact across the device tree, if newbus called it only after all = drivers have been attached. Drivers that need to see other potentially = attached drivers now need to do some hacks with SYSINIT. Would it be = possible to do this? I don't think it changes the functionality you = need. Well, there's a problem here. You never know that you've attached = everything, so you could never really call it. Many busses enumerate = asynchronously, so it may be hard to get the semantics that you desire = unless the two devices you want to coordinate are in the static part of = the tree. Warner From owner-freebsd-drivers@FreeBSD.ORG Sun Apr 8 03:14:48 2012 Return-Path: Delivered-To: drivers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4877A1065670; Sun, 8 Apr 2012 03:14:48 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id E8C138FC0A; Sun, 8 Apr 2012 03:14:47 +0000 (UTC) Received: from 63.imp.bsdimp.com (63.imp.bsdimp.com [10.0.0.63]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q383At8n076884 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Sat, 7 Apr 2012 21:10:56 -0600 (MDT) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <20120407145728.GV2358@deviant.kiev.zoral.com.ua> Date: Sat, 7 Apr 2012 21:10:55 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Sat, 07 Apr 2012 21:10:56 -0600 (MDT) Cc: Ian Lepore , current@FreeBSD.org, drivers@FreeBSD.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Apr 2012 03:14:48 -0000 On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: >>> Hello, >>> there seems to be a problem with device attach sequence offered by = newbus. >>> Basically, when device attach method is executing, device is not = fully >>> initialized yet. Also the device state in the newbus part of the = world >>> is DS_ALIVE. There is definitely no shattering news in the = statements, >>> but drivers that e.g. create devfs node to communicate with = consumers >>> are prone to a race. >>>=20 >>> If /dev node is created inside device attach method, then usermode >>> can start calling cdevsw methods before device fully initialized = itself. >>> Even more, if device tries to use newbus helpers in cdevsw methods, >>> like device_busy(9), then panic occurs "called for unatteched = device". >>> I get reports from users about this issues, to it is not something >>> that only could happen. >>>=20 >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called >>> from newbus right after device attach finished and newbus considers >>> the device fully initialized. Driver then could create devfs node >>> in the after_attach method instead of attach. Please see the patch = below. >>>=20 >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m >>> index eb720eb..9db74e2 100644 >>> --- a/sys/kern/device_if.m >>> +++ b/sys/kern/device_if.m >>> @@ -43,6 +43,10 @@ INTERFACE device; >>> # Default implementations of some methods. >>> # >>> CODE { >>> + static void null_after_attach(device_t dev) >>> + { >>> + } >>> + >>> static int null_shutdown(device_t dev) >>> { >>> return 0; >>> @@ -199,6 +203,21 @@ METHOD int attach { >>> }; >>>=20 >>> /** >>> + * @brief Notify the driver that device is in attached state >>> + * >>> + * Called after driver is successfully attached to the device and >>> + * corresponding device_t is fully operational. Driver now may = expose >>> + * the device to the consumers, e.g. create devfs nodes. >>> + * >>> + * @param dev the device to probe >>> + * >>> + * @see DEVICE_ATTACH() >>> + */ >>> +METHOD void after_attach { >>> + device_t dev; >>> +} DEFAULT null_after_attach; >>> + >>> +/** >>> * @brief Detach a driver from a device. >>> * >>> * This can be called if the user is replacing the >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c >>> index d485b9f..6d849cb 100644 >>> --- a/sys/kern/subr_bus.c >>> +++ b/sys/kern/subr_bus.c >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) >>> dev->state =3D DS_ATTACHED; >>> dev->flags &=3D ~DF_DONENOMATCH; >>> devadded(dev); >>> + DEVICE_AFTER_ATTACH(dev); >>> return (0); >>> } >>>=20 >>=20 >> Does device_get_softc() work before attach is completed? (I don't = have >> time to go look in the code right now). If so, then a mutex = initialized >> and acquired early in the driver's attach routine, and also acquired = in >> the driver's cdev implementation routines before using any newbus >> functions other than device_get_softc(), would solve the problem = without >> a driver api change that would make it harder to backport/MFC driver >> changes. > No, 'a mutex' does not solve anything. It only adds enourmous burden > on the driver developers, because you cannot sleep under mutex. = Changing > the mutex to the sleepable lock also does not byy you much, since > you need to somehow solve the issues with some cdevsw call waking up > thread sleeping into another cdevsw call, just for example. >=20 > Singlethreading a driver due to this race is just silly. >=20 > And, what do you mean by 'making it harder to MFC' ? How ? driver_attach() { ... softc->flags =3D 0; // redundant, since softc is initialized to = 0. softc->cdev =3D device_create...(); ... softc->flags |=3D READY; } driver_open(...) { if (!(softc->flags & READY)) return ENXIO; ... } What's the big burden here? Warner= From owner-freebsd-drivers@FreeBSD.ORG Sun Apr 8 03:22:33 2012 Return-Path: Delivered-To: drivers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50892106564A; Sun, 8 Apr 2012 03:22:33 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id F21F78FC0A; Sun, 8 Apr 2012 03:22:32 +0000 (UTC) Received: from 63.imp.bsdimp.com (63.imp.bsdimp.com [10.0.0.63]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q383F4PK076923 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Sat, 7 Apr 2012 21:15:05 -0600 (MDT) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <4F80B2FE.1030007@freebsd.org> Date: Sat, 7 Apr 2012 21:15:04 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <13B344FC-AD72-4CF5-AA7A-4C1BDE9F0A7D@bsdimp.com> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <4F80579B.4040205@freebsd.org> <20120407151514.GW2358@deviant.kiev.zoral.com.ua> <4F80B2FE.1030007@freebsd.org> To: Julian Elischer X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Sat, 07 Apr 2012 21:15:05 -0600 (MDT) Cc: current@FreeBSD.org, Nathan Whitehorn , drivers@FreeBSD.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Apr 2012 03:22:33 -0000 On Apr 7, 2012, at 3:34 PM, Julian Elischer wrote: > On 4/7/12 8:15 AM, Konstantin Belousov wrote: >> On Sat, Apr 07, 2012 at 10:04:59AM -0500, Nathan Whitehorn wrote: >>> On 04/07/12 07:50, Konstantin Belousov wrote: >>>> Hello, >>>> there seems to be a problem with device attach sequence offered by = newbus. >>>> Basically, when device attach method is executing, device is not = fully >>>> initialized yet. Also the device state in the newbus part of the = world >>>> is DS_ALIVE. There is definitely no shattering news in the = statements, >>>> but drivers that e.g. create devfs node to communicate with = consumers >>>> are prone to a race. >>>>=20 >>>> If /dev node is created inside device attach method, then usermode >>>> can start calling cdevsw methods before device fully initialized = itself. >>>> Even more, if device tries to use newbus helpers in cdevsw methods, >>>> like device_busy(9), then panic occurs "called for unatteched = device". >>>> I get reports from users about this issues, to it is not something >>>> that only could happen. >>>>=20 >>>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called >>> >from newbus right after device attach finished and newbus considers >>>> the device fully initialized. Driver then could create devfs node >>>> in the after_attach method instead of attach. Please see the patch = below. >>>>=20 >>> Something like this would also be very useful for drivers that need = to >>> interact across the device tree, if newbus called it only after all >>> drivers have been attached. Drivers that need to see other = potentially >>> attached drivers now need to do some hacks with SYSINIT. Would it be >>> possible to do this? I don't think it changes the functionality you = need. >> I am definitely fine with postponing a call further, but I am not = sure >> how to define that point and how to implement your proposal. I am = mostly >> thinking of the case of kld being loaded, since for compiled-in = drivers, >> there is simply no usermode to make the havoc during attach. >=20 > no, Geom starts tasting disk devices as as soon as you make the = device. Which is why the typical way to cope with this is to not create the = device until you are ready to service requests for it. The other method = is to have a flag that's only set after you actually are ready that the = open routine can check and keep people out of the rest of the devsw = functions by returning ENXIO or similar error. Warner >> For the boot time attachments, I am not sure when to declare the end. >> E.g. USB does asynchronous device discovery. >>=20 >> Could you prototype a change that would do what you propose ? >=20 > _______________________________________________ > freebsd-drivers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-drivers > To unsubscribe, send any mail to = "freebsd-drivers-unsubscribe@freebsd.org" >=20 >=20 From owner-freebsd-drivers@FreeBSD.ORG Sun Apr 8 03:58:55 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1821106564A; Sun, 8 Apr 2012 03:58:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 66EA48FC08; Sun, 8 Apr 2012 03:58:54 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q383wd2B099237; Sun, 8 Apr 2012 06:58:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q383wcZE092081; Sun, 8 Apr 2012 06:58:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q383wch2092080; Sun, 8 Apr 2012 06:58:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 8 Apr 2012 06:58:38 +0300 From: Konstantin Belousov To: Warner Losh Message-ID: <20120408035838.GY2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6u7EtAStsNhw5cUf" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Ian Lepore , current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Apr 2012 03:58:56 -0000 --6u7EtAStsNhw5cUf Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: >=20 > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: >=20 > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > >>> Hello, > >>> there seems to be a problem with device attach sequence offered by ne= wbus. > >>> Basically, when device attach method is executing, device is not fully > >>> initialized yet. Also the device state in the newbus part of the world > >>> is DS_ALIVE. There is definitely no shattering news in the statements, > >>> but drivers that e.g. create devfs node to communicate with consumers > >>> are prone to a race. > >>>=20 > >>> If /dev node is created inside device attach method, then usermode > >>> can start calling cdevsw methods before device fully initialized itse= lf. > >>> Even more, if device tries to use newbus helpers in cdevsw methods, > >>> like device_busy(9), then panic occurs "called for unatteched device". > >>> I get reports from users about this issues, to it is not something > >>> that only could happen. > >>>=20 > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > >>> from newbus right after device attach finished and newbus considers > >>> the device fully initialized. Driver then could create devfs node > >>> in the after_attach method instead of attach. Please see the patch be= low. > >>>=20 > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > >>> index eb720eb..9db74e2 100644 > >>> --- a/sys/kern/device_if.m > >>> +++ b/sys/kern/device_if.m > >>> @@ -43,6 +43,10 @@ INTERFACE device; > >>> # Default implementations of some methods. > >>> # > >>> CODE { > >>> + static void null_after_attach(device_t dev) > >>> + { > >>> + } > >>> + > >>> static int null_shutdown(device_t dev) > >>> { > >>> return 0; > >>> @@ -199,6 +203,21 @@ METHOD int attach { > >>> }; > >>>=20 > >>> /** > >>> + * @brief Notify the driver that device is in attached state > >>> + * > >>> + * Called after driver is successfully attached to the device and > >>> + * corresponding device_t is fully operational. Driver now may expose > >>> + * the device to the consumers, e.g. create devfs nodes. > >>> + * > >>> + * @param dev the device to probe > >>> + * > >>> + * @see DEVICE_ATTACH() > >>> + */ > >>> +METHOD void after_attach { > >>> + device_t dev; > >>> +} DEFAULT null_after_attach; > >>> + > >>> +/** > >>> * @brief Detach a driver from a device. > >>> * > >>> * This can be called if the user is replacing the > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > >>> index d485b9f..6d849cb 100644 > >>> --- a/sys/kern/subr_bus.c > >>> +++ b/sys/kern/subr_bus.c > >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > >>> dev->state =3D DS_ATTACHED; > >>> dev->flags &=3D ~DF_DONENOMATCH; > >>> devadded(dev); > >>> + DEVICE_AFTER_ATTACH(dev); > >>> return (0); > >>> } > >>>=20 > >>=20 > >> Does device_get_softc() work before attach is completed? (I don't have > >> time to go look in the code right now). If so, then a mutex initializ= ed > >> and acquired early in the driver's attach routine, and also acquired in > >> the driver's cdev implementation routines before using any newbus > >> functions other than device_get_softc(), would solve the problem witho= ut > >> a driver api change that would make it harder to backport/MFC driver > >> changes. > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > on the driver developers, because you cannot sleep under mutex. Changing > > the mutex to the sleepable lock also does not byy you much, since > > you need to somehow solve the issues with some cdevsw call waking up > > thread sleeping into another cdevsw call, just for example. > >=20 > > Singlethreading a driver due to this race is just silly. > >=20 > > And, what do you mean by 'making it harder to MFC' ? How ? >=20 > driver_attach() > { > ... > softc->flags =3D 0; // redundant, since softc is initialized to 0. > softc->cdev =3D device_create...(); > ... > softc->flags |=3D READY; > } >=20 > driver_open(...) > { > if (!(softc->flags & READY)) > return ENXIO; > ... > } >=20 > What's the big burden here? The burden is that your proposal does not work. As I described above, device_busy() calls from cdevsw method panic if open() is called before DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device attach() method returned, so no workarounds from attach() could solve this. --6u7EtAStsNhw5cUf Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+BDO4ACgkQC3+MBN1Mb4hckQCgsGf27T3LtF0hlsguyfG8JpPu 1rUAnjvjSYfLZ6uFkgGSKYrXPSokZ3w8 =AgpQ -----END PGP SIGNATURE----- --6u7EtAStsNhw5cUf-- From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 14:09:04 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E3461065670 for ; Mon, 9 Apr 2012 14:09:04 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id 1E1618FC0C for ; Mon, 9 Apr 2012 14:09:04 +0000 (UTC) Received: from omta07.emeryville.ca.mail.comcast.net ([76.96.30.59]) by qmta14.emeryville.ca.mail.comcast.net with comcast id vq381i00C1GXsucAEq7ynY; Mon, 09 Apr 2012 14:07:58 +0000 Received: from damnhippie.dyndns.org ([24.8.232.202]) by omta07.emeryville.ca.mail.comcast.net with comcast id vq7w1i00f4NgCEG8Uq7xFb; Mon, 09 Apr 2012 14:07:58 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id q39E7ta8057088; Mon, 9 Apr 2012 08:07:55 -0600 (MDT) (envelope-from freebsd@damnhippie.dyndns.org) From: Ian Lepore To: Konstantin Belousov In-Reply-To: <20120407145728.GV2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="us-ascii" Date: Mon, 09 Apr 2012 08:07:55 -0600 Message-ID: <1333980475.1082.47.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 14:09:04 -0000 On Sat, 2012-04-07 at 17:57 +0300, Konstantin Belousov wrote: > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > Hello, > > > there seems to be a problem with device attach sequence offered by newbus. > > > Basically, when device attach method is executing, device is not fully > > > initialized yet. Also the device state in the newbus part of the world > > > is DS_ALIVE. There is definitely no shattering news in the statements, > > > but drivers that e.g. create devfs node to communicate with consumers > > > are prone to a race. > > > > > > If /dev node is created inside device attach method, then usermode > > > can start calling cdevsw methods before device fully initialized itself. > > > Even more, if device tries to use newbus helpers in cdevsw methods, > > > like device_busy(9), then panic occurs "called for unatteched device". > > > I get reports from users about this issues, to it is not something > > > that only could happen. > > > > > > I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > from newbus right after device attach finished and newbus considers > > > the device fully initialized. Driver then could create devfs node > > > in the after_attach method instead of attach. Please see the patch below. > > > > > > diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > index eb720eb..9db74e2 100644 > > > --- a/sys/kern/device_if.m > > > +++ b/sys/kern/device_if.m > > > @@ -43,6 +43,10 @@ INTERFACE device; > > > # Default implementations of some methods. > > > # > > > CODE { > > > + static void null_after_attach(device_t dev) > > > + { > > > + } > > > + > > > static int null_shutdown(device_t dev) > > > { > > > return 0; > > > @@ -199,6 +203,21 @@ METHOD int attach { > > > }; > > > > > > /** > > > + * @brief Notify the driver that device is in attached state > > > + * > > > + * Called after driver is successfully attached to the device and > > > + * corresponding device_t is fully operational. Driver now may expose > > > + * the device to the consumers, e.g. create devfs nodes. > > > + * > > > + * @param dev the device to probe > > > + * > > > + * @see DEVICE_ATTACH() > > > + */ > > > +METHOD void after_attach { > > > + device_t dev; > > > +} DEFAULT null_after_attach; > > > + > > > +/** > > > * @brief Detach a driver from a device. > > > * > > > * This can be called if the user is replacing the > > > diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > index d485b9f..6d849cb 100644 > > > --- a/sys/kern/subr_bus.c > > > +++ b/sys/kern/subr_bus.c > > > @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > dev->state = DS_ATTACHED; > > > dev->flags &= ~DF_DONENOMATCH; > > > devadded(dev); > > > + DEVICE_AFTER_ATTACH(dev); > > > return (0); > > > } > > > > > > > Does device_get_softc() work before attach is completed? (I don't have > > time to go look in the code right now). If so, then a mutex initialized > > and acquired early in the driver's attach routine, and also acquired in > > the driver's cdev implementation routines before using any newbus > > functions other than device_get_softc(), would solve the problem without > > a driver api change that would make it harder to backport/MFC driver > > changes. > No, 'a mutex' does not solve anything. It only adds enourmous burden > on the driver developers, because you cannot sleep under mutex. Changing > the mutex to the sleepable lock also does not byy you much, since > you need to somehow solve the issues with some cdevsw call waking up > thread sleeping into another cdevsw call, just for example. > > Singlethreading a driver due to this race is just silly. > > And, what do you mean by 'making it harder to MFC' ? How ? I frequently find myself having to backport driver changes further back than any currently-supported FreeBSD release, and something like a new function in newbus can make that pretty hard to do. That often makes me think of how accomplish something with a minimally-invasive change. (So it's really more of a selfish personal goal more than a FreeBSD-project goal.) -- Ian From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 14:42:25 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 46FC51065670 for ; Mon, 9 Apr 2012 14:42:25 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from qmta02.emeryville.ca.mail.comcast.net (qmta02.emeryville.ca.mail.comcast.net [76.96.30.24]) by mx1.freebsd.org (Postfix) with ESMTP id 243288FC1B for ; Mon, 9 Apr 2012 14:42:25 +0000 (UTC) Received: from omta23.emeryville.ca.mail.comcast.net ([76.96.30.90]) by qmta02.emeryville.ca.mail.comcast.net with comcast id vqgS1i0011wfjNsA2qhKsl; Mon, 09 Apr 2012 14:41:19 +0000 Received: from damnhippie.dyndns.org ([24.8.232.202]) by omta23.emeryville.ca.mail.comcast.net with comcast id vqhH1i01H4NgCEG8jqhJWa; Mon, 09 Apr 2012 14:41:19 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id q39EfF4X057102; Mon, 9 Apr 2012 08:41:15 -0600 (MDT) (envelope-from freebsd@damnhippie.dyndns.org) From: Ian Lepore To: Konstantin Belousov In-Reply-To: <20120408035838.GY2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> <20120408035838.GY2358@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="us-ascii" Date: Mon, 09 Apr 2012 08:41:15 -0600 Message-ID: <1333982475.1082.61.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 14:42:25 -0000 On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > >>> Hello, > > >>> there seems to be a problem with device attach sequence offered by newbus. > > >>> Basically, when device attach method is executing, device is not fully > > >>> initialized yet. Also the device state in the newbus part of the world > > >>> is DS_ALIVE. There is definitely no shattering news in the statements, > > >>> but drivers that e.g. create devfs node to communicate with consumers > > >>> are prone to a race. > > >>> > > >>> If /dev node is created inside device attach method, then usermode > > >>> can start calling cdevsw methods before device fully initialized itself. > > >>> Even more, if device tries to use newbus helpers in cdevsw methods, > > >>> like device_busy(9), then panic occurs "called for unatteched device". > > >>> I get reports from users about this issues, to it is not something > > >>> that only could happen. > > >>> > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > >>> from newbus right after device attach finished and newbus considers > > >>> the device fully initialized. Driver then could create devfs node > > >>> in the after_attach method instead of attach. Please see the patch below. > > >>> > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > >>> index eb720eb..9db74e2 100644 > > >>> --- a/sys/kern/device_if.m > > >>> +++ b/sys/kern/device_if.m > > >>> @@ -43,6 +43,10 @@ INTERFACE device; > > >>> # Default implementations of some methods. > > >>> # > > >>> CODE { > > >>> + static void null_after_attach(device_t dev) > > >>> + { > > >>> + } > > >>> + > > >>> static int null_shutdown(device_t dev) > > >>> { > > >>> return 0; > > >>> @@ -199,6 +203,21 @@ METHOD int attach { > > >>> }; > > >>> > > >>> /** > > >>> + * @brief Notify the driver that device is in attached state > > >>> + * > > >>> + * Called after driver is successfully attached to the device and > > >>> + * corresponding device_t is fully operational. Driver now may expose > > >>> + * the device to the consumers, e.g. create devfs nodes. > > >>> + * > > >>> + * @param dev the device to probe > > >>> + * > > >>> + * @see DEVICE_ATTACH() > > >>> + */ > > >>> +METHOD void after_attach { > > >>> + device_t dev; > > >>> +} DEFAULT null_after_attach; > > >>> + > > >>> +/** > > >>> * @brief Detach a driver from a device. > > >>> * > > >>> * This can be called if the user is replacing the > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > >>> index d485b9f..6d849cb 100644 > > >>> --- a/sys/kern/subr_bus.c > > >>> +++ b/sys/kern/subr_bus.c > > >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > >>> dev->state = DS_ATTACHED; > > >>> dev->flags &= ~DF_DONENOMATCH; > > >>> devadded(dev); > > >>> + DEVICE_AFTER_ATTACH(dev); > > >>> return (0); > > >>> } > > >>> > > >> > > >> Does device_get_softc() work before attach is completed? (I don't have > > >> time to go look in the code right now). If so, then a mutex initialized > > >> and acquired early in the driver's attach routine, and also acquired in > > >> the driver's cdev implementation routines before using any newbus > > >> functions other than device_get_softc(), would solve the problem without > > >> a driver api change that would make it harder to backport/MFC driver > > >> changes. > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > on the driver developers, because you cannot sleep under mutex. Changing > > > the mutex to the sleepable lock also does not byy you much, since > > > you need to somehow solve the issues with some cdevsw call waking up > > > thread sleeping into another cdevsw call, just for example. > > > > > > Singlethreading a driver due to this race is just silly. > > > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > > > driver_attach() > > { > > ... > > softc->flags = 0; // redundant, since softc is initialized to 0. > > softc->cdev = device_create...(); > > ... > > softc->flags |= READY; > > } > > > > driver_open(...) > > { > > if (!(softc->flags & READY)) > > return ENXIO; > > ... > > } > > > > What's the big burden here? > The burden is that your proposal does not work. As I described above, > device_busy() calls from cdevsw method panic if open() is called before > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > attach() method returned, so no workarounds from attach() could solve > this. One thing that keeps floating to the front of my brain is that all the proposals so far (including my not-well-thought-out mutex suggestion) requires changing every existing driver to get the new safe behavior. Hmmm. Looking at the code, not very many drivers call device_busy(). Why is that? I agree that calling device_create() should be deferred until the driver is ready to handle requests. That's only part of the fix if the newbus support routines are still going to have a window where they can panic because the internal state variables haven't yet transitioned to the correct state. Also, the implementation of device_busy() looks to be unsafe unless it's being implicitly protected by some locking in a call chain that isn't jumping out at me with simple grepping of the code. For example, concurrent callers in a device's open() and close() methods for a driver that calls busy/unbusy from cdev open/close could leave the parent device's busy count in an indeterminate state. Could fixing this by enforcing single threading through busy/unbusy provide an opportunity to fix the original problem by having the attach code acquire the same lock, so that an early call to a cdev method that invokes device_busy() ends up sleeping until the attach routine returns? That way you don't single-thread the whole cdev open/close handling, just the part that's currently causing a problem. -- Ian From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 14:59:39 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B4698106564A; Mon, 9 Apr 2012 14:59:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 0930B8FC14; Mon, 9 Apr 2012 14:59:38 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q39ExFE3063752; Mon, 9 Apr 2012 17:59:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q39ExEP9050552; Mon, 9 Apr 2012 17:59:14 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q39ExDuv050551; Mon, 9 Apr 2012 17:59:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 9 Apr 2012 17:59:13 +0300 From: Konstantin Belousov To: Ian Lepore Message-ID: <20120409145913.GR2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> <20120408035838.GY2358@deviant.kiev.zoral.com.ua> <1333982475.1082.61.camel@revolution.hippie.lan> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="aLNv5lbzIJDYDUXB" Content-Disposition: inline In-Reply-To: <1333982475.1082.61.camel@revolution.hippie.lan> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 14:59:39 -0000 --aLNv5lbzIJDYDUXB Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 09, 2012 at 08:41:15AM -0600, Ian Lepore wrote: > On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > >=20 > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > >=20 > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > >>> Hello, > > > >>> there seems to be a problem with device attach sequence offered b= y newbus. > > > >>> Basically, when device attach method is executing, device is not = fully > > > >>> initialized yet. Also the device state in the newbus part of the = world > > > >>> is DS_ALIVE. There is definitely no shattering news in the statem= ents, > > > >>> but drivers that e.g. create devfs node to communicate with consu= mers > > > >>> are prone to a race. > > > >>>=20 > > > >>> If /dev node is created inside device attach method, then usermode > > > >>> can start calling cdevsw methods before device fully initialized = itself. > > > >>> Even more, if device tries to use newbus helpers in cdevsw method= s, > > > >>> like device_busy(9), then panic occurs "called for unatteched dev= ice". > > > >>> I get reports from users about this issues, to it is not something > > > >>> that only could happen. > > > >>>=20 > > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > >>> from newbus right after device attach finished and newbus conside= rs > > > >>> the device fully initialized. Driver then could create devfs node > > > >>> in the after_attach method instead of attach. Please see the patc= h below. > > > >>>=20 > > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > >>> index eb720eb..9db74e2 100644 > > > >>> --- a/sys/kern/device_if.m > > > >>> +++ b/sys/kern/device_if.m > > > >>> @@ -43,6 +43,10 @@ INTERFACE device; > > > >>> # Default implementations of some methods. > > > >>> # > > > >>> CODE { > > > >>> + static void null_after_attach(device_t dev) > > > >>> + { > > > >>> + } > > > >>> + > > > >>> static int null_shutdown(device_t dev) > > > >>> { > > > >>> return 0; > > > >>> @@ -199,6 +203,21 @@ METHOD int attach { > > > >>> }; > > > >>>=20 > > > >>> /** > > > >>> + * @brief Notify the driver that device is in attached state > > > >>> + * > > > >>> + * Called after driver is successfully attached to the device and > > > >>> + * corresponding device_t is fully operational. Driver now may e= xpose > > > >>> + * the device to the consumers, e.g. create devfs nodes. > > > >>> + * > > > >>> + * @param dev the device to probe > > > >>> + * > > > >>> + * @see DEVICE_ATTACH() > > > >>> + */ > > > >>> +METHOD void after_attach { > > > >>> + device_t dev; > > > >>> +} DEFAULT null_after_attach; > > > >>> + > > > >>> +/** > > > >>> * @brief Detach a driver from a device. > > > >>> * > > > >>> * This can be called if the user is replacing the > > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > >>> index d485b9f..6d849cb 100644 > > > >>> --- a/sys/kern/subr_bus.c > > > >>> +++ b/sys/kern/subr_bus.c > > > >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > >>> dev->state =3D DS_ATTACHED; > > > >>> dev->flags &=3D ~DF_DONENOMATCH; > > > >>> devadded(dev); > > > >>> + DEVICE_AFTER_ATTACH(dev); > > > >>> return (0); > > > >>> } > > > >>>=20 > > > >>=20 > > > >> Does device_get_softc() work before attach is completed? (I don't= have > > > >> time to go look in the code right now). If so, then a mutex initi= alized > > > >> and acquired early in the driver's attach routine, and also acquir= ed in > > > >> the driver's cdev implementation routines before using any newbus > > > >> functions other than device_get_softc(), would solve the problem w= ithout > > > >> a driver api change that would make it harder to backport/MFC driv= er > > > >> changes. > > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > > on the driver developers, because you cannot sleep under mutex. Cha= nging > > > > the mutex to the sleepable lock also does not byy you much, since > > > > you need to somehow solve the issues with some cdevsw call waking up > > > > thread sleeping into another cdevsw call, just for example. > > > >=20 > > > > Singlethreading a driver due to this race is just silly. > > > >=20 > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > >=20 > > > driver_attach() > > > { > > > ... > > > softc->flags =3D 0; // redundant, since softc is initialized to 0. > > > softc->cdev =3D device_create...(); > > > ... > > > softc->flags |=3D READY; > > > } > > >=20 > > > driver_open(...) > > > { > > > if (!(softc->flags & READY)) > > > return ENXIO; > > > ... > > > } > > >=20 > > > What's the big burden here? > > The burden is that your proposal does not work. As I described above, > > device_busy() calls from cdevsw method panic if open() is called before > > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > > attach() method returned, so no workarounds from attach() could solve > > this. >=20 > One thing that keeps floating to the front of my brain is that all the > proposals so far (including my not-well-thought-out mutex suggestion) > requires changing every existing driver to get the new safe behavior. >=20 > Hmmm. Looking at the code, not very many drivers call device_busy(). > Why is that? =20 >=20 > I agree that calling device_create() should be deferred until the driver You mean make_dev(9), I assume. > is ready to handle requests. That's only part of the fix if the newbus > support routines are still going to have a window where they can panic > because the internal state variables haven't yet transitioned to the > correct state. =20 >=20 > Also, the implementation of device_busy() looks to be unsafe unless it's > being implicitly protected by some locking in a call chain that isn't > jumping out at me with simple grepping of the code. For example, > concurrent callers in a device's open() and close() methods for a driver > that calls busy/unbusy from cdev open/close could leave the parent > device's busy count in an indeterminate state. Could fixing this by > enforcing single threading through busy/unbusy provide an opportunity to > fix the original problem by having the attach code acquire the same > lock, so that an early call to a cdev method that invokes device_busy() > ends up sleeping until the attach routine returns? That way you don't > single-thread the whole cdev open/close handling, just the part that's > currently causing a problem. I think device_busy() calls require Giant, the same as the whole newbus. The absence of GIANT_REQUIRED assertions in device_busy() and device_unbusy= () looks like overlook. --aLNv5lbzIJDYDUXB Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+C+UEACgkQC3+MBN1Mb4jyyQCfVT7lHgGeiLMsjUFMCQIgzM9e 6lMAniYIyja/0aayMj78aYDJgAfpU6WC =HEjq -----END PGP SIGNATURE----- --aLNv5lbzIJDYDUXB-- From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 15:01:20 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F0721065688; Mon, 9 Apr 2012 15:01:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 022678FC08; Mon, 9 Apr 2012 15:01:20 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 40F82B911; Mon, 9 Apr 2012 11:01:19 -0400 (EDT) From: John Baldwin To: freebsd-drivers@freebsd.org Date: Mon, 9 Apr 2012 11:01:03 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> In-Reply-To: <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201204091101.03956.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 09 Apr 2012 11:01:19 -0400 (EDT) Cc: Ian Lepore , current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 15:01:20 -0000 On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > >> Hello, > >> there seems to be a problem with device attach sequence offered by newbus. > >> Basically, when device attach method is executing, device is not fully > >> initialized yet. Also the device state in the newbus part of the world > >> is DS_ALIVE. There is definitely no shattering news in the statements, > >> but drivers that e.g. create devfs node to communicate with consumers > >> are prone to a race. > >> > >> If /dev node is created inside device attach method, then usermode > >> can start calling cdevsw methods before device fully initialized itself. > >> Even more, if device tries to use newbus helpers in cdevsw methods, > >> like device_busy(9), then panic occurs "called for unatteched device". > >> I get reports from users about this issues, to it is not something > >> that only could happen. > >> > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > >> from newbus right after device attach finished and newbus considers > >> the device fully initialized. Driver then could create devfs node > >> in the after_attach method instead of attach. Please see the patch below. > >> > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > >> index eb720eb..9db74e2 100644 > >> --- a/sys/kern/device_if.m > >> +++ b/sys/kern/device_if.m > >> @@ -43,6 +43,10 @@ INTERFACE device; > >> # Default implementations of some methods. > >> # > >> CODE { > >> + static void null_after_attach(device_t dev) > >> + { > >> + } > >> + > >> static int null_shutdown(device_t dev) > >> { > >> return 0; > >> @@ -199,6 +203,21 @@ METHOD int attach { > >> }; > >> > >> /** > >> + * @brief Notify the driver that device is in attached state > >> + * > >> + * Called after driver is successfully attached to the device and > >> + * corresponding device_t is fully operational. Driver now may expose > >> + * the device to the consumers, e.g. create devfs nodes. > >> + * > >> + * @param dev the device to probe > >> + * > >> + * @see DEVICE_ATTACH() > >> + */ > >> +METHOD void after_attach { > >> + device_t dev; > >> +} DEFAULT null_after_attach; > >> + > >> +/** > >> * @brief Detach a driver from a device. > >> * > >> * This can be called if the user is replacing the > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > >> index d485b9f..6d849cb 100644 > >> --- a/sys/kern/subr_bus.c > >> +++ b/sys/kern/subr_bus.c > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > >> dev->state = DS_ATTACHED; > >> dev->flags &= ~DF_DONENOMATCH; > >> devadded(dev); > >> + DEVICE_AFTER_ATTACH(dev); > >> return (0); > >> } > >> > > > > Does device_get_softc() work before attach is completed? (I don't have > > time to go look in the code right now). If so, then a mutex initialized > > and acquired early in the driver's attach routine, and also acquired in > > the driver's cdev implementation routines before using any newbus > > functions other than device_get_softc(), would solve the problem without > > a driver api change that would make it harder to backport/MFC driver > > changes. > > Also, more generally, don't create the dev nodes before you are ready to deal with requests. Why do we need to uglify everything here? If you can't do that, you can check a bit in the softc and return EBUSY or ENXIO on open if that bit says that your driver isn't ready to accept requests. I agree, this dosen't actually fix anything as the decision for what to put in your foo_attach() method rather than foo_after_attach() is non-obvious and very arbitrary. The actual bug appears to only be with using 'device_busy()'. I think this should be fixed by making device_busy() better, not by adding this type of obfuscation to attach. It should be trivial to make device_busy() safe to use on a device that is currently being attached which will not require any changes to drivers. -- John Baldwin From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 15:01:20 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F0721065688; Mon, 9 Apr 2012 15:01:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 022678FC08; Mon, 9 Apr 2012 15:01:20 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 40F82B911; Mon, 9 Apr 2012 11:01:19 -0400 (EDT) From: John Baldwin To: freebsd-drivers@freebsd.org Date: Mon, 9 Apr 2012 11:01:03 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> In-Reply-To: <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201204091101.03956.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 09 Apr 2012 11:01:19 -0400 (EDT) Cc: Ian Lepore , current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 15:01:20 -0000 On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > >> Hello, > >> there seems to be a problem with device attach sequence offered by newbus. > >> Basically, when device attach method is executing, device is not fully > >> initialized yet. Also the device state in the newbus part of the world > >> is DS_ALIVE. There is definitely no shattering news in the statements, > >> but drivers that e.g. create devfs node to communicate with consumers > >> are prone to a race. > >> > >> If /dev node is created inside device attach method, then usermode > >> can start calling cdevsw methods before device fully initialized itself. > >> Even more, if device tries to use newbus helpers in cdevsw methods, > >> like device_busy(9), then panic occurs "called for unatteched device". > >> I get reports from users about this issues, to it is not something > >> that only could happen. > >> > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > >> from newbus right after device attach finished and newbus considers > >> the device fully initialized. Driver then could create devfs node > >> in the after_attach method instead of attach. Please see the patch below. > >> > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > >> index eb720eb..9db74e2 100644 > >> --- a/sys/kern/device_if.m > >> +++ b/sys/kern/device_if.m > >> @@ -43,6 +43,10 @@ INTERFACE device; > >> # Default implementations of some methods. > >> # > >> CODE { > >> + static void null_after_attach(device_t dev) > >> + { > >> + } > >> + > >> static int null_shutdown(device_t dev) > >> { > >> return 0; > >> @@ -199,6 +203,21 @@ METHOD int attach { > >> }; > >> > >> /** > >> + * @brief Notify the driver that device is in attached state > >> + * > >> + * Called after driver is successfully attached to the device and > >> + * corresponding device_t is fully operational. Driver now may expose > >> + * the device to the consumers, e.g. create devfs nodes. > >> + * > >> + * @param dev the device to probe > >> + * > >> + * @see DEVICE_ATTACH() > >> + */ > >> +METHOD void after_attach { > >> + device_t dev; > >> +} DEFAULT null_after_attach; > >> + > >> +/** > >> * @brief Detach a driver from a device. > >> * > >> * This can be called if the user is replacing the > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > >> index d485b9f..6d849cb 100644 > >> --- a/sys/kern/subr_bus.c > >> +++ b/sys/kern/subr_bus.c > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > >> dev->state = DS_ATTACHED; > >> dev->flags &= ~DF_DONENOMATCH; > >> devadded(dev); > >> + DEVICE_AFTER_ATTACH(dev); > >> return (0); > >> } > >> > > > > Does device_get_softc() work before attach is completed? (I don't have > > time to go look in the code right now). If so, then a mutex initialized > > and acquired early in the driver's attach routine, and also acquired in > > the driver's cdev implementation routines before using any newbus > > functions other than device_get_softc(), would solve the problem without > > a driver api change that would make it harder to backport/MFC driver > > changes. > > Also, more generally, don't create the dev nodes before you are ready to deal with requests. Why do we need to uglify everything here? If you can't do that, you can check a bit in the softc and return EBUSY or ENXIO on open if that bit says that your driver isn't ready to accept requests. I agree, this dosen't actually fix anything as the decision for what to put in your foo_attach() method rather than foo_after_attach() is non-obvious and very arbitrary. The actual bug appears to only be with using 'device_busy()'. I think this should be fixed by making device_busy() better, not by adding this type of obfuscation to attach. It should be trivial to make device_busy() safe to use on a device that is currently being attached which will not require any changes to drivers. -- John Baldwin From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 15:43:27 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A848C1065675 for ; Mon, 9 Apr 2012 15:43:27 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from qmta10.emeryville.ca.mail.comcast.net (qmta10.emeryville.ca.mail.comcast.net [76.96.30.17]) by mx1.freebsd.org (Postfix) with ESMTP id 4574C8FC0A for ; Mon, 9 Apr 2012 15:43:27 +0000 (UTC) Received: from omta02.emeryville.ca.mail.comcast.net ([76.96.30.19]) by qmta10.emeryville.ca.mail.comcast.net with comcast id vriM1i0010QkzPwAAriMen; Mon, 09 Apr 2012 15:42:21 +0000 Received: from damnhippie.dyndns.org ([24.8.232.202]) by omta02.emeryville.ca.mail.comcast.net with comcast id vriL1i0064NgCEG8NriLrS; Mon, 09 Apr 2012 15:42:21 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id q39FgIbg057154; Mon, 9 Apr 2012 09:42:18 -0600 (MDT) (envelope-from freebsd@damnhippie.dyndns.org) From: Ian Lepore To: Konstantin Belousov In-Reply-To: <20120409145913.GR2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> <20120408035838.GY2358@deviant.kiev.zoral.com.ua> <1333982475.1082.61.camel@revolution.hippie.lan> <20120409145913.GR2358@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="us-ascii" Date: Mon, 09 Apr 2012 09:42:18 -0600 Message-ID: <1333986138.1082.71.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, drivers@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 15:43:27 -0000 On Mon, 2012-04-09 at 17:59 +0300, Konstantin Belousov wrote: > On Mon, Apr 09, 2012 at 08:41:15AM -0600, Ian Lepore wrote: > > On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: > > > On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: > > > > > > > > On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: > > > > > > > > > On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: > > > > >> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > > >>> Hello, > > > > >>> there seems to be a problem with device attach sequence offered by newbus. > > > > >>> Basically, when device attach method is executing, device is not fully > > > > >>> initialized yet. Also the device state in the newbus part of the world > > > > >>> is DS_ALIVE. There is definitely no shattering news in the statements, > > > > >>> but drivers that e.g. create devfs node to communicate with consumers > > > > >>> are prone to a race. > > > > >>> > > > > >>> If /dev node is created inside device attach method, then usermode > > > > >>> can start calling cdevsw methods before device fully initialized itself. > > > > >>> Even more, if device tries to use newbus helpers in cdevsw methods, > > > > >>> like device_busy(9), then panic occurs "called for unatteched device". > > > > >>> I get reports from users about this issues, to it is not something > > > > >>> that only could happen. > > > > >>> > > > > >>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > > >>> from newbus right after device attach finished and newbus considers > > > > >>> the device fully initialized. Driver then could create devfs node > > > > >>> in the after_attach method instead of attach. Please see the patch below. > > > > >>> > > > > >>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > >>> index eb720eb..9db74e2 100644 > > > > >>> --- a/sys/kern/device_if.m > > > > >>> +++ b/sys/kern/device_if.m > > > > >>> @@ -43,6 +43,10 @@ INTERFACE device; > > > > >>> # Default implementations of some methods. > > > > >>> # > > > > >>> CODE { > > > > >>> + static void null_after_attach(device_t dev) > > > > >>> + { > > > > >>> + } > > > > >>> + > > > > >>> static int null_shutdown(device_t dev) > > > > >>> { > > > > >>> return 0; > > > > >>> @@ -199,6 +203,21 @@ METHOD int attach { > > > > >>> }; > > > > >>> > > > > >>> /** > > > > >>> + * @brief Notify the driver that device is in attached state > > > > >>> + * > > > > >>> + * Called after driver is successfully attached to the device and > > > > >>> + * corresponding device_t is fully operational. Driver now may expose > > > > >>> + * the device to the consumers, e.g. create devfs nodes. > > > > >>> + * > > > > >>> + * @param dev the device to probe > > > > >>> + * > > > > >>> + * @see DEVICE_ATTACH() > > > > >>> + */ > > > > >>> +METHOD void after_attach { > > > > >>> + device_t dev; > > > > >>> +} DEFAULT null_after_attach; > > > > >>> + > > > > >>> +/** > > > > >>> * @brief Detach a driver from a device. > > > > >>> * > > > > >>> * This can be called if the user is replacing the > > > > >>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > >>> index d485b9f..6d849cb 100644 > > > > >>> --- a/sys/kern/subr_bus.c > > > > >>> +++ b/sys/kern/subr_bus.c > > > > >>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > > >>> dev->state = DS_ATTACHED; > > > > >>> dev->flags &= ~DF_DONENOMATCH; > > > > >>> devadded(dev); > > > > >>> + DEVICE_AFTER_ATTACH(dev); > > > > >>> return (0); > > > > >>> } > > > > >>> > > > > >> > > > > >> Does device_get_softc() work before attach is completed? (I don't have > > > > >> time to go look in the code right now). If so, then a mutex initialized > > > > >> and acquired early in the driver's attach routine, and also acquired in > > > > >> the driver's cdev implementation routines before using any newbus > > > > >> functions other than device_get_softc(), would solve the problem without > > > > >> a driver api change that would make it harder to backport/MFC driver > > > > >> changes. > > > > > No, 'a mutex' does not solve anything. It only adds enourmous burden > > > > > on the driver developers, because you cannot sleep under mutex. Changing > > > > > the mutex to the sleepable lock also does not byy you much, since > > > > > you need to somehow solve the issues with some cdevsw call waking up > > > > > thread sleeping into another cdevsw call, just for example. > > > > > > > > > > Singlethreading a driver due to this race is just silly. > > > > > > > > > > And, what do you mean by 'making it harder to MFC' ? How ? > > > > > > > > driver_attach() > > > > { > > > > ... > > > > softc->flags = 0; // redundant, since softc is initialized to 0. > > > > softc->cdev = device_create...(); > > > > ... > > > > softc->flags |= READY; > > > > } > > > > > > > > driver_open(...) > > > > { > > > > if (!(softc->flags & READY)) > > > > return ENXIO; > > > > ... > > > > } > > > > > > > > What's the big burden here? > > > The burden is that your proposal does not work. As I described above, > > > device_busy() calls from cdevsw method panic if open() is called before > > > DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after device > > > attach() method returned, so no workarounds from attach() could solve > > > this. > > > > One thing that keeps floating to the front of my brain is that all the > > proposals so far (including my not-well-thought-out mutex suggestion) > > requires changing every existing driver to get the new safe behavior. > > > > Hmmm. Looking at the code, not very many drivers call device_busy(). > > Why is that? > > > > I agree that calling device_create() should be deferred until the driver > You mean make_dev(9), I assume. > Yes, of course, sorry (I was looking at a call to a local convenience routine in a private driver). > > is ready to handle requests. That's only part of the fix if the newbus > > support routines are still going to have a window where they can panic > > because the internal state variables haven't yet transitioned to the > > correct state. > > > > Also, the implementation of device_busy() looks to be unsafe unless it's > > being implicitly protected by some locking in a call chain that isn't > > jumping out at me with simple grepping of the code. For example, > > concurrent callers in a device's open() and close() methods for a driver > > that calls busy/unbusy from cdev open/close could leave the parent > > device's busy count in an indeterminate state. Could fixing this by > > enforcing single threading through busy/unbusy provide an opportunity to > > fix the original problem by having the attach code acquire the same > > lock, so that an early call to a cdev method that invokes device_busy() > > ends up sleeping until the attach routine returns? That way you don't > > single-thread the whole cdev open/close handling, just the part that's > > currently causing a problem. > I think device_busy() calls require Giant, the same as the whole newbus. > The absence of GIANT_REQUIRED assertions in device_busy() and device_unbusy() > looks like overlook. Hmmm. Shouldn't device_attach() then require Giant as well? I see that device_detach() and device_probe_and_attach() contains GIANT_REQUIRED but device_attach() does not. Of the few devices calling device_busy(), some use D_NEEDGIANT and some do not (the do-nots include drm, fdc, rp -- I stopped looking at this point). Only the fdc code specifically calls mtx_lock(&Giant), but it appears that it does not do so around the device_busy/unbusy() calls. If you're supposed to hold Giant during calls to device_busy/unbusy, and assuming it should be held by whomever calls device_attach(), wouldn't that fix the panic? I don't know if that's a good assumption, because I'm not seeing anything in the manpages or architecture handbook about holding Giant during newbus calls. -- Ian From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 16:04:29 2012 Return-Path: Delivered-To: drivers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D649C106566C; Mon, 9 Apr 2012 16:04:29 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 62B7C8FC08; Mon, 9 Apr 2012 16:04:29 +0000 (UTC) Received: from [10.30.101.53] ([209.117.142.2]) (authenticated bits=0) by harmony.bsdimp.com (8.14.4/8.14.3) with ESMTP id q39FwJbv093309 (version=TLSv1/SSLv3 cipher=DHE-DSS-AES128-SHA bits=128 verify=NO); Mon, 9 Apr 2012 09:58:21 -0600 (MDT) (envelope-from imp@bsdimp.com) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: <1333986138.1082.71.camel@revolution.hippie.lan> Date: Mon, 9 Apr 2012 09:58:14 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <20120407145728.GV2358@deviant.kiev.zoral.com.ua> <20120408035838.GY2358@deviant.kiev.zoral.com.ua> <1333982475.1082.61.camel@revolution.hippie.lan> <20120409145913.GR2358@deviant.kiev.zoral.com.ua> <1333986138.1082.71.camel@revolution.hippie.lan> To: Ian Lepore X-Mailer: Apple Mail (2.1084) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (harmony.bsdimp.com [10.0.0.6]); Mon, 09 Apr 2012 09:58:22 -0600 (MDT) Cc: current@FreeBSD.org, drivers@FreeBSD.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 16:04:29 -0000 On Apr 9, 2012, at 9:42 AM, Ian Lepore wrote: > On Mon, 2012-04-09 at 17:59 +0300, Konstantin Belousov wrote: >> On Mon, Apr 09, 2012 at 08:41:15AM -0600, Ian Lepore wrote: >>> On Sun, 2012-04-08 at 06:58 +0300, Konstantin Belousov wrote: >>>> On Sat, Apr 07, 2012 at 09:10:55PM -0600, Warner Losh wrote: >>>>>=20 >>>>> On Apr 7, 2012, at 8:57 AM, Konstantin Belousov wrote: >>>>>=20 >>>>>> On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote: >>>>>>> On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: >>>>>>>> Hello, >>>>>>>> there seems to be a problem with device attach sequence offered = by newbus. >>>>>>>> Basically, when device attach method is executing, device is = not fully >>>>>>>> initialized yet. Also the device state in the newbus part of = the world >>>>>>>> is DS_ALIVE. There is definitely no shattering news in the = statements, >>>>>>>> but drivers that e.g. create devfs node to communicate with = consumers >>>>>>>> are prone to a race. >>>>>>>>=20 >>>>>>>> If /dev node is created inside device attach method, then = usermode >>>>>>>> can start calling cdevsw methods before device fully = initialized itself. >>>>>>>> Even more, if device tries to use newbus helpers in cdevsw = methods, >>>>>>>> like device_busy(9), then panic occurs "called for unatteched = device". >>>>>>>> I get reports from users about this issues, to it is not = something >>>>>>>> that only could happen. >>>>>>>>=20 >>>>>>>> I propose to add DEVICE_AFTER_ATTACH() driver method, to be = called >>>>>>>> from newbus right after device attach finished and newbus = considers >>>>>>>> the device fully initialized. Driver then could create devfs = node >>>>>>>> in the after_attach method instead of attach. Please see the = patch below. >>>>>>>>=20 >>>>>>>> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m >>>>>>>> index eb720eb..9db74e2 100644 >>>>>>>> --- a/sys/kern/device_if.m >>>>>>>> +++ b/sys/kern/device_if.m >>>>>>>> @@ -43,6 +43,10 @@ INTERFACE device; >>>>>>>> # Default implementations of some methods. >>>>>>>> # >>>>>>>> CODE { >>>>>>>> + static void null_after_attach(device_t dev) >>>>>>>> + { >>>>>>>> + } >>>>>>>> + >>>>>>>> static int null_shutdown(device_t dev) >>>>>>>> { >>>>>>>> return 0; >>>>>>>> @@ -199,6 +203,21 @@ METHOD int attach { >>>>>>>> }; >>>>>>>>=20 >>>>>>>> /** >>>>>>>> + * @brief Notify the driver that device is in attached state >>>>>>>> + * >>>>>>>> + * Called after driver is successfully attached to the device = and >>>>>>>> + * corresponding device_t is fully operational. Driver now may = expose >>>>>>>> + * the device to the consumers, e.g. create devfs nodes. >>>>>>>> + * >>>>>>>> + * @param dev the device to probe >>>>>>>> + * >>>>>>>> + * @see DEVICE_ATTACH() >>>>>>>> + */ >>>>>>>> +METHOD void after_attach { >>>>>>>> + device_t dev; >>>>>>>> +} DEFAULT null_after_attach; >>>>>>>> + >>>>>>>> +/** >>>>>>>> * @brief Detach a driver from a device. >>>>>>>> * >>>>>>>> * This can be called if the user is replacing the >>>>>>>> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c >>>>>>>> index d485b9f..6d849cb 100644 >>>>>>>> --- a/sys/kern/subr_bus.c >>>>>>>> +++ b/sys/kern/subr_bus.c >>>>>>>> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) >>>>>>>> dev->state =3D DS_ATTACHED; >>>>>>>> dev->flags &=3D ~DF_DONENOMATCH; >>>>>>>> devadded(dev); >>>>>>>> + DEVICE_AFTER_ATTACH(dev); >>>>>>>> return (0); >>>>>>>> } >>>>>>>>=20 >>>>>>>=20 >>>>>>> Does device_get_softc() work before attach is completed? (I = don't have >>>>>>> time to go look in the code right now). If so, then a mutex = initialized >>>>>>> and acquired early in the driver's attach routine, and also = acquired in >>>>>>> the driver's cdev implementation routines before using any = newbus >>>>>>> functions other than device_get_softc(), would solve the problem = without >>>>>>> a driver api change that would make it harder to backport/MFC = driver >>>>>>> changes. >>>>>> No, 'a mutex' does not solve anything. It only adds enourmous = burden >>>>>> on the driver developers, because you cannot sleep under mutex. = Changing >>>>>> the mutex to the sleepable lock also does not byy you much, since >>>>>> you need to somehow solve the issues with some cdevsw call waking = up >>>>>> thread sleeping into another cdevsw call, just for example. >>>>>>=20 >>>>>> Singlethreading a driver due to this race is just silly. >>>>>>=20 >>>>>> And, what do you mean by 'making it harder to MFC' ? How ? >>>>>=20 >>>>> driver_attach() >>>>> { >>>>> ... >>>>> softc->flags =3D 0; // redundant, since softc is initialized to = 0. >>>>> softc->cdev =3D device_create...(); >>>>> ... >>>>> softc->flags |=3D READY; >>>>> } >>>>>=20 >>>>> driver_open(...) >>>>> { >>>>> if (!(softc->flags & READY)) >>>>> return ENXIO; >>>>> ... >>>>> } >>>>>=20 >>>>> What's the big burden here? >>>> The burden is that your proposal does not work. As I described = above, >>>> device_busy() calls from cdevsw method panic if open() is called = before >>>> DS_ATTACHED is set by newbus. And, DS_ATTACHED is only set after = device >>>> attach() method returned, so no workarounds from attach() could = solve >>>> this. >>>=20 >>> One thing that keeps floating to the front of my brain is that all = the >>> proposals so far (including my not-well-thought-out mutex = suggestion) >>> requires changing every existing driver to get the new safe = behavior. >>>=20 >>> Hmmm. Looking at the code, not very many drivers call = device_busy(). >>> Why is that? =20 >>>=20 >>> I agree that calling device_create() should be deferred until the = driver >> You mean make_dev(9), I assume. >>=20 >=20 > Yes, of course, sorry (I was looking at a call to a local convenience > routine in a private driver). >=20 >>> is ready to handle requests. That's only part of the fix if the = newbus >>> support routines are still going to have a window where they can = panic >>> because the internal state variables haven't yet transitioned to the >>> correct state. =20 >>>=20 >>> Also, the implementation of device_busy() looks to be unsafe unless = it's >>> being implicitly protected by some locking in a call chain that = isn't >>> jumping out at me with simple grepping of the code. For example, >>> concurrent callers in a device's open() and close() methods for a = driver >>> that calls busy/unbusy from cdev open/close could leave the parent >>> device's busy count in an indeterminate state. Could fixing this by >>> enforcing single threading through busy/unbusy provide an = opportunity to >>> fix the original problem by having the attach code acquire the same >>> lock, so that an early call to a cdev method that invokes = device_busy() >>> ends up sleeping until the attach routine returns? That way you = don't >>> single-thread the whole cdev open/close handling, just the part = that's >>> currently causing a problem. >> I think device_busy() calls require Giant, the same as the whole = newbus. >> The absence of GIANT_REQUIRED assertions in device_busy() and = device_unbusy() >> looks like overlook. >=20 > Hmmm. Shouldn't device_attach() then require Giant as well? I see = that > device_detach() and device_probe_and_attach() contains GIANT_REQUIRED > but device_attach() does not. =20 They both require Giant because the device tree is only locked via = Giant. Pretty much all newbus functions require Giant to be held when = you call them. This is poorly enforced in the code, however. And also = poorly documented. The reasons this wasn't well documented was the = promise of newbus locking that never made it into the tree, but also = blocked efforts to improve the documentation here, as well as adding = additional needs giant stuff. > Of the few devices calling device_busy(), some use D_NEEDGIANT and = some > do not (the do-nots include drm, fdc, rp -- I stopped looking at this > point). Only the fdc code specifically calls mtx_lock(&Giant), but it > appears that it does not do so around the device_busy/unbusy() calls. >=20 > If you're supposed to hold Giant during calls to device_busy/unbusy, = and > assuming it should be held by whomever calls device_attach(), wouldn't > that fix the panic? I don't know if that's a good assumption, because > I'm not seeing anything in the manpages or architecture handbook about > holding Giant during newbus calls Since we're recursively traveling up the tree, we need some topology = lock in order to ensure tree consistency. Giant is that lock, for = better or worse. At one point, and I'm not sure if this is still true, = the rule was either Giant being held, or interrupts haven't been enabled = yet. Now that we enable interrupts earlier, I think this has become = 'Giant has to be held'. That's one reason I rarely use device_busy and prefer to use other = methods of blocking unload and early entrance to cdevsw in drivers I've = written that I care about these cases. Warner= From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 19:10:14 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CEE6E106564A; Mon, 9 Apr 2012 19:10:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 0EC598FC08; Mon, 9 Apr 2012 19:10:13 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q39JA2Zj015198; Mon, 9 Apr 2012 22:10:02 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q39JA1SR051583; Mon, 9 Apr 2012 22:10:01 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q39JA0mH051582; Mon, 9 Apr 2012 22:10:01 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 9 Apr 2012 22:10:00 +0300 From: Konstantin Belousov To: John Baldwin Message-ID: <20120409191000.GS2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> <201204091101.03956.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="pC/tHwOGf8AW2ISZ" Content-Disposition: inline In-Reply-To: <201204091101.03956.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Ian Lepore , drivers@freebsd.org, freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 19:10:14 -0000 --pC/tHwOGf8AW2ISZ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > >=20 > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > >=20 > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > >> Hello, > > >> there seems to be a problem with device attach sequence offered by= =20 > newbus. > > >> Basically, when device attach method is executing, device is not ful= ly > > >> initialized yet. Also the device state in the newbus part of the wor= ld > > >> is DS_ALIVE. There is definitely no shattering news in the statement= s, > > >> but drivers that e.g. create devfs node to communicate with consumers > > >> are prone to a race. > > >>=20 > > >> If /dev node is created inside device attach method, then usermode > > >> can start calling cdevsw methods before device fully initialized its= elf. > > >> Even more, if device tries to use newbus helpers in cdevsw methods, > > >> like device_busy(9), then panic occurs "called for unatteched device= ". > > >> I get reports from users about this issues, to it is not something > > >> that only could happen. > > >>=20 > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > >> from newbus right after device attach finished and newbus considers > > >> the device fully initialized. Driver then could create devfs node > > >> in the after_attach method instead of attach. Please see the patch b= elow. > > >>=20 > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > >> index eb720eb..9db74e2 100644 > > >> --- a/sys/kern/device_if.m > > >> +++ b/sys/kern/device_if.m > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > >> # Default implementations of some methods. > > >> # > > >> CODE { > > >> + static void null_after_attach(device_t dev) > > >> + { > > >> + } > > >> + > > >> static int null_shutdown(device_t dev) > > >> { > > >> return 0; > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > >> }; > > >>=20 > > >> /** > > >> + * @brief Notify the driver that device is in attached state > > >> + * > > >> + * Called after driver is successfully attached to the device and > > >> + * corresponding device_t is fully operational. Driver now may expo= se > > >> + * the device to the consumers, e.g. create devfs nodes. > > >> + * > > >> + * @param dev the device to probe > > >> + * > > >> + * @see DEVICE_ATTACH() > > >> + */ > > >> +METHOD void after_attach { > > >> + device_t dev; > > >> +} DEFAULT null_after_attach; > > >> + > > >> +/** > > >> * @brief Detach a driver from a device. > > >> * > > >> * This can be called if the user is replacing the > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > >> index d485b9f..6d849cb 100644 > > >> --- a/sys/kern/subr_bus.c > > >> +++ b/sys/kern/subr_bus.c > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > >> dev->state =3D DS_ATTACHED; > > >> dev->flags &=3D ~DF_DONENOMATCH; > > >> devadded(dev); > > >> + DEVICE_AFTER_ATTACH(dev); > > >> return (0); > > >> } > > >>=20 > > >=20 > > > Does device_get_softc() work before attach is completed? (I don't ha= ve > > > time to go look in the code right now). If so, then a mutex initiali= zed > > > and acquired early in the driver's attach routine, and also acquired = in > > > the driver's cdev implementation routines before using any newbus > > > functions other than device_get_softc(), would solve the problem with= out > > > a driver api change that would make it harder to backport/MFC driver > > > changes. > >=20 > > Also, more generally, don't create the dev nodes before you are ready t= o=20 > deal with requests. Why do we need to uglify everything here? If you ca= n't=20 > do that, you can check a bit in the softc and return EBUSY or ENXIO on op= en if=20 > that bit says that your driver isn't ready to accept requests. >=20 > I agree, this dosen't actually fix anything as the decision for what to p= ut > in your foo_attach() method rather than foo_after_attach() is non-obvious= and=20 > very arbitrary. >=20 > The actual bug appears to only be with using 'device_busy()'. I think > this should be fixed by making device_busy() better, not by adding > this type of obfuscation to attach. It should be trivial to make > device_busy() safe to use on a device that is currently being attached > which will not require any changes to drivers. Could you, please, elaborate your proposal ? How do you think device_busy() can be enchanced ? Obvious idea to sleep inside device_busy() until dev->state becomes !=3D DS_ATTACHED is no go, IMO. The issue is that this causes immediate deadlocks if device_attach() method needs to call destroy_dev() to rollback. Pointing driver authors to destroy_dev_sched() for this purpose is overkill, since average driver author, me included, could not use destroy_dev_sched() properly, at least without lot of works. --pC/tHwOGf8AW2ISZ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+DNAgACgkQC3+MBN1Mb4iFywCg7rSjfbyQaBeVRxgyn5c23OJf tSQAn1RNLlJnVXxklIrOxfJrXFaiSyk7 =I1pD -----END PGP SIGNATURE----- --pC/tHwOGf8AW2ISZ-- From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 19:10:14 2012 Return-Path: Delivered-To: drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CEE6E106564A; Mon, 9 Apr 2012 19:10:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 0EC598FC08; Mon, 9 Apr 2012 19:10:13 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q39JA2Zj015198; Mon, 9 Apr 2012 22:10:02 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q39JA1SR051583; Mon, 9 Apr 2012 22:10:01 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q39JA0mH051582; Mon, 9 Apr 2012 22:10:01 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 9 Apr 2012 22:10:00 +0300 From: Konstantin Belousov To: John Baldwin Message-ID: <20120409191000.GS2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <1333810001.1082.36.camel@revolution.hippie.lan> <7C5089DC-AB9B-458F-A606-B432505BD961@bsdimp.com> <201204091101.03956.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="pC/tHwOGf8AW2ISZ" Content-Disposition: inline In-Reply-To: <201204091101.03956.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Ian Lepore , drivers@freebsd.org, freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 19:10:14 -0000 --pC/tHwOGf8AW2ISZ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > >=20 > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > >=20 > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > >> Hello, > > >> there seems to be a problem with device attach sequence offered by= =20 > newbus. > > >> Basically, when device attach method is executing, device is not ful= ly > > >> initialized yet. Also the device state in the newbus part of the wor= ld > > >> is DS_ALIVE. There is definitely no shattering news in the statement= s, > > >> but drivers that e.g. create devfs node to communicate with consumers > > >> are prone to a race. > > >>=20 > > >> If /dev node is created inside device attach method, then usermode > > >> can start calling cdevsw methods before device fully initialized its= elf. > > >> Even more, if device tries to use newbus helpers in cdevsw methods, > > >> like device_busy(9), then panic occurs "called for unatteched device= ". > > >> I get reports from users about this issues, to it is not something > > >> that only could happen. > > >>=20 > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > >> from newbus right after device attach finished and newbus considers > > >> the device fully initialized. Driver then could create devfs node > > >> in the after_attach method instead of attach. Please see the patch b= elow. > > >>=20 > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > >> index eb720eb..9db74e2 100644 > > >> --- a/sys/kern/device_if.m > > >> +++ b/sys/kern/device_if.m > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > >> # Default implementations of some methods. > > >> # > > >> CODE { > > >> + static void null_after_attach(device_t dev) > > >> + { > > >> + } > > >> + > > >> static int null_shutdown(device_t dev) > > >> { > > >> return 0; > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > >> }; > > >>=20 > > >> /** > > >> + * @brief Notify the driver that device is in attached state > > >> + * > > >> + * Called after driver is successfully attached to the device and > > >> + * corresponding device_t is fully operational. Driver now may expo= se > > >> + * the device to the consumers, e.g. create devfs nodes. > > >> + * > > >> + * @param dev the device to probe > > >> + * > > >> + * @see DEVICE_ATTACH() > > >> + */ > > >> +METHOD void after_attach { > > >> + device_t dev; > > >> +} DEFAULT null_after_attach; > > >> + > > >> +/** > > >> * @brief Detach a driver from a device. > > >> * > > >> * This can be called if the user is replacing the > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > >> index d485b9f..6d849cb 100644 > > >> --- a/sys/kern/subr_bus.c > > >> +++ b/sys/kern/subr_bus.c > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > >> dev->state =3D DS_ATTACHED; > > >> dev->flags &=3D ~DF_DONENOMATCH; > > >> devadded(dev); > > >> + DEVICE_AFTER_ATTACH(dev); > > >> return (0); > > >> } > > >>=20 > > >=20 > > > Does device_get_softc() work before attach is completed? (I don't ha= ve > > > time to go look in the code right now). If so, then a mutex initiali= zed > > > and acquired early in the driver's attach routine, and also acquired = in > > > the driver's cdev implementation routines before using any newbus > > > functions other than device_get_softc(), would solve the problem with= out > > > a driver api change that would make it harder to backport/MFC driver > > > changes. > >=20 > > Also, more generally, don't create the dev nodes before you are ready t= o=20 > deal with requests. Why do we need to uglify everything here? If you ca= n't=20 > do that, you can check a bit in the softc and return EBUSY or ENXIO on op= en if=20 > that bit says that your driver isn't ready to accept requests. >=20 > I agree, this dosen't actually fix anything as the decision for what to p= ut > in your foo_attach() method rather than foo_after_attach() is non-obvious= and=20 > very arbitrary. >=20 > The actual bug appears to only be with using 'device_busy()'. I think > this should be fixed by making device_busy() better, not by adding > this type of obfuscation to attach. It should be trivial to make > device_busy() safe to use on a device that is currently being attached > which will not require any changes to drivers. Could you, please, elaborate your proposal ? How do you think device_busy() can be enchanced ? Obvious idea to sleep inside device_busy() until dev->state becomes !=3D DS_ATTACHED is no go, IMO. The issue is that this causes immediate deadlocks if device_attach() method needs to call destroy_dev() to rollback. Pointing driver authors to destroy_dev_sched() for this purpose is overkill, since average driver author, me included, could not use destroy_dev_sched() properly, at least without lot of works. --pC/tHwOGf8AW2ISZ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+DNAgACgkQC3+MBN1Mb4iFywCg7rSjfbyQaBeVRxgyn5c23OJf tSQAn1RNLlJnVXxklIrOxfJrXFaiSyk7 =I1pD -----END PGP SIGNATURE----- --pC/tHwOGf8AW2ISZ-- From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 19:36:09 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B37C8106566C; Mon, 9 Apr 2012 19:36:09 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 75A368FC0A; Mon, 9 Apr 2012 19:36:09 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id E25FDB95F; Mon, 9 Apr 2012 15:36:08 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Date: Mon, 9 Apr 2012 15:36:08 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <201204091101.03956.jhb@freebsd.org> <20120409191000.GS2358@deviant.kiev.zoral.com.ua> In-Reply-To: <20120409191000.GS2358@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201204091536.08399.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 09 Apr 2012 15:36:09 -0400 (EDT) Cc: Ian Lepore , freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 19:36:09 -0000 On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > >> Hello, > > > >> there seems to be a problem with device attach sequence offered by > > newbus. > > > >> Basically, when device attach method is executing, device is not fully > > > >> initialized yet. Also the device state in the newbus part of the world > > > >> is DS_ALIVE. There is definitely no shattering news in the statements, > > > >> but drivers that e.g. create devfs node to communicate with consumers > > > >> are prone to a race. > > > >> > > > >> If /dev node is created inside device attach method, then usermode > > > >> can start calling cdevsw methods before device fully initialized itself. > > > >> Even more, if device tries to use newbus helpers in cdevsw methods, > > > >> like device_busy(9), then panic occurs "called for unatteched device". > > > >> I get reports from users about this issues, to it is not something > > > >> that only could happen. > > > >> > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > >> from newbus right after device attach finished and newbus considers > > > >> the device fully initialized. Driver then could create devfs node > > > >> in the after_attach method instead of attach. Please see the patch below. > > > >> > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > >> index eb720eb..9db74e2 100644 > > > >> --- a/sys/kern/device_if.m > > > >> +++ b/sys/kern/device_if.m > > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > > >> # Default implementations of some methods. > > > >> # > > > >> CODE { > > > >> + static void null_after_attach(device_t dev) > > > >> + { > > > >> + } > > > >> + > > > >> static int null_shutdown(device_t dev) > > > >> { > > > >> return 0; > > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > > >> }; > > > >> > > > >> /** > > > >> + * @brief Notify the driver that device is in attached state > > > >> + * > > > >> + * Called after driver is successfully attached to the device and > > > >> + * corresponding device_t is fully operational. Driver now may expose > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > >> + * > > > >> + * @param dev the device to probe > > > >> + * > > > >> + * @see DEVICE_ATTACH() > > > >> + */ > > > >> +METHOD void after_attach { > > > >> + device_t dev; > > > >> +} DEFAULT null_after_attach; > > > >> + > > > >> +/** > > > >> * @brief Detach a driver from a device. > > > >> * > > > >> * This can be called if the user is replacing the > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > >> index d485b9f..6d849cb 100644 > > > >> --- a/sys/kern/subr_bus.c > > > >> +++ b/sys/kern/subr_bus.c > > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > >> dev->state = DS_ATTACHED; > > > >> dev->flags &= ~DF_DONENOMATCH; > > > >> devadded(dev); > > > >> + DEVICE_AFTER_ATTACH(dev); > > > >> return (0); > > > >> } > > > >> > > > > > > > > Does device_get_softc() work before attach is completed? (I don't have > > > > time to go look in the code right now). If so, then a mutex initialized > > > > and acquired early in the driver's attach routine, and also acquired in > > > > the driver's cdev implementation routines before using any newbus > > > > functions other than device_get_softc(), would solve the problem without > > > > a driver api change that would make it harder to backport/MFC driver > > > > changes. > > > > > > Also, more generally, don't create the dev nodes before you are ready to > > deal with requests. Why do we need to uglify everything here? If you can't > > do that, you can check a bit in the softc and return EBUSY or ENXIO on open if > > that bit says that your driver isn't ready to accept requests. > > > > I agree, this dosen't actually fix anything as the decision for what to put > > in your foo_attach() method rather than foo_after_attach() is non-obvious and > > very arbitrary. > > > > The actual bug appears to only be with using 'device_busy()'. I think > > this should be fixed by making device_busy() better, not by adding > > this type of obfuscation to attach. It should be trivial to make > > device_busy() safe to use on a device that is currently being attached > > which will not require any changes to drivers. > > Could you, please, elaborate your proposal ? How do you think device_busy() > can be enchanced ? > > Obvious idea to sleep inside device_busy() until dev->state becomes != > DS_ATTACHED is no go, IMO. The issue is that this causes immediate deadlocks > if device_attach() method needs to call destroy_dev() to rollback. I think you could have a DS_ATTACHING state and allow device_busy() to work for DS_ATTACHING. The idea being that it is a bug for a driver to invoke device_busy() if it is going to fail attach. You may then need to do a fixup in device_attach() to promote the state from DS_ATTACHED to DS_BUSY when it returns if there is a non-zero busy count. Something like this: Index: kern/subr_bus.c =================================================================== --- kern/subr_bus.c (revision 234057) +++ kern/subr_bus.c (working copy) @@ -2472,12 +2472,13 @@ void device_busy(device_t dev) { - if (dev->state < DS_ATTACHED) + if (dev->state < DS_ATTACHING) panic("device_busy: called for unattached device"); if (dev->busy == 0 && dev->parent) device_busy(dev->parent); dev->busy++; - dev->state = DS_BUSY; + if (dev->state == DS_ATTACHED) + dev->state = DS_BUSY; } /** @@ -2486,14 +2487,16 @@ void device_unbusy(device_t dev) { - if (dev->state != DS_BUSY) + if (dev->busy != 0 && dev->state != DS_BUSY && + dev->state != DS_ATTACHING) panic("device_unbusy: called for non-busy device %s", device_get_nameunit(dev)); dev->busy--; if (dev->busy == 0) { if (dev->parent) device_unbusy(dev->parent); - dev->state = DS_ATTACHED; + if (dev->state == DS_BUSY) + dev->state = DS_ATTACHED; } } @@ -2729,6 +2732,7 @@ device_sysctl_init(dev); if (!device_is_quiet(dev)) device_print_child(dev->parent, dev); + dev->state = DS_ATTACHING; if ((error = DEVICE_ATTACH(dev)) != 0) { printf("device_attach: %s%d attach returned %d\n", dev->driver->name, dev->unit, error); @@ -2736,11 +2740,15 @@ devclass_delete_device(dev->devclass, dev); (void)device_set_driver(dev, NULL); device_sysctl_fini(dev); + KASSERT(dev->busy == 0, ("attach failed but busy")); dev->state = DS_NOTPRESENT; return (error); } device_sysctl_update(dev); - dev->state = DS_ATTACHED; + if (dev->busy) + dev->state = DS_BUSY; + else + dev->state = DS_ATTACHED; dev->flags &= ~DF_DONENOMATCH; devadded(dev); return (0); Index: sys/bus.h =================================================================== --- sys/bus.h (revision 234057) +++ sys/bus.h (working copy) @@ -52,6 +52,7 @@ typedef enum device_state { DS_NOTPRESENT = 10, /**< @brief not probed or probe failed */ DS_ALIVE = 20, /**< @brief probe succeeded */ + DS_ATTACHING = 25, /**< @brief currently attaching */ DS_ATTACHED = 30, /**< @brief attach method called */ DS_BUSY = 40 /**< @brief device is open */ } device_state_t; -- John Baldwin From owner-freebsd-drivers@FreeBSD.ORG Mon Apr 9 20:05:40 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 35A66106564A; Mon, 9 Apr 2012 20:05:40 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 7C8F28FC15; Mon, 9 Apr 2012 20:05:39 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q39K5Uw3026896; Mon, 9 Apr 2012 23:05:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q39K5Tpn051849; Mon, 9 Apr 2012 23:05:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q39K5Tc4051848; Mon, 9 Apr 2012 23:05:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 9 Apr 2012 23:05:29 +0300 From: Konstantin Belousov To: John Baldwin Message-ID: <20120409200529.GT2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <201204091101.03956.jhb@freebsd.org> <20120409191000.GS2358@deviant.kiev.zoral.com.ua> <201204091536.08399.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="3e+tHtSlmys++yUp" Content-Disposition: inline In-Reply-To: <201204091536.08399.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Ian Lepore , freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2012 20:05:40 -0000 --3e+tHtSlmys++yUp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 09, 2012 at 03:36:08PM -0400, John Baldwin wrote: > On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > >=20 > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > >=20 > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > > >> Hello, > > > > >> there seems to be a problem with device attach sequence offered = by=20 > > > newbus. > > > > >> Basically, when device attach method is executing, device is not= fully > > > > >> initialized yet. Also the device state in the newbus part of the= world > > > > >> is DS_ALIVE. There is definitely no shattering news in the state= ments, > > > > >> but drivers that e.g. create devfs node to communicate with cons= umers > > > > >> are prone to a race. > > > > >>=20 > > > > >> If /dev node is created inside device attach method, then usermo= de > > > > >> can start calling cdevsw methods before device fully initialized= itself. > > > > >> Even more, if device tries to use newbus helpers in cdevsw metho= ds, > > > > >> like device_busy(9), then panic occurs "called for unatteched de= vice". > > > > >> I get reports from users about this issues, to it is not somethi= ng > > > > >> that only could happen. > > > > >>=20 > > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be call= ed > > > > >> from newbus right after device attach finished and newbus consid= ers > > > > >> the device fully initialized. Driver then could create devfs node > > > > >> in the after_attach method instead of attach. Please see the pat= ch below. > > > > >>=20 > > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > >> index eb720eb..9db74e2 100644 > > > > >> --- a/sys/kern/device_if.m > > > > >> +++ b/sys/kern/device_if.m > > > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > > > >> # Default implementations of some methods. > > > > >> # > > > > >> CODE { > > > > >> + static void null_after_attach(device_t dev) > > > > >> + { > > > > >> + } > > > > >> + > > > > >> static int null_shutdown(device_t dev) > > > > >> { > > > > >> return 0; > > > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > > > >> }; > > > > >>=20 > > > > >> /** > > > > >> + * @brief Notify the driver that device is in attached state > > > > >> + * > > > > >> + * Called after driver is successfully attached to the device a= nd > > > > >> + * corresponding device_t is fully operational. Driver now may = expose > > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > > >> + * > > > > >> + * @param dev the device to probe > > > > >> + * > > > > >> + * @see DEVICE_ATTACH() > > > > >> + */ > > > > >> +METHOD void after_attach { > > > > >> + device_t dev; > > > > >> +} DEFAULT null_after_attach; > > > > >> + > > > > >> +/** > > > > >> * @brief Detach a driver from a device. > > > > >> * > > > > >> * This can be called if the user is replacing the > > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > >> index d485b9f..6d849cb 100644 > > > > >> --- a/sys/kern/subr_bus.c > > > > >> +++ b/sys/kern/subr_bus.c > > > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > > >> dev->state =3D DS_ATTACHED; > > > > >> dev->flags &=3D ~DF_DONENOMATCH; > > > > >> devadded(dev); > > > > >> + DEVICE_AFTER_ATTACH(dev); > > > > >> return (0); > > > > >> } > > > > >>=20 > > > > >=20 > > > > > Does device_get_softc() work before attach is completed? (I don'= t have > > > > > time to go look in the code right now). If so, then a mutex init= ialized > > > > > and acquired early in the driver's attach routine, and also acqui= red in > > > > > the driver's cdev implementation routines before using any newbus > > > > > functions other than device_get_softc(), would solve the problem = without > > > > > a driver api change that would make it harder to backport/MFC dri= ver > > > > > changes. > > > >=20 > > > > Also, more generally, don't create the dev nodes before you are rea= dy to=20 > > > deal with requests. Why do we need to uglify everything here? If yo= u can't=20 > > > do that, you can check a bit in the softc and return EBUSY or ENXIO o= n open if=20 > > > that bit says that your driver isn't ready to accept requests. > > >=20 > > > I agree, this dosen't actually fix anything as the decision for what = to put > > > in your foo_attach() method rather than foo_after_attach() is non-obv= ious and=20 > > > very arbitrary. > > >=20 > > > The actual bug appears to only be with using 'device_busy()'. I think > > > this should be fixed by making device_busy() better, not by adding > > > this type of obfuscation to attach. It should be trivial to make > > > device_busy() safe to use on a device that is currently being attached > > > which will not require any changes to drivers. > >=20 > > Could you, please, elaborate your proposal ? How do you think device_bu= sy() > > can be enchanced ? > >=20 > > Obvious idea to sleep inside device_busy() until dev->state becomes !=3D > > DS_ATTACHED is no go, IMO. The issue is that this causes immediate dead= locks > > if device_attach() method needs to call destroy_dev() to rollback. >=20 > I think you could have a DS_ATTACHING state and allow device_busy() to wo= rk > for DS_ATTACHING. The idea being that it is a bug for a driver to invoke > device_busy() if it is going to fail attach. You may then need to do a f= ixup > in device_attach() to promote the state from DS_ATTACHED to DS_BUSY when = it > returns if there is a non-zero busy count. This is quite good idea, but it still adds burden to device author, although I agree that this is manageable. A scenario I have in mind now is the following: assume that driver needs to create two devfs nodes, lets name them dri/card0 and dri/forcewake0. Driver would perform two make_dev_p(9) calls, and while creation of dri/card0 succeed, consequent creation of dri/forcewake0 could fail for numerous reasons. Now, the driver needs to ensure that cdesvw->d_open() on dri/card0 would return ENXIO until dri/forcewake0 is created. This can be implemented with flag, indeed. But still somewhat muddy, and probably leads to user-visible errors (I mostly worry about graphical login managers). But for single-node drivers it is indeed a nice solution. >=20 > Something like this: >=20 > Index: kern/subr_bus.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- kern/subr_bus.c (revision 234057) > +++ kern/subr_bus.c (working copy) > @@ -2472,12 +2472,13 @@ > void > device_busy(device_t dev) > { > - if (dev->state < DS_ATTACHED) > + if (dev->state < DS_ATTACHING) > panic("device_busy: called for unattached device"); > if (dev->busy =3D=3D 0 && dev->parent) > device_busy(dev->parent); > dev->busy++; > - dev->state =3D DS_BUSY; > + if (dev->state =3D=3D DS_ATTACHED) > + dev->state =3D DS_BUSY; > } > =20 > /** > @@ -2486,14 +2487,16 @@ > void > device_unbusy(device_t dev) > { > - if (dev->state !=3D DS_BUSY) > + if (dev->busy !=3D 0 && dev->state !=3D DS_BUSY && > + dev->state !=3D DS_ATTACHING) > panic("device_unbusy: called for non-busy device %s", > device_get_nameunit(dev)); > dev->busy--; > if (dev->busy =3D=3D 0) { > if (dev->parent) > device_unbusy(dev->parent); > - dev->state =3D DS_ATTACHED; > + if (dev->state =3D=3D DS_BUSY) > + dev->state =3D DS_ATTACHED; > } > } > =20 > @@ -2729,6 +2732,7 @@ > device_sysctl_init(dev); > if (!device_is_quiet(dev)) > device_print_child(dev->parent, dev); > + dev->state =3D DS_ATTACHING; > if ((error =3D DEVICE_ATTACH(dev)) !=3D 0) { > printf("device_attach: %s%d attach returned %d\n", > dev->driver->name, dev->unit, error); > @@ -2736,11 +2740,15 @@ > devclass_delete_device(dev->devclass, dev); > (void)device_set_driver(dev, NULL); > device_sysctl_fini(dev); > + KASSERT(dev->busy =3D=3D 0, ("attach failed but busy")); > dev->state =3D DS_NOTPRESENT; > return (error); > } > device_sysctl_update(dev); > - dev->state =3D DS_ATTACHED; > + if (dev->busy) > + dev->state =3D DS_BUSY; > + else > + dev->state =3D DS_ATTACHED; > dev->flags &=3D ~DF_DONENOMATCH; > devadded(dev); > return (0); > Index: sys/bus.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/bus.h (revision 234057) > +++ sys/bus.h (working copy) > @@ -52,6 +52,7 @@ > typedef enum device_state { > DS_NOTPRESENT =3D 10, /**< @brief not probed or probe failed */ > DS_ALIVE =3D 20, /**< @brief probe succeeded */ > + DS_ATTACHING =3D 25, /**< @brief currently attaching */ > DS_ATTACHED =3D 30, /**< @brief attach method called */ > DS_BUSY =3D 40 /**< @brief device is open */ > } device_state_t; >=20 > --=20 > John Baldwin --3e+tHtSlmys++yUp Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+DQQkACgkQC3+MBN1Mb4gq+wCgmVOauphI9rfoDe0vN+/yPiK1 XH4An2S95rMdXjh59SBwTNiu64r1fvQO =Sjwn -----END PGP SIGNATURE----- --3e+tHtSlmys++yUp-- From owner-freebsd-drivers@FreeBSD.ORG Tue Apr 10 13:58:29 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F0CB5106567B; Tue, 10 Apr 2012 13:58:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id B2D348FC16; Tue, 10 Apr 2012 13:58:29 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id F1125B9BB; Tue, 10 Apr 2012 09:58:28 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Date: Tue, 10 Apr 2012 09:56:06 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <201204091536.08399.jhb@freebsd.org> <20120409200529.GT2358@deviant.kiev.zoral.com.ua> In-Reply-To: <20120409200529.GT2358@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201204100956.06750.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 10 Apr 2012 09:58:29 -0400 (EDT) Cc: Ian Lepore , freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Apr 2012 13:58:30 -0000 On Monday, April 09, 2012 4:05:29 pm Konstantin Belousov wrote: > On Mon, Apr 09, 2012 at 03:36:08PM -0400, John Baldwin wrote: > > On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > > > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > > > > > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > > > > > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > > > >> Hello, > > > > > >> there seems to be a problem with device attach sequence offered by > > > > newbus. > > > > > >> Basically, when device attach method is executing, device is not fully > > > > > >> initialized yet. Also the device state in the newbus part of the world > > > > > >> is DS_ALIVE. There is definitely no shattering news in the statements, > > > > > >> but drivers that e.g. create devfs node to communicate with consumers > > > > > >> are prone to a race. > > > > > >> > > > > > >> If /dev node is created inside device attach method, then usermode > > > > > >> can start calling cdevsw methods before device fully initialized itself. > > > > > >> Even more, if device tries to use newbus helpers in cdevsw methods, > > > > > >> like device_busy(9), then panic occurs "called for unatteched device". > > > > > >> I get reports from users about this issues, to it is not something > > > > > >> that only could happen. > > > > > >> > > > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > > > >> from newbus right after device attach finished and newbus considers > > > > > >> the device fully initialized. Driver then could create devfs node > > > > > >> in the after_attach method instead of attach. Please see the patch below. > > > > > >> > > > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > > >> index eb720eb..9db74e2 100644 > > > > > >> --- a/sys/kern/device_if.m > > > > > >> +++ b/sys/kern/device_if.m > > > > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > > > > >> # Default implementations of some methods. > > > > > >> # > > > > > >> CODE { > > > > > >> + static void null_after_attach(device_t dev) > > > > > >> + { > > > > > >> + } > > > > > >> + > > > > > >> static int null_shutdown(device_t dev) > > > > > >> { > > > > > >> return 0; > > > > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > > > > >> }; > > > > > >> > > > > > >> /** > > > > > >> + * @brief Notify the driver that device is in attached state > > > > > >> + * > > > > > >> + * Called after driver is successfully attached to the device and > > > > > >> + * corresponding device_t is fully operational. Driver now may expose > > > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > > > >> + * > > > > > >> + * @param dev the device to probe > > > > > >> + * > > > > > >> + * @see DEVICE_ATTACH() > > > > > >> + */ > > > > > >> +METHOD void after_attach { > > > > > >> + device_t dev; > > > > > >> +} DEFAULT null_after_attach; > > > > > >> + > > > > > >> +/** > > > > > >> * @brief Detach a driver from a device. > > > > > >> * > > > > > >> * This can be called if the user is replacing the > > > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > > >> index d485b9f..6d849cb 100644 > > > > > >> --- a/sys/kern/subr_bus.c > > > > > >> +++ b/sys/kern/subr_bus.c > > > > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > > > >> dev->state = DS_ATTACHED; > > > > > >> dev->flags &= ~DF_DONENOMATCH; > > > > > >> devadded(dev); > > > > > >> + DEVICE_AFTER_ATTACH(dev); > > > > > >> return (0); > > > > > >> } > > > > > >> > > > > > > > > > > > > Does device_get_softc() work before attach is completed? (I don't have > > > > > > time to go look in the code right now). If so, then a mutex initialized > > > > > > and acquired early in the driver's attach routine, and also acquired in > > > > > > the driver's cdev implementation routines before using any newbus > > > > > > functions other than device_get_softc(), would solve the problem without > > > > > > a driver api change that would make it harder to backport/MFC driver > > > > > > changes. > > > > > > > > > > Also, more generally, don't create the dev nodes before you are ready to > > > > deal with requests. Why do we need to uglify everything here? If you can't > > > > do that, you can check a bit in the softc and return EBUSY or ENXIO on open if > > > > that bit says that your driver isn't ready to accept requests. > > > > > > > > I agree, this dosen't actually fix anything as the decision for what to put > > > > in your foo_attach() method rather than foo_after_attach() is non-obvious and > > > > very arbitrary. > > > > > > > > The actual bug appears to only be with using 'device_busy()'. I think > > > > this should be fixed by making device_busy() better, not by adding > > > > this type of obfuscation to attach. It should be trivial to make > > > > device_busy() safe to use on a device that is currently being attached > > > > which will not require any changes to drivers. > > > > > > Could you, please, elaborate your proposal ? How do you think device_busy() > > > can be enchanced ? > > > > > > Obvious idea to sleep inside device_busy() until dev->state becomes != > > > DS_ATTACHED is no go, IMO. The issue is that this causes immediate deadlocks > > > if device_attach() method needs to call destroy_dev() to rollback. > > > > I think you could have a DS_ATTACHING state and allow device_busy() to work > > for DS_ATTACHING. The idea being that it is a bug for a driver to invoke > > device_busy() if it is going to fail attach. You may then need to do a fixup > > in device_attach() to promote the state from DS_ATTACHED to DS_BUSY when it > > returns if there is a non-zero busy count. > This is quite good idea, but it still adds burden to device author, > although I agree that this is manageable. A scenario I have in mind now > is the following: > assume that driver needs to create two devfs nodes, lets name them > dri/card0 and dri/forcewake0. Driver would perform two make_dev_p(9) > calls, and while creation of dri/card0 succeed, consequent creation > of dri/forcewake0 could fail for numerous reasons. > > Now, the driver needs to ensure that cdesvw->d_open() on dri/card0 > would return ENXIO until dri/forcewake0 is created. This can be implemented > with flag, indeed. But still somewhat muddy, and probably leads to > user-visible errors (I mostly worry about graphical login managers). You could also sleep on the flag in d_open() (you can imagine a two-step process where you set a "adding cdev's flag", then all d_open() calls block on it). Then when finished adding cdev's, you set a flag if an error occurred and wake up all the waiters. If no error occured the waiters can have the first open() work fine. But even with other proposals you still have to deal with this problem if you want to fail out entirely if you have problems creating cdevs. > But for single-node drivers it is indeed a nice solution. I think we are somewhat stuck with this for other reasons as well. Note that device_busy() propagates up the tree to parent devices, so imagine kldloading a driver that creates a tree (e.g. a bus with a few consumers) where the leaf devices are attached by a call to bus_generic_attach() from the device_attach() method for the parent device. Even if you add a DEVICE_AFTER_ATTACH() hook, while it may allow device_busy() to be invoked on the leaf device, when it tries to propagate device_busy() up to the parent device it would still be in the middle of attach and blow up anyway. -- John Baldwin From owner-freebsd-drivers@FreeBSD.ORG Tue Apr 10 14:38:55 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 34132106564A; Tue, 10 Apr 2012 14:38:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 901BE8FC18; Tue, 10 Apr 2012 14:38:54 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q3AEcZkK056263; Tue, 10 Apr 2012 17:38:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q3AEcZ2x095469; Tue, 10 Apr 2012 17:38:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q3AEcZxs095468; Tue, 10 Apr 2012 17:38:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 10 Apr 2012 17:38:35 +0300 From: Konstantin Belousov To: John Baldwin Message-ID: <20120410143835.GZ2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <201204091536.08399.jhb@freebsd.org> <20120409200529.GT2358@deviant.kiev.zoral.com.ua> <201204100956.06750.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1KpuWNmRW3f9lK5a" Content-Disposition: inline In-Reply-To: <201204100956.06750.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Ian Lepore , freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Apr 2012 14:38:55 -0000 --1KpuWNmRW3f9lK5a Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Apr 10, 2012 at 09:56:06AM -0400, John Baldwin wrote: > On Monday, April 09, 2012 4:05:29 pm Konstantin Belousov wrote: > > On Mon, Apr 09, 2012 at 03:36:08PM -0400, John Baldwin wrote: > > > On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > > > > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > > > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > > > >=20 > > > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > > > >=20 > > > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > > > > >> Hello, > > > > > > >> there seems to be a problem with device attach sequence offe= red by=20 > > > > > newbus. > > > > > > >> Basically, when device attach method is executing, device is= not fully > > > > > > >> initialized yet. Also the device state in the newbus part of= the world > > > > > > >> is DS_ALIVE. There is definitely no shattering news in the s= tatements, > > > > > > >> but drivers that e.g. create devfs node to communicate with = consumers > > > > > > >> are prone to a race. > > > > > > >>=20 > > > > > > >> If /dev node is created inside device attach method, then us= ermode > > > > > > >> can start calling cdevsw methods before device fully initial= ized itself. > > > > > > >> Even more, if device tries to use newbus helpers in cdevsw m= ethods, > > > > > > >> like device_busy(9), then panic occurs "called for unatteche= d device". > > > > > > >> I get reports from users about this issues, to it is not som= ething > > > > > > >> that only could happen. > > > > > > >>=20 > > > > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be = called > > > > > > >> from newbus right after device attach finished and newbus co= nsiders > > > > > > >> the device fully initialized. Driver then could create devfs= node > > > > > > >> in the after_attach method instead of attach. Please see the= patch below. > > > > > > >>=20 > > > > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > > > >> index eb720eb..9db74e2 100644 > > > > > > >> --- a/sys/kern/device_if.m > > > > > > >> +++ b/sys/kern/device_if.m > > > > > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > > > > > >> # Default implementations of some methods. > > > > > > >> # > > > > > > >> CODE { > > > > > > >> + static void null_after_attach(device_t dev) > > > > > > >> + { > > > > > > >> + } > > > > > > >> + > > > > > > >> static int null_shutdown(device_t dev) > > > > > > >> { > > > > > > >> return 0; > > > > > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > > > > > >> }; > > > > > > >>=20 > > > > > > >> /** > > > > > > >> + * @brief Notify the driver that device is in attached state > > > > > > >> + * > > > > > > >> + * Called after driver is successfully attached to the devi= ce and > > > > > > >> + * corresponding device_t is fully operational. Driver now = may expose > > > > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > > > > >> + * > > > > > > >> + * @param dev the device to probe > > > > > > >> + * > > > > > > >> + * @see DEVICE_ATTACH() > > > > > > >> + */ > > > > > > >> +METHOD void after_attach { > > > > > > >> + device_t dev; > > > > > > >> +} DEFAULT null_after_attach; > > > > > > >> + > > > > > > >> +/** > > > > > > >> * @brief Detach a driver from a device. > > > > > > >> * > > > > > > >> * This can be called if the user is replacing the > > > > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > > > >> index d485b9f..6d849cb 100644 > > > > > > >> --- a/sys/kern/subr_bus.c > > > > > > >> +++ b/sys/kern/subr_bus.c > > > > > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > > > > >> dev->state =3D DS_ATTACHED; > > > > > > >> dev->flags &=3D ~DF_DONENOMATCH; > > > > > > >> devadded(dev); > > > > > > >> + DEVICE_AFTER_ATTACH(dev); > > > > > > >> return (0); > > > > > > >> } > > > > > > >>=20 > > > > > > >=20 > > > > > > > Does device_get_softc() work before attach is completed? (I = don't have > > > > > > > time to go look in the code right now). If so, then a mutex = initialized > > > > > > > and acquired early in the driver's attach routine, and also a= cquired in > > > > > > > the driver's cdev implementation routines before using any ne= wbus > > > > > > > functions other than device_get_softc(), would solve the prob= lem without > > > > > > > a driver api change that would make it harder to backport/MFC= driver > > > > > > > changes. > > > > > >=20 > > > > > > Also, more generally, don't create the dev nodes before you are= ready to=20 > > > > > deal with requests. Why do we need to uglify everything here? I= f you can't=20 > > > > > do that, you can check a bit in the softc and return EBUSY or ENX= IO on open if=20 > > > > > that bit says that your driver isn't ready to accept requests. > > > > >=20 > > > > > I agree, this dosen't actually fix anything as the decision for w= hat to put > > > > > in your foo_attach() method rather than foo_after_attach() is non= -obvious and=20 > > > > > very arbitrary. > > > > >=20 > > > > > The actual bug appears to only be with using 'device_busy()'. I t= hink > > > > > this should be fixed by making device_busy() better, not by adding > > > > > this type of obfuscation to attach. It should be trivial to make > > > > > device_busy() safe to use on a device that is currently being att= ached > > > > > which will not require any changes to drivers. > > > >=20 > > > > Could you, please, elaborate your proposal ? How do you think devic= e_busy() > > > > can be enchanced ? > > > >=20 > > > > Obvious idea to sleep inside device_busy() until dev->state becomes= !=3D > > > > DS_ATTACHED is no go, IMO. The issue is that this causes immediate = deadlocks > > > > if device_attach() method needs to call destroy_dev() to rollback. > > >=20 > > > I think you could have a DS_ATTACHING state and allow device_busy() t= o work > > > for DS_ATTACHING. The idea being that it is a bug for a driver to in= voke > > > device_busy() if it is going to fail attach. You may then need to do= a fixup > > > in device_attach() to promote the state from DS_ATTACHED to DS_BUSY w= hen it > > > returns if there is a non-zero busy count. > > This is quite good idea, but it still adds burden to device author, > > although I agree that this is manageable. A scenario I have in mind now > > is the following: > > assume that driver needs to create two devfs nodes, lets name them > > dri/card0 and dri/forcewake0. Driver would perform two make_dev_p(9) > > calls, and while creation of dri/card0 succeed, consequent creation > > of dri/forcewake0 could fail for numerous reasons. > >=20 > > Now, the driver needs to ensure that cdesvw->d_open() on dri/card0 > > would return ENXIO until dri/forcewake0 is created. This can be impleme= nted > > with flag, indeed. But still somewhat muddy, and probably leads to > > user-visible errors (I mostly worry about graphical login managers). >=20 > You could also sleep on the flag in d_open() (you can imagine a two-step > process where you set a "adding cdev's flag", then all d_open() calls blo= ck > on it). Then when finished adding cdev's, you set a flag if an error > occurred and wake up all the waiters. If no error occured the waiters can > have the first open() work fine. But even with other proposals you still > have to deal with this problem if you want to fail out entirely if you > have problems creating cdevs. >=20 > > But for single-node drivers it is indeed a nice solution. >=20 > I think we are somewhat stuck with this for other reasons as well. Note > that device_busy() propagates up the tree to parent devices, so imagine > kldloading a driver that creates a tree (e.g. a bus with a few consumers) > where the leaf devices are attached by a call to bus_generic_attach() from > the device_attach() method for the parent device. Even if you add a > DEVICE_AFTER_ATTACH() hook, while it may allow device_busy() to be invoked > on the leaf device, when it tries to propagate device_busy() up to the > parent device it would still be in the middle of attach and blow up anywa= y. This looks like a bug in my implementation of after_attach, which apparently should be called for new tree after the top level attach finished. Anyway, would you commit your change ? I definitely can work out the driver change after. But this seems to be a large amount of work for driver authors. --1KpuWNmRW3f9lK5a Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+EReoACgkQC3+MBN1Mb4g2oACg5H8O707oiuR1k1MaptmwgEJQ gM4An2WDMk7DX3F8qFaA2HzvGTPLJeXs =R09x -----END PGP SIGNATURE----- --1KpuWNmRW3f9lK5a-- From owner-freebsd-drivers@FreeBSD.ORG Tue Apr 10 18:56:30 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F7931065678; Tue, 10 Apr 2012 18:56:30 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 616088FC12; Tue, 10 Apr 2012 18:56:30 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D22D1B95A; Tue, 10 Apr 2012 14:56:29 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Date: Tue, 10 Apr 2012 14:47:37 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <201204100956.06750.jhb@freebsd.org> <20120410143835.GZ2358@deviant.kiev.zoral.com.ua> In-Reply-To: <20120410143835.GZ2358@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201204101447.37988.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 10 Apr 2012 14:56:29 -0400 (EDT) Cc: Ian Lepore , freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Apr 2012 18:56:30 -0000 On Tuesday, April 10, 2012 10:38:35 am Konstantin Belousov wrote: > On Tue, Apr 10, 2012 at 09:56:06AM -0400, John Baldwin wrote: > > On Monday, April 09, 2012 4:05:29 pm Konstantin Belousov wrote: > > > On Mon, Apr 09, 2012 at 03:36:08PM -0400, John Baldwin wrote: > > > > On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > > > > > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > > > > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > > > > > > > > > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > > > > > > > > > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote: > > > > > > > >> Hello, > > > > > > > >> there seems to be a problem with device attach sequence offered by > > > > > > newbus. > > > > > > > >> Basically, when device attach method is executing, device is not fully > > > > > > > >> initialized yet. Also the device state in the newbus part of the world > > > > > > > >> is DS_ALIVE. There is definitely no shattering news in the statements, > > > > > > > >> but drivers that e.g. create devfs node to communicate with consumers > > > > > > > >> are prone to a race. > > > > > > > >> > > > > > > > >> If /dev node is created inside device attach method, then usermode > > > > > > > >> can start calling cdevsw methods before device fully initialized itself. > > > > > > > >> Even more, if device tries to use newbus helpers in cdevsw methods, > > > > > > > >> like device_busy(9), then panic occurs "called for unatteched device". > > > > > > > >> I get reports from users about this issues, to it is not something > > > > > > > >> that only could happen. > > > > > > > >> > > > > > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to be called > > > > > > > >> from newbus right after device attach finished and newbus considers > > > > > > > >> the device fully initialized. Driver then could create devfs node > > > > > > > >> in the after_attach method instead of attach. Please see the patch below. > > > > > > > >> > > > > > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > > > > >> index eb720eb..9db74e2 100644 > > > > > > > >> --- a/sys/kern/device_if.m > > > > > > > >> +++ b/sys/kern/device_if.m > > > > > > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > > > > > > >> # Default implementations of some methods. > > > > > > > >> # > > > > > > > >> CODE { > > > > > > > >> + static void null_after_attach(device_t dev) > > > > > > > >> + { > > > > > > > >> + } > > > > > > > >> + > > > > > > > >> static int null_shutdown(device_t dev) > > > > > > > >> { > > > > > > > >> return 0; > > > > > > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > > > > > > >> }; > > > > > > > >> > > > > > > > >> /** > > > > > > > >> + * @brief Notify the driver that device is in attached state > > > > > > > >> + * > > > > > > > >> + * Called after driver is successfully attached to the device and > > > > > > > >> + * corresponding device_t is fully operational. Driver now may expose > > > > > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > > > > > >> + * > > > > > > > >> + * @param dev the device to probe > > > > > > > >> + * > > > > > > > >> + * @see DEVICE_ATTACH() > > > > > > > >> + */ > > > > > > > >> +METHOD void after_attach { > > > > > > > >> + device_t dev; > > > > > > > >> +} DEFAULT null_after_attach; > > > > > > > >> + > > > > > > > >> +/** > > > > > > > >> * @brief Detach a driver from a device. > > > > > > > >> * > > > > > > > >> * This can be called if the user is replacing the > > > > > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > > > > >> index d485b9f..6d849cb 100644 > > > > > > > >> --- a/sys/kern/subr_bus.c > > > > > > > >> +++ b/sys/kern/subr_bus.c > > > > > > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > > > > > >> dev->state = DS_ATTACHED; > > > > > > > >> dev->flags &= ~DF_DONENOMATCH; > > > > > > > >> devadded(dev); > > > > > > > >> + DEVICE_AFTER_ATTACH(dev); > > > > > > > >> return (0); > > > > > > > >> } > > > > > > > >> > > > > > > > > > > > > > > > > Does device_get_softc() work before attach is completed? (I don't have > > > > > > > > time to go look in the code right now). If so, then a mutex initialized > > > > > > > > and acquired early in the driver's attach routine, and also acquired in > > > > > > > > the driver's cdev implementation routines before using any newbus > > > > > > > > functions other than device_get_softc(), would solve the problem without > > > > > > > > a driver api change that would make it harder to backport/MFC driver > > > > > > > > changes. > > > > > > > > > > > > > > Also, more generally, don't create the dev nodes before you are ready to > > > > > > deal with requests. Why do we need to uglify everything here? If you can't > > > > > > do that, you can check a bit in the softc and return EBUSY or ENXIO on open if > > > > > > that bit says that your driver isn't ready to accept requests. > > > > > > > > > > > > I agree, this dosen't actually fix anything as the decision for what to put > > > > > > in your foo_attach() method rather than foo_after_attach() is non-obvious and > > > > > > very arbitrary. > > > > > > > > > > > > The actual bug appears to only be with using 'device_busy()'. I think > > > > > > this should be fixed by making device_busy() better, not by adding > > > > > > this type of obfuscation to attach. It should be trivial to make > > > > > > device_busy() safe to use on a device that is currently being attached > > > > > > which will not require any changes to drivers. > > > > > > > > > > Could you, please, elaborate your proposal ? How do you think device_busy() > > > > > can be enchanced ? > > > > > > > > > > Obvious idea to sleep inside device_busy() until dev->state becomes != > > > > > DS_ATTACHED is no go, IMO. The issue is that this causes immediate deadlocks > > > > > if device_attach() method needs to call destroy_dev() to rollback. > > > > > > > > I think you could have a DS_ATTACHING state and allow device_busy() to work > > > > for DS_ATTACHING. The idea being that it is a bug for a driver to invoke > > > > device_busy() if it is going to fail attach. You may then need to do a fixup > > > > in device_attach() to promote the state from DS_ATTACHED to DS_BUSY when it > > > > returns if there is a non-zero busy count. > > > This is quite good idea, but it still adds burden to device author, > > > although I agree that this is manageable. A scenario I have in mind now > > > is the following: > > > assume that driver needs to create two devfs nodes, lets name them > > > dri/card0 and dri/forcewake0. Driver would perform two make_dev_p(9) > > > calls, and while creation of dri/card0 succeed, consequent creation > > > of dri/forcewake0 could fail for numerous reasons. > > > > > > Now, the driver needs to ensure that cdesvw->d_open() on dri/card0 > > > would return ENXIO until dri/forcewake0 is created. This can be implemented > > > with flag, indeed. But still somewhat muddy, and probably leads to > > > user-visible errors (I mostly worry about graphical login managers). > > > > You could also sleep on the flag in d_open() (you can imagine a two-step > > process where you set a "adding cdev's flag", then all d_open() calls block > > on it). Then when finished adding cdev's, you set a flag if an error > > occurred and wake up all the waiters. If no error occured the waiters can > > have the first open() work fine. But even with other proposals you still > > have to deal with this problem if you want to fail out entirely if you > > have problems creating cdevs. > > > > > But for single-node drivers it is indeed a nice solution. > > > > I think we are somewhat stuck with this for other reasons as well. Note > > that device_busy() propagates up the tree to parent devices, so imagine > > kldloading a driver that creates a tree (e.g. a bus with a few consumers) > > where the leaf devices are attached by a call to bus_generic_attach() from > > the device_attach() method for the parent device. Even if you add a > > DEVICE_AFTER_ATTACH() hook, while it may allow device_busy() to be invoked > > on the leaf device, when it tries to propagate device_busy() up to the > > parent device it would still be in the middle of attach and blow up anyway. > > This looks like a bug in my implementation of after_attach, which > apparently should be called for new tree after the top level attach > finished. Hmm, I think this is a bit less trivial as it involves an entire pass over the tree (and you'd probably not want to invoke it more than once on a device, so you'd have to track that, etc.). > Anyway, would you commit your change ? I definitely can work out the > driver change after. But this seems to be a large amount of work for > driver authors. Oh, sure. Have you tried it out? I have not tested it yet, so I will need to do that first unless you can do so easily. -- John Baldwin From owner-freebsd-drivers@FreeBSD.ORG Wed Apr 11 06:32:01 2012 Return-Path: Delivered-To: freebsd-drivers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7AEBE106564A; Wed, 11 Apr 2012 06:32:01 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id D45088FC08; Wed, 11 Apr 2012 06:32:00 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q3B6VbgI092902; Wed, 11 Apr 2012 09:31:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q3B6VbiY000851; Wed, 11 Apr 2012 09:31:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q3B6VbxD000850; Wed, 11 Apr 2012 09:31:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 11 Apr 2012 09:31:37 +0300 From: Konstantin Belousov To: John Baldwin Message-ID: <20120411063136.GI2358@deviant.kiev.zoral.com.ua> References: <20120407125056.GS2358@deviant.kiev.zoral.com.ua> <201204100956.06750.jhb@freebsd.org> <20120410143835.GZ2358@deviant.kiev.zoral.com.ua> <201204101447.37988.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="4NXZt1nqaY8isg8F" Content-Disposition: inline In-Reply-To: <201204101447.37988.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Ian Lepore , freebsd-drivers@freebsd.org, current@freebsd.org Subject: Re: device_attach(9) and driver initialization X-BeenThere: freebsd-drivers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Writing device drivers for FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Apr 2012 06:32:01 -0000 --4NXZt1nqaY8isg8F Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Apr 10, 2012 at 02:47:37PM -0400, John Baldwin wrote: > On Tuesday, April 10, 2012 10:38:35 am Konstantin Belousov wrote: > > On Tue, Apr 10, 2012 at 09:56:06AM -0400, John Baldwin wrote: > > > On Monday, April 09, 2012 4:05:29 pm Konstantin Belousov wrote: > > > > On Mon, Apr 09, 2012 at 03:36:08PM -0400, John Baldwin wrote: > > > > > On Monday, April 09, 2012 3:10:00 pm Konstantin Belousov wrote: > > > > > > On Mon, Apr 09, 2012 at 11:01:03AM -0400, John Baldwin wrote: > > > > > > > On Saturday, April 07, 2012 11:08:41 pm Warner Losh wrote: > > > > > > > >=20 > > > > > > > > On Apr 7, 2012, at 8:46 AM, Ian Lepore wrote: > > > > > > > >=20 > > > > > > > > > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wr= ote: > > > > > > > > >> Hello, > > > > > > > > >> there seems to be a problem with device attach sequence = offered by=20 > > > > > > > newbus. > > > > > > > > >> Basically, when device attach method is executing, devic= e is not fully > > > > > > > > >> initialized yet. Also the device state in the newbus par= t of the world > > > > > > > > >> is DS_ALIVE. There is definitely no shattering news in t= he statements, > > > > > > > > >> but drivers that e.g. create devfs node to communicate w= ith consumers > > > > > > > > >> are prone to a race. > > > > > > > > >>=20 > > > > > > > > >> If /dev node is created inside device attach method, the= n usermode > > > > > > > > >> can start calling cdevsw methods before device fully ini= tialized itself. > > > > > > > > >> Even more, if device tries to use newbus helpers in cdev= sw methods, > > > > > > > > >> like device_busy(9), then panic occurs "called for unatt= eched device". > > > > > > > > >> I get reports from users about this issues, to it is not= something > > > > > > > > >> that only could happen. > > > > > > > > >>=20 > > > > > > > > >> I propose to add DEVICE_AFTER_ATTACH() driver method, to= be called > > > > > > > > >> from newbus right after device attach finished and newbu= s considers > > > > > > > > >> the device fully initialized. Driver then could create d= evfs node > > > > > > > > >> in the after_attach method instead of attach. Please see= the patch below. > > > > > > > > >>=20 > > > > > > > > >> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m > > > > > > > > >> index eb720eb..9db74e2 100644 > > > > > > > > >> --- a/sys/kern/device_if.m > > > > > > > > >> +++ b/sys/kern/device_if.m > > > > > > > > >> @@ -43,6 +43,10 @@ INTERFACE device; > > > > > > > > >> # Default implementations of some methods. > > > > > > > > >> # > > > > > > > > >> CODE { > > > > > > > > >> + static void null_after_attach(device_t dev) > > > > > > > > >> + { > > > > > > > > >> + } > > > > > > > > >> + > > > > > > > > >> static int null_shutdown(device_t dev) > > > > > > > > >> { > > > > > > > > >> return 0; > > > > > > > > >> @@ -199,6 +203,21 @@ METHOD int attach { > > > > > > > > >> }; > > > > > > > > >>=20 > > > > > > > > >> /** > > > > > > > > >> + * @brief Notify the driver that device is in attached = state > > > > > > > > >> + * > > > > > > > > >> + * Called after driver is successfully attached to the = device and > > > > > > > > >> + * corresponding device_t is fully operational. Driver = now may expose > > > > > > > > >> + * the device to the consumers, e.g. create devfs nodes. > > > > > > > > >> + * > > > > > > > > >> + * @param dev the device to probe > > > > > > > > >> + * > > > > > > > > >> + * @see DEVICE_ATTACH() > > > > > > > > >> + */ > > > > > > > > >> +METHOD void after_attach { > > > > > > > > >> + device_t dev; > > > > > > > > >> +} DEFAULT null_after_attach; > > > > > > > > >> + > > > > > > > > >> +/** > > > > > > > > >> * @brief Detach a driver from a device. > > > > > > > > >> * > > > > > > > > >> * This can be called if the user is replacing the > > > > > > > > >> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c > > > > > > > > >> index d485b9f..6d849cb 100644 > > > > > > > > >> --- a/sys/kern/subr_bus.c > > > > > > > > >> +++ b/sys/kern/subr_bus.c > > > > > > > > >> @@ -2743,6 +2743,7 @@ device_attach(device_t dev) > > > > > > > > >> dev->state =3D DS_ATTACHED; > > > > > > > > >> dev->flags &=3D ~DF_DONENOMATCH; > > > > > > > > >> devadded(dev); > > > > > > > > >> + DEVICE_AFTER_ATTACH(dev); > > > > > > > > >> return (0); > > > > > > > > >> } > > > > > > > > >>=20 > > > > > > > > >=20 > > > > > > > > > Does device_get_softc() work before attach is completed? = (I don't have > > > > > > > > > time to go look in the code right now). If so, then a mu= tex initialized > > > > > > > > > and acquired early in the driver's attach routine, and al= so acquired in > > > > > > > > > the driver's cdev implementation routines before using an= y newbus > > > > > > > > > functions other than device_get_softc(), would solve the = problem without > > > > > > > > > a driver api change that would make it harder to backport= /MFC driver > > > > > > > > > changes. > > > > > > > >=20 > > > > > > > > Also, more generally, don't create the dev nodes before you= are ready to=20 > > > > > > > deal with requests. Why do we need to uglify everything here= ? If you can't=20 > > > > > > > do that, you can check a bit in the softc and return EBUSY or= ENXIO on open if=20 > > > > > > > that bit says that your driver isn't ready to accept requests. > > > > > > >=20 > > > > > > > I agree, this dosen't actually fix anything as the decision f= or what to put > > > > > > > in your foo_attach() method rather than foo_after_attach() is= non-obvious and=20 > > > > > > > very arbitrary. > > > > > > >=20 > > > > > > > The actual bug appears to only be with using 'device_busy()'.= I think > > > > > > > this should be fixed by making device_busy() better, not by a= dding > > > > > > > this type of obfuscation to attach. It should be trivial to m= ake > > > > > > > device_busy() safe to use on a device that is currently being= attached > > > > > > > which will not require any changes to drivers. > > > > > >=20 > > > > > > Could you, please, elaborate your proposal ? How do you think d= evice_busy() > > > > > > can be enchanced ? > > > > > >=20 > > > > > > Obvious idea to sleep inside device_busy() until dev->state bec= omes !=3D > > > > > > DS_ATTACHED is no go, IMO. The issue is that this causes immedi= ate deadlocks > > > > > > if device_attach() method needs to call destroy_dev() to rollba= ck. > > > > >=20 > > > > > I think you could have a DS_ATTACHING state and allow device_busy= () to work > > > > > for DS_ATTACHING. The idea being that it is a bug for a driver t= o invoke > > > > > device_busy() if it is going to fail attach. You may then need t= o do a fixup > > > > > in device_attach() to promote the state from DS_ATTACHED to DS_BU= SY when it > > > > > returns if there is a non-zero busy count. > > > > This is quite good idea, but it still adds burden to device author, > > > > although I agree that this is manageable. A scenario I have in mind= now > > > > is the following: > > > > assume that driver needs to create two devfs nodes, lets name them > > > > dri/card0 and dri/forcewake0. Driver would perform two make_dev_p(9) > > > > calls, and while creation of dri/card0 succeed, consequent creation > > > > of dri/forcewake0 could fail for numerous reasons. > > > >=20 > > > > Now, the driver needs to ensure that cdesvw->d_open() on dri/card0 > > > > would return ENXIO until dri/forcewake0 is created. This can be imp= lemented > > > > with flag, indeed. But still somewhat muddy, and probably leads to > > > > user-visible errors (I mostly worry about graphical login managers). > > >=20 > > > You could also sleep on the flag in d_open() (you can imagine a two-s= tep > > > process where you set a "adding cdev's flag", then all d_open() calls= block > > > on it). Then when finished adding cdev's, you set a flag if an error > > > occurred and wake up all the waiters. If no error occured the waiter= s can > > > have the first open() work fine. But even with other proposals you s= till > > > have to deal with this problem if you want to fail out entirely if you > > > have problems creating cdevs. > > >=20 > > > > But for single-node drivers it is indeed a nice solution. > > >=20 > > > I think we are somewhat stuck with this for other reasons as well. N= ote > > > that device_busy() propagates up the tree to parent devices, so imagi= ne > > > kldloading a driver that creates a tree (e.g. a bus with a few consum= ers) > > > where the leaf devices are attached by a call to bus_generic_attach()= from > > > the device_attach() method for the parent device. Even if you add a > > > DEVICE_AFTER_ATTACH() hook, while it may allow device_busy() to be in= voked > > > on the leaf device, when it tries to propagate device_busy() up to the > > > parent device it would still be in the middle of attach and blow up a= nyway. > >=20 > > This looks like a bug in my implementation of after_attach, which > > apparently should be called for new tree after the top level attach > > finished. >=20 > Hmm, I think this is a bit less trivial as it involves an entire pass over > the tree (and you'd probably not want to invoke it more than once on a > device, so you'd have to track that, etc.). =20 Yes, I understand that the fix for the problem with after_attach() somewhat involved. I either could mark each device in the tree as being notified, and traverse the tree after top-level attach, or keep a deferred list of devices to notify. I also would need to distinguish top-level device_attach= () against nested attaches. >=20 > > Anyway, would you commit your change ? I definitely can work out the > > driver change after. But this seems to be a large amount of work for > > driver authors. >=20 > Oh, sure. Have you tried it out? I have not tested it yet, so I will ne= ed to > do that first unless you can do so easily. Yes, I did in a sense that drm uses device_busy(). I pushed 14.3 Intel patch to web, and specifically mailed a reporter about the possible fix for his problem. Lets see how it goes. I am still worry about ENXIO for login managers, but this would be next item. --4NXZt1nqaY8isg8F Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk+FJUgACgkQC3+MBN1Mb4g0kACfQA4/E0TlZ1ltssJIAG4J0r2K 1pkAmgMxMyPNGrUhZFCY+LXGJ17ZMRa3 =81ac -----END PGP SIGNATURE----- --4NXZt1nqaY8isg8F--