Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 23 Jan 2013 11:54:14 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Kohji Okuno <okuno.kohji@jp.panasonic.com>
Cc:        ae@freebsd.org, freebsd-current@FreeBSD.org
Subject:   Re: deadlock between g_event and a thread on removing a device.
Message-ID:  <20130123095414.GU2522@kib.kiev.ua>
In-Reply-To: <20130118.144538.1860198911942517633.okuno.kohji@jp.panasonic.com>
References:  <20130118.144538.1860198911942517633.okuno.kohji@jp.panasonic.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--8MIuRhPDBHHz1F6y
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jan 18, 2013 at 02:45:38PM +0900, Kohji Okuno wrote:
> Hi,
>=20
> When I removed a device (ex. /dev/da0), I have encounterd a
> dead-lock between ``g_event'' thread and a thread that is opening
> device file (I call this thread as A).
>=20
> Would you refer the following?
>=20
> When the device is removed between dev_refthread() and g_dev_open(),
> thread A incremented dev->si_threadcount, but can't acquire
> topology_lock.
>=20
> On the other hand, g_event is waiting to set dev->si_threadcount to 0
> with topology_lock.
>=20
> Regards,
>  Kohji Okuno
>=20
>=20
> <<< Thread A >>>
> ...
> devfs_open()
> {
>   ...
>   dsw =3D dev_refthread(dev, &ref); <=3D increment dev->si_threadcount
>   ...
>   error =3D dsw->d_open(...);       <=3D call g_dev_open()
>   ...
>   dev_relthread(dev, ref);        <=3D decrement dev->si_threadcount
> }
>=20
> g_dev_open()
> {
>   ...
>   g_topology_lock();              <=3D Thread A couldn't acquire=20
>   ...                                topology_lock.
> }
>=20
> <<< g_event >>>
> g_run_events()
> {
>    ...
>    g_topology_lock();             <=3D g_event acuired topology_lock here.
>    ...
>    one_event()
>    ...
> }
>=20
> one_event()
> g_orphan_register()
> g_dev_orphan()
> destroy_dev()
> destroy_dev()
> destroy_devl()
> {
>   ...
>   while (dev->si_threadcount !=3D 0) { <=3D this count was incremented by=
 Thread A
>     /* Use unique dummy wait ident */
>     msleep(&csw, &devmtx, PRIBIO, "devdrn", hz / 10);
>   }
>   ...
> }

Yes, you are absolutely right.

I believe there were some patches floating around which changed the
destroy_dev() call in the g_dev_orphan() to destroy_dev_sched(). I do
not remember who was the author.

My reply was that naive substitution of the destroy_dev() to
destroy_dev_sched() is racy, because some requests might still come
in after the call to destroy_dev_sched(). Despite destroy_dev_sched()
setting the CDP_SCHED_DTR flag on the devfs node, some thread might
already entered the cdevsw method. I do not believe that there was
further progress there.

--8MIuRhPDBHHz1F6y
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJQ/7NFAAoJEJDCuSvBvK1BudYP/j4awPTFvuXcmeShTVrIRgix
guS7LxPnWNAFJztRiKSu1SiYGFY0lwyrAru9+L/pPaUwPsiHFGTwCHeqcXQIRv8l
mD3+8vx1M6ynXeVf39ACVB1Fxro5W9gyrj7f/z04sgjDY0vg4q0hswBEyZ0N1ROU
TF69g1De2GLfgOhm5SnXzmhPRbrcA4vJV5z35nmE1FV2uhyvGNmVrQlHSKepSq5C
xI7Pr/hpStJu1sFTt6aNlJAMjqQnvdzXDkYAq5QfQRzVeXionyVDv3rDgL3PoMJg
pGvsc4CsMqozBhNF0Bk2kHVSyXpXA/J+/oJutGarIEPyHJDj8xcJG6UG+2N7KyTU
O8KhRDj2k+LfwHzk3Rbl7SuWGyKGl47zmBrjrEFK7zavVHpxjN7ox5M/68WAcTCE
0c26MuMWTZ7PTRe/zibQTW2I0v1EjmlWk4S5eM1dZSn1STtcm6dJ2fF0dYClm1rY
zt4yeASiAPSl3gVuNL8tGA6KYbkihTei0Fdhvbd7aSmpYoZ6yfoTiU6BOIIyueiE
40T5Bs0NEEPp2BtLEXRelW8fIvgCVegwrPLwIesVHA7D9J7omS+BPTyJZanFxoHc
AobMYzxk/8B22CkPtswCtJb51QcRts29xPPolJ56lFG0ObUMl11M0lIvsP2rmzd9
ZaK+3a7WiezyXjSqbReO
=xJf+
-----END PGP SIGNATURE-----

--8MIuRhPDBHHz1F6y--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130123095414.GU2522>