Date:      Sat, 30 Jan 2010 12:44:51 +0100
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, FreeBSD-Current <freebsd-current@freebsd.org>, kib@FreeBSD.org, freebsd-geom@freebsd.org
Subject:   Re: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID:  <20100130114451.GB1660@garage.freebsd.pl>
In-Reply-To: <20100130112749.GA1660@garage.freebsd.pl>
References:  <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl>

On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote:
> On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
> > Hi.
> >
> > Experimenting with SATA hot-plug I've found quite repeatable deadlock
> > case. Problem observed when several SATA devices, opened via devfs,
> > disappear at exactly same time. In my case, at time of unplugging SATA
> > Port Multiplier with several disks beyond it. All I have to do is to run
> > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug
> > multiplier. That causes predictable I/O errors and devices destruction.
> > But with high probability several dd processes getting stuck in kernel.
> [...]
>
> I observed the same thing yesterday while stress-testing HAST:
>
>  3659  2504  3659     0  DE+     GEOM top 0x8079a348 dd
>  3658  2102  2102     0  DE+     GEOM top 0x8079a348 hastd
>     2     0     0     0  DL      devdrn   0x85b1bc68 [g_event]
>
> Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path,
> which is already held by the g_event thread.

Let me add how I understand what's going on:

GEOM calls destroy_dev() while holding the topology lock.

destroy_dev() wants to destroy the device, but can't, because there are
threads that still have it open.

The threads can't close it, because to close it they need the topology
lock.

The deadlock is quite obvious, IMHO.

I believe the problem could be solved by dropping the topology lock in
g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if
it is safe to drop the topology lock there. Maybe Poul-Henning could
take a look.
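
To make the cycle above concrete, here is a minimal userspace sketch in C
with pthreads. It is only a model under stated assumptions, not kernel
code: model_orphan() and model_close() are hypothetical names, the mutex
merely stands in for the topology lock, and the closing thread uses
trylock so the demo reports the would-be sleep instead of hanging. It says
nothing about whether dropping the lock in g_dev_orphan() is actually safe.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; these are not the kernel's real objects. */
static pthread_mutex_t topology = PTHREAD_MUTEX_INITIALIZER; /* models the topology lock */
static int open_count;	/* models threads that still have the device open */

/*
 * Models a process such as dd(1) in its exit path: closing the device
 * needs the topology lock. EBUSY here corresponds to sleeping forever
 * in the real scenario.
 */
static void *
model_close(void *arg)
{
	int error;

	(void)arg;
	error = pthread_mutex_trylock(&topology);
	if (error == EBUSY)
		return (NULL);		/* close is blocked by the lock holder */
	assert(error == 0);
	open_count--;			/* last close succeeded */
	pthread_mutex_unlock(&topology);
	return (NULL);
}

/*
 * Models the orphan path calling destroy_dev(). Returns true if the last
 * close could happen, false if it was blocked (the deadlock case).
 */
static bool
model_orphan(bool drop_topology)
{
	pthread_t td;
	bool destroyed;
	int error;

	error = pthread_mutex_lock(&topology);	/* g_event holds the lock */
	assert(error == 0);
	open_count = 1;

	if (drop_topology)
		pthread_mutex_unlock(&topology);	/* the proposed change */

	/* Stand-in for destroy_dev() waiting for the last close. */
	error = pthread_create(&td, NULL, model_close, NULL);
	assert(error == 0);
	error = pthread_join(td, NULL);
	assert(error == 0);

	if (drop_topology)
		pthread_mutex_lock(&topology);

	destroyed = (open_count == 0);
	pthread_mutex_unlock(&topology);
	return (destroyed);
}

#ifdef MODEL_MAIN
int
main(void)
{
	printf("lock held:    %s\n",
	    model_orphan(false) ? "destroyed" : "would deadlock");
	printf("lock dropped: %s\n",
	    model_orphan(true) ? "destroyed" : "would deadlock");
	return (0);
}
#endif
```

Built with something like `cc -DMODEL_MAIN -pthread model.c`, the first
call models the current behaviour (the close cannot proceed while the lock
is held) and the second models the lock being dropped around the destroy.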

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
