Date: Sat, 30 Jan 2010 12:44:51 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Alexander Motin <mav@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, FreeBSD-Current <freebsd-current@freebsd.org>, kib@FreeBSD.org, freebsd-geom@freebsd.org
Subject: Re: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID: <20100130114451.GB1660@garage.freebsd.pl>
In-Reply-To: <20100130112749.GA1660@garage.freebsd.pl>
References: <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl>
On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote:
> On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
> > Hi.
> >
> > Experimenting with SATA hot-plug I've found quite repeatable deadlock
> > case. Problem observed when several SATA devices, opened via devfs,
> > disappear at exactly same time. In my case, at time of unplugging SATA
> > Port Multiplier with several disks beyond it. All I have to do is to run
> > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug
> > multiplier. That causes predictable I/O errors and devices destruction.
> > But with high probability several dd processes getting stuck in kernel.
> [...]
>
> I observed the same thing yesterday while stress-testing HAST:
>
> 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd
> 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd
> 2 0 0 0 DL devdrn 0x85b1bc68 [g_event]
>
> Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path,
> which is already held by the g_event thread.

Let me add how I understand what's going on:

GEOM calls destroy_dev() while holding the topology lock. destroy_dev()
wants to destroy the device, but can't, because there are threads that
still have it open (this is the g_event thread above, sleeping on
"devdrn"). Those threads can't close the device, because closing it
requires the topology lock. The deadlock is quite obvious, IMHO.

I believe the problem could be solved by dropping the topology lock in
g_dev_orphan() around the destroy_dev(dev) call, but it is hard to say
whether it is safe to drop the topology lock at that point. Maybe
Poul-Henning could take a look.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
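A minimal userspace sketch of the lock cycle described in this message, written in C with POSIX threads rather than as FreeBSD kernel code. All names here (topology_lock, struct device, closer_main, destroy_dev_wait, run_model) are hypothetical stand-ins for the GEOM topology lock, the devfs device, the exiting dd/hastd process, destroy_dev() and g_dev_orphan(). The one-second timeout in destroy_dev_wait() is an assumption added only so the demo terminates; in the report the threads simply stay stuck.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the GEOM topology lock. */
static pthread_mutex_t topology_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device: counts threads that still have it open. */
struct device {
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    int opens;
};

/* Models the exiting dd/hastd: its close path needs the topology lock first. */
static void *closer_main(void *arg)
{
    struct device *d = arg;

    pthread_mutex_lock(&topology_lock);
    pthread_mutex_lock(&d->mtx);
    assert(d->opens > 0);
    d->opens--;
    pthread_cond_broadcast(&d->cv);
    pthread_mutex_unlock(&d->mtx);
    pthread_mutex_unlock(&topology_lock);
    return NULL;
}

/*
 * Models destroy_dev(): wait until nobody has the device open.
 * The timeout exists only so this demo terminates (an assumption).
 * Returns 0 if the device drained, ETIMEDOUT otherwise.
 */
static int destroy_dev_wait(struct device *d, int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&d->mtx);
    /* rc == 0 covers spurious wakeups; any error or timeout ends the wait. */
    while (d->opens > 0 && rc == 0)
        rc = pthread_cond_timedwait(&d->cv, &d->mtx, &deadline);
    rc = (d->opens == 0) ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&d->mtx);
    return rc;
}

/*
 * Models the g_event thread orphaning a device one thread still has open.
 * drop_lock == 0: wait in destroy_dev_wait() with the topology lock held.
 * drop_lock == 1: release the lock around the call, as proposed above.
 * Returns 0 if the device drained, ETIMEDOUT if it would have deadlocked,
 * or -1 if the closer thread could not be started.
 */
static int run_model(int drop_lock)
{
    struct device d;
    pthread_t closer;
    int rc;

    pthread_mutex_init(&d.mtx, NULL);
    pthread_cond_init(&d.cv, NULL);
    d.opens = 1;

    pthread_mutex_lock(&topology_lock);          /* g_event holds the lock */
    if (pthread_create(&closer, NULL, closer_main, &d) != 0) {
        pthread_mutex_unlock(&topology_lock);
        rc = -1;
    } else {
        if (drop_lock)
            pthread_mutex_unlock(&topology_lock);
        rc = destroy_dev_wait(&d, 1);
        if (drop_lock)
            pthread_mutex_lock(&topology_lock);  /* re-acquire afterwards */
        pthread_mutex_unlock(&topology_lock);
        pthread_join(closer, NULL);
    }
    pthread_cond_destroy(&d.cv);
    pthread_mutex_destroy(&d.mtx);
    return rc;
}
```

Under these assumptions, run_model(0) should return ETIMEDOUT after about one second, because the closer thread cannot take the topology lock while destroy_dev_wait() is waiting for it, and run_model(1) should return 0. The model deliberately says nothing about whether GEOM state is still valid after the lock is re-acquired, which is exactly the open question about g_dev_orphan() raised in the message.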