Date: Sat, 30 Jan 2010 15:51:26 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: Pawel Jakub Dawidek <pjd@freebsd.org>
Cc: freebsd-hackers@freebsd.org, Alexander Motin <mav@freebsd.org>, FreeBSD-Current <freebsd-current@freebsd.org>, freebsd-geom@freebsd.org
Subject: Re: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID: <20100130135126.GV3877@deviant.kiev.zoral.com.ua>
In-Reply-To: <20100130114451.GB1660@garage.freebsd.pl>
References: <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl> <20100130114451.GB1660@garage.freebsd.pl>

On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote:
> On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote:
> > On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
> > > Hi.
> > >
> > > Experimenting with SATA hot-plug I've found quite repeatable deadlock
> > > case. Problem observed when several SATA devices, opened via devfs,
> > > disappear at exactly same time. In my case, at time of unplugging SATA
> > > Port Multiplier with several disks beyond it. All I have to do is to run
> > > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug
> > > multiplier. That causes predictable I/O errors and devices destruction.
> > > But with high probability several dd processes getting stuck in kernel.
> > [...]
> >
> > I observed the same thing yesterday while stress-testing HAST:
> >
> > 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd
> > 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd
> > 2 0 0 0 DL devdrn 0x85b1bc68 [g_event]
> >
> > Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path,
> > which is already held by the g_event thread.
>
> Maybe I'll add how I understand what's going on:
>
> GEOM calls destroy_dev() while holding the topology lock.
>
> Destroy_dev() wants to destroy device, but can't because there are
> threads that still have it open.
>
> The threads can't close it, because to close it they need the topology
> lock.
>
> The deadlock is quite obvious, IMHO.
>
> I believe the problem could be solved by dropping the topology lock in
> g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if
> it is safe to drop the topology lock there. Maybe Poul-Henning could
> take a look.

As I already said, if you cannot drop a lock, destroy_dev_sched() is
designed to handle this.
You should be careful not to allow any further activity on the device
scheduled for destruction.
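
The deferred-destruction idea can be modelled outside the kernel. What follows is a minimal userspace sketch in C with pthreads, not FreeBSD kernel code: `topology` only stands in for the GEOM topology lock, and every `model_*` name is hypothetical. It assumes that the essence of the fix is that the thread holding the lock merely marks the device as dying and hands the wait for the last close to another thread, so that closers can still acquire the lock. Refusing new opens once the device is marked dying corresponds to the caution about further activity on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are FreeBSD kernel APIs. */
struct model_dev {
    pthread_mutex_t lock;     /* protects the fields below */
    pthread_cond_t  changed;  /* signalled whenever open_count drops */
    int  open_count;          /* threads that still have the device open */
    bool dying;               /* set once destruction has been scheduled */
    bool destroyed;           /* set by the worker after the last close */
};

/* Stands in for the GEOM topology lock. */
static pthread_mutex_t topology = PTHREAD_MUTEX_INITIALIZER;

/* Open fails once the device is scheduled for destruction. */
static bool model_open(struct model_dev *d)
{
    pthread_mutex_lock(&d->lock);
    bool ok = !d->dying;
    if (ok)
        d->open_count++;
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Close needs the topology lock, like the exit path in the report. */
static void model_close(struct model_dev *d)
{
    pthread_mutex_lock(&topology);
    pthread_mutex_lock(&d->lock);
    assert(d->open_count > 0);
    d->open_count--;
    pthread_cond_broadcast(&d->changed);
    pthread_mutex_unlock(&d->lock);
    pthread_mutex_unlock(&topology);
}

/* Worker: waits for the last close without holding the topology lock. */
static void *model_destroy_worker(void *arg)
{
    struct model_dev *d = arg;
    pthread_mutex_lock(&d->lock);
    while (d->open_count > 0)
        pthread_cond_wait(&d->changed, &d->lock);
    d->destroyed = true;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Called with topology held: mark the device dying and return at once. */
static int model_destroy_sched(struct model_dev *d, pthread_t *worker)
{
    pthread_mutex_lock(&d->lock);
    d->dying = true;
    pthread_mutex_unlock(&d->lock);
    return pthread_create(worker, NULL, model_destroy_worker, d);
}

static void *model_closer(void *arg)
{
    model_close(arg);
    return NULL;
}

/* Returns 0 if the device was destroyed and no late open got through. */
int model_run(void)
{
    struct model_dev d = { .open_count = 0, .dying = false, .destroyed = false };
    pthread_t worker, closer;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.changed, NULL);
    if (!model_open(&d))
        return -1;

    pthread_mutex_lock(&topology);          /* the orphan path holds the lock */
    int rc = model_destroy_sched(&d, &worker);
    pthread_mutex_unlock(&topology);        /* released without waiting */
    if (rc != 0)
        return -1;

    bool late_open = model_open(&d);        /* expected to be refused */
    if (late_open)
        model_close(&d);

    if (pthread_create(&closer, NULL, model_closer, &d) != 0)
        return -1;
    pthread_join(closer, NULL);
    pthread_join(worker, NULL);
    return (d.destroyed && !late_open) ? 0 : -1;
}
```

In this model, making model_destroy_sched() wait for open_count to reach zero while topology is still held would reproduce the hang described earlier in the thread, because model_close() could never take the lock. Calling model_run() from a main() and checking for a zero return exercises the non-blocking path.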