Date: Sat, 30 Jan 2010 01:11:10 +0200
From: Kostik Belousov
To: Alexander Motin
Message-ID: <20100129231110.GS3877@deviant.kiev.zoral.com.ua>
In-Reply-To: <4B636812.8060403@FreeBSD.org>
Cc:
freebsd-hackers@freebsd.org, FreeBSD-Current, freebsd-geom@freebsd.org
Subject: Re: Deadlock between GEOM and devfs device destroy and process exit.

On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
> Hi.
>
> Experimenting with SATA hot-plug I've found quite repeatable deadlock
> case. Problem observed when several SATA devices, opened via devfs,
> disappear at exactly same time. In my case, at time of unplugging SATA
> Port Multiplier with several disks beyond it. All I have to do is to run
> several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug
> multiplier. That causes predictable I/O errors and devices destruction.
> But with high probability several dd processes getting stuck in kernel.
>
> I've discovered such pieces of problem:
> - CAM receives disconnect event and starts device destruction. But as
> device is still opened, it can't do it immediately.
> - dd receives I/O error and exits.
> - exit1() call closes all descriptors, including adaX device. It
> triggers final device destruction, by sending event to geom_dev.
>
> adaclose(4571fa00,4,40c16576,76,0,...) at 0x4049c521
> g_disk_access(457e2200,ffffffff,0,0,0,...) at 0x4080b9a4
> g_access(45643d80,ffffffff,0,0,2000,...) at 0x40810ccb
> g_dev_close(45766500,1,2000,4569fd80,4569fd80,...) at 0x4080a425
> devfs_close(7b604aa8,80000,457f8000,80000,7b604acc,...) at 0x407f2762
> VOP_CLOSE_APV(40d03180,7b604aa8,40c2e681,128,0,...) at 0x40b6da55
> vn_close(457f8000,1,45624300,4569fd80,451271e0,...) at 0x40912750
> vn_closefile(4566da48,4569fd80,4566da48,0,7b604b58,...)
> at 0x40912854
> devfs_close_f(4566da48,4569fd80,3,0,4566da48,...) at 0x407f235b
> _fdrop(4566da48,4569fd80,7b604b8c,408b5cec,0,4569fe24,40eb23a8,40d10460,40c1a8bb,4560672c,721,40c1a8b2,7b604bb4,40878220,4560672c,8,40c1a8b2,721)
> at 0x40836da3
> closef(4566da48,4569fd80,721,71e,4569fe24,...) at 0x40838ad0
> fdfree(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x408394da
> exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844423
> sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
> syscall(7b604d38) at 0x40b565c0
>
> - GEOM event thread tries to destroy /dev/adaX device (which should be
> already free at this moment), but for some reason freezes, waiting for
> device to be freed:
>
> 0 2 0 0 -8 0 0 8 devdrn DL ?? 0:02.89
> [g_event]
>
> - as GEOM event is still not handled, exit1() waits for it:
>
> kdb_backtrace(40c16bc4,0,40c16ab1,56,4540e640,...) at 0x408a2909
> g_waitidle(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x4080cd1f
> exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844431
> sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
> syscall(7b604d38) at 0x40b565c0
>
> - system stationary. GEOM frozen. No way to get out of this, except
> pushing reset.
>
> 0 1065 1055 0 44 0 5344 3040 g_wait DE 0 0:00.43 dd
> if=/dev/ada1 of=/dev/null bs=1m
> 0 1066 1055 0 44 0 5344 3040 GEOM t DE 0 0:00.07 dd
> if=/dev/ada2 of=/dev/null bs=1m
>
>
> So, does anybody have good idea why destroy_dev() can't complete?

The devdrn state means that the thread performing the device
destruction, i.e. the thread that called destroy_dev(), is waiting for
other threads to leave the cdevsw d_* methods. The thread that notified
the destruction thread did so from the d_close() method. This resulted
in the deadlock.

I introduced the destroy_dev_sched(9) KPI to handle this and similar
issues. Note that race-free use of destroy_dev_sched(9) is quite hard.
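To illustrate the reasoning above, here is a minimal user-space sketch in C
with POSIX threads. It is only an analogy, not FreeBSD kernel code: the names
model_dev, method_enter(), method_leave(), model_destroy(), model_close() and
run_deferred_destroy_demo() are all hypothetical. model_destroy() stands in
for destroy_dev() by waiting until no thread is inside a device method, and
model_close() stands in for a d_close() method that hands the destruction to
another thread, which is the general idea behind destroy_dev_sched(9).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a device; not struct cdev. */
struct model_dev {
	pthread_mutex_t	lock;
	pthread_cond_t	drained;
	int		threads_in_methods;	/* threads inside a d_* method */
	bool		destroyed;
};

/* Record that the calling thread has entered a device method. */
static void
method_enter(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->threads_in_methods++;
	pthread_mutex_unlock(&d->lock);
}

/* Record that the calling thread has left; wake a waiting destroyer. */
static void
method_leave(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->threads_in_methods > 0);
	if (--d->threads_in_methods == 0)
		pthread_cond_broadcast(&d->drained);
	pthread_mutex_unlock(&d->lock);
}

/* Models destroy_dev(): block until no thread is inside a method. */
static void
model_destroy(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->threads_in_methods > 0)
		pthread_cond_wait(&d->drained, &d->lock);
	d->destroyed = true;
	pthread_mutex_unlock(&d->lock);
}

/* Separate context that performs the destruction. */
static void *
destroy_worker(void *arg)
{
	model_destroy(arg);
	return (NULL);
}

/*
 * Models a close method that defers destruction to another thread.
 * Calling model_destroy() directly here would never return, because
 * this thread is itself still counted in threads_in_methods.
 */
static int
model_close(struct model_dev *d, pthread_t *worker)
{
	int error;

	method_enter(d);
	error = pthread_create(worker, NULL, destroy_worker, d);
	method_leave(d);
	return (error);
}

/* Returns 1 if the device was destroyed after the close returned. */
int
run_deferred_destroy_demo(void)
{
	struct model_dev d;
	pthread_t worker;
	int ok;

	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.drained, NULL);
	d.threads_in_methods = 0;
	d.destroyed = false;

	if (model_close(&d, &worker) != 0)
		return (0);
	pthread_join(worker, NULL);
	ok = d.destroyed ? 1 : 0;
	pthread_cond_destroy(&d.drained);
	pthread_mutex_destroy(&d.lock);
	return (ok);
}
```

Calling run_deferred_destroy_demo() from a main() linked against the
threads library exercises the deferred path. The sketch ignores the
harder part mentioned above: nothing here prevents new threads from
entering methods while destruction is pending.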