Date: Sat, 30 Jan 2010 00:58:26 +0200
From: Alexander Motin <mav@FreeBSD.org>
To: freebsd-geom@freebsd.org, freebsd-hackers@freebsd.org, FreeBSD-Current <freebsd-current@freebsd.org>
Subject: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID: <4B636812.8060403@FreeBSD.org>
Hi.

While experimenting with SATA hot-plug I've found a quite repeatable deadlock. The problem is observed when several SATA devices, opened via devfs, disappear at exactly the same time; in my case, when unplugging a SATA Port Multiplier with several disks behind it. All I have to do is run several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug the multiplier. That causes predictable I/O errors and device destruction, but with high probability several dd processes get stuck in the kernel.

Here are the pieces of the problem I've found so far:

- CAM receives the disconnect event and starts device destruction. But as the device is still open, it can't do so immediately.

- dd receives an I/O error and exits.

- The exit1() call closes all descriptors, including the adaX device. This triggers final device destruction by sending an event to geom_dev:

    adaclose(4571fa00,4,40c16576,76,0,...) at 0x4049c521
    g_disk_access(457e2200,ffffffff,0,0,0,...) at 0x4080b9a4
    g_access(45643d80,ffffffff,0,0,2000,...) at 0x40810ccb
    g_dev_close(45766500,1,2000,4569fd80,4569fd80,...) at 0x4080a425
    devfs_close(7b604aa8,80000,457f8000,80000,7b604acc,...) at 0x407f2762
    VOP_CLOSE_APV(40d03180,7b604aa8,40c2e681,128,0,...) at 0x40b6da55
    vn_close(457f8000,1,45624300,4569fd80,451271e0,...) at 0x40912750
    vn_closefile(4566da48,4569fd80,4566da48,0,7b604b58,...) at 0x40912854
    devfs_close_f(4566da48,4569fd80,3,0,4566da48,...) at 0x407f235b
    _fdrop(4566da48,4569fd80,7b604b8c,408b5cec,0,4569fe24,40eb23a8,40d10460,40c1a8bb,4560672c,721,40c1a8b2,7b604bb4,40878220,4560672c,8,40c1a8b2,721) at 0x40836da3
    closef(4566da48,4569fd80,721,71e,4569fe24,...) at 0x40838ad0
    fdfree(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x408394da
    exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844423
    sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
    syscall(7b604d38) at 0x40b565c0

- The GEOM event thread tries to destroy the /dev/adaX device (which should already be free at this moment), but for some reason freezes, waiting for the device to be freed:

    0 2 0 0 -8 0 0 8 devdrn DL ?? 0:02.89 [g_event]

- As the GEOM event is still not handled, exit1() waits for it:

    kdb_backtrace(40c16bc4,0,40c16ab1,56,4540e640,...) at 0x408a2909
    g_waitidle(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x4080cd1f
    exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844431
    sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
    syscall(7b604d38) at 0x40b565c0

- The system is stationary and GEOM is frozen. There is no way out of this except pressing reset:

    0 1065 1055 0 44 0 5344 3040 g_wait DE 0 0:00.43 dd if=/dev/ada1 of=/dev/null bs=1m
    0 1066 1055 0 44 0 5344 3040 GEOM t DE 0 0:00.07 dd if=/dev/ada2 of=/dev/null bs=1m

So, does anybody have a good idea why destroy_dev() can't complete?

--
Alexander Motin
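
The wait chain described in this message can be modelled in a few lines of Python. This is only an illustrative sketch: the node labels in `WAITS_ON` are taken from the traces and ps output above and are not kernel identifiers, and the `wait_chain` and `has_cycle` helpers are hypothetical names introduced here. The sketch follows "who waits on whom" from the stuck dd process.

```python
# Illustrative model only: labels come from the traces and ps output in this
# message; nothing here is a kernel API.
WAITS_ON = {
    # exit1() sleeps in g_waitidle() until the GEOM event thread is idle.
    "dd pid 1065 (exit1 -> g_waitidle)": "g_event thread",
    # The event thread sleeps on wait channel "devdrn" while destroying the device.
    "g_event thread": "device drain (wchan devdrn)",
}


def wait_chain(start, waits_on):
    """Follow waits-on edges from start; stop at the chain's end or on a cycle."""
    chain = [start]
    seen = {start}
    while chain[-1] in waits_on:
        nxt = waits_on[chain[-1]]
        chain.append(nxt)
        if nxt in seen:  # revisiting a node means a circular wait
            break
        seen.add(nxt)
    return chain


def has_cycle(chain):
    """True if the chain ended by revisiting an earlier node."""
    return len(chain) > 1 and chain[-1] in chain[:-1]


if __name__ == "__main__":
    print(" -> ".join(wait_chain("dd pid 1065 (exit1 -> g_waitidle)", WAITS_ON)))
```

With only the edges reported here, the chain ends at the devdrn sleep and `has_cycle` returns False. The message does not identify what keeps the device busy, so no edge is drawn from the drain. Adding such an edge back to one of the dd processes would close the loop, and that edge is exactly the open question.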