From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 19:55:25 2013
Date: Tue, 15 Jan 2013 11:55:22 -0800
From: olivier <olivier777a7@gmail.com>
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Cc: ken@freebsd.org, Andriy Gapon
Subject: Re: CAM hangs in 9-STABLE? [Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE]

Dear All,

I am still experiencing the same hangs I reported earlier with 9.1. I have
been running a kernel with WITNESS enabled to provide more information.
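(For anyone who wants to reproduce this kind of debugging setup: the kernel
options involved are roughly the following. This is a minimal sketch against a
stock 9.x source tree, not my exact config.)

    # Sketch: add lock-debugging options to a custom kernel config and rebuild.
    # Option names are the standard ones from sys/conf/NOTES.
    cat >> /usr/src/sys/amd64/conf/MYKERNEL <<'EOF'
    options         KDB              # kernel debugger framework
    options         DDB              # interactive kernel debugger
    options         WITNESS          # lock tracking (enables show alllocks/show witness)
    options         WITNESS_SKIPSPIN # skip spin mutexes to cut the overhead
    EOF
    cd /usr/src && make buildkernel KERNCONF=MYKERNEL && make installkernel KERNCONF=MYKERNEL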
During an occurrence of the hang, running "show alllocks" gave:

Process 25777 (sysctl) thread 0xfffffe014c5b2920 (102567)
 exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff811e34c0) locked @ /usr/src/sys/dev/usb/usb_transfer.c:3171
Process 25750 (sshd) thread 0xfffffe015a688000 (104313)
 exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0bb98) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 24922 (cnid_dbd) thread 0xfffffe0187ac4920 (103597)
 shared lockmgr zfs (zfs) r = 0 (0xfffffe0973062488) locked @ /usr/src/sys/kern/vfs_syscalls.c:3591
Process 24117 (sshd) thread 0xfffffe07bd914490 (104195)
 exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0a8f0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1243 (java) thread 0xfffffe01ca85d000 (102704)
 exclusive sleep mutex pmap (pmap) r = 0 (0xfffffe015aec1440) locked @ /usr/src/sys/amd64/amd64/pmap.c:4840
 exclusive rw pmap pv global (pmap pv global) r = 0 (0xffffffff81409780) locked @ /usr/src/sys/amd64/amd64/pmap.c:4802
 exclusive sleep mutex vm page (vm page) r = 0 (0xffffffff813f0a80) locked @ /usr/src/sys/vm/vm_object.c:1128
 exclusive sleep mutex vm object (standard object) r = 0 (0xfffffe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076
 shared sx vm map (user) (vm map (user)) r = 0 (0xfffffe015aec1388) locked @ /usr/src/sys/vm/vm_map.c:2045
Process 994 (nfsd) thread 0xfffffe015a0df000 (102426)
 shared lockmgr zfs (zfs) r = 0 (0xfffffe0c3b505878) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
Process 994 (nfsd) thread 0xfffffe015a0f8490 (102422)
 exclusive lockmgr zfs (zfs) r = 0 (0xfffffe02db3b3e60) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
Process 931 (syslogd) thread 0xfffffe015af18920 (102365)
 shared lockmgr zfs (zfs) r = 0 (0xfffffe0141dd6680) locked @ /usr/src/sys/kern/vfs_syscalls.c:3591
Process 22 (syncer) thread 0xfffffe0125077000 (100279)
 exclusive lockmgr syncer (syncer) r = 0 (0xfffffe015a2ff680) locked @ /usr/src/sys/kern/vfs_subr.c:1809

I don't have full "show lockedvnods" output because the output does not get
captured by ddb after using "capture on", it doesn't fit on a single screen,
and it doesn't get piped into a "more" equivalent.
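In case it helps anyone hitting the same limitation: enlarging ddb's output
capture buffer might be a way to get the full "show lockedvnods" next time. A
sketch based on ddb(4)/ddb(8); I have not verified it on this machine:

    # Before the next hang, grow the DDB output capture buffer
    # (bounded by debug.ddb.capture.maxbufsize).
    sysctl debug.ddb.capture.bufsize=262144

    # Then, in DDB during the hang:
    #   capture on
    #   show lockedvnods
    #   capture off

    # After resuming to multi-user mode, read the captured text back:
    ddb capture print > /var/tmp/ddb-capture.txt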
What I did manage to get (copied by hand, typos possible) is:

0xfffffe0c3b5057e0: tag zfs, type VREG
    usecount 1, writecount 0, refcount 1 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfffffe089bc1b828 ref 0 pages 0
    lock type zfs: SHARED (count 1)

0xfffffe02db3b3dc8: tag zfs, type VREG
    usecount 6, writecount 0, refcount 6 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfffffe0b79583ae0 ref 0 pages 0
    lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994) with exclusive waiters pending

The output of "show witness" is at http://pastebin.com/eSRb3FEu
The output of "alltrace" is at http://pastebin.com/X1LruNrf (a number of
threads are stuck in zio_wait and none that I can find in zio_interrupt;
judging by gstat, and by the disks eventually going to sleep, all disk IO
seems to be stuck for good. I think Andriy explained earlier that these
criteria might indicate a ZFS hang; a quick way to check for this pattern is
sketched further below.)
The output of "show geom" is at http://pastebin.com/6nwQbKr4
The output of "vmstat -i" is at http://pastebin.com/9LcZ7Mi0

Interrupts are occurring at a normal rate during the hang, as far as I can tell.

Any help would be greatly appreciated.
Thanks
Olivier

PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci from
9.0 (in the hope that this would fix the hangs I was experiencing in plain
9-STABLE; obviously the hangs are still occurring). The rest of my
configuration is the same as posted earlier.

On Mon, Dec 24, 2012 at 9:42 PM, olivier wrote:

> Dear All,
> It turns out that reverting to an older version of the mps driver did not
> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after all
> (they just took a bit longer to occur again, possibly just by chance). I
> followed steps along the lines suggested by Andriy to collect more
> information when the problem occurs. Hopefully this will help figure out
> what's going on.
>
> As far as I can tell, what happens is that at some point IO operations to
> a bunch of drives that belong to different pools get stuck. For these
> drives, gstat shows no activity but one pending operation, like this:
>
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
>    1     0   0    0  0.0   0    0  0.0   0    0  0.0   0.0 da1
>
> I've been running gstat in a loop (every 100s) to monitor the machine.
> Just before the hang occurs, everything seems fine (see the full gstat
> output below). Right after the hang occurs, a number of drives seem stuck
> (again, see the full gstat output below). Notably, some of the stuck
> drives are attached through the mps driver and others through the mpt
> driver, so the problem doesn't seem to be driver-specific. I have had the
> problem occur (at a lower frequency) on similar machines that don't use
> the mpt driver (and only have one disk provided through mps), so the
> problem doesn't seem to be caused by the mpt driver (and is likely not
> caused by defective hardware). Since, based on the information I provided
> earlier, Andriy thinks the problem might not originate in ZFS, perhaps
> that means the problem is in the CAM layer?
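A quick way to check for the zio_wait pattern mentioned above (a rough sketch,
not necessarily the exact commands behind the pastebin output):

    # Count kernel threads blocked in zio_wait vs. showing up in zio_interrupt.
    procstat -a -kk | grep -c zio_wait
    procstat -a -kk | grep -c zio_interrupt

    # Show only devices with a non-zero queue length (L(q) column) in one
    # batch-mode gstat snapshot.
    gstat -b | awk '$1 > 0'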
>
> camcontrol tags -v (as suggested by Andriy) in the hung state shows, for
> example:
>
> (pass56:mpt1:0:8:20): dev_openings  254
> (pass56:mpt1:0:8:20): dev_active      1
> (pass56:mpt1:0:8:20): devq_openings 254
> (pass56:mpt1:0:8:20): devq_queued     0
> (pass56:mpt1:0:8:20): held            0
> (pass56:mpt1:0:8:20): mintags         2
> (pass56:mpt1:0:8:20): maxtags       255
>
> (I'm not providing full camcontrol tags output below because I couldn't
> get it to run during the specific hang I documented most thoroughly; the
> example above is from a different occurrence of the hang.)
>
> The buses don't seem completely frozen: if I manually remove drives while
> the machine is hanging, that is picked up by the mpt driver, which prints
> corresponding messages to the console. But "camcontrol reset all" and
> "camcontrol rescan all" don't seem to do anything.
>
> I've tried reducing vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending
> to 1, to no avail.
>
> Any suggestions to resolve this problem, work around it, or investigate it
> further would be greatly appreciated!
> Thanks a lot
> Olivier
>
> Detailed information:
>
> Output of procstat -a -kk when the machine is hanging is available at
> http://pastebin.com/7D2KtT35 (not putting it here because it's pretty
> long).
>
> dmesg is available at http://pastebin.com/9zJQwWJG . Note that I'm using
> LUN masking, so the "illegal requests" reported there aren't really
> errors. Maybe one day, once I get my problems sorted out, I'll use geom
> multipathing instead.
>
> My kernel config is:
>
> include GENERIC
> ident MYKERNEL
>
> options IPSEC
> device crypto
>
> options OFED            # Infiniband protocol
>
> device mlx4ib           # ConnectX Infiniband support
> device mlxen            # ConnectX Ethernet support
> device mthca            # Infinihost cards
> device ipoib            # IP over IB devices
>
> options ATA_CAM         # Handle legacy controllers with CAM
> options ATA_STATIC_ID   # Static device numbering
>
> options KDB
> options DDB
>
> Full output of gstat just before the hang (at most 100s before the hang):
>
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da2/da2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da0/da0
> 1 85 48 79 4.7 35 84 0.5 0 0 0.0 24.3 da1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da1/da1
> 1 83 47 77 4.3 34 79 0.5 0 0 0.0 22.1 da4
> 1 1324 1303 21433 0.6 19 42 0.7 0 0 0.0 79.8 da3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da6
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da8
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da9
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da11
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19
> 0 97 57 93 3.5 38 84 0.3 0 0 0.0 21.3 da20
> 0 85 47 69 3.3 36 86 0.4 0 0 0.0 16.8 da21
> 0 1666 1641 18992 0.3 23 43 0.4 0 0 0.0 57.9 da22
> 0 93 55 98 3.5 36 87 0.4 0 0 0.0 20.6 da23
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da24
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da25
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da27
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da28
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da29
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da30
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da31
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da32
> 0 1200 0 0 0.0 1198 11751 0.6 0 0 0.0 67.3 da33
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da34
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da35
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da36
> 0 81 44 67 2.0 35 84 0.3 0 0 0.0 10.1 da37
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da38
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da39
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da40
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da41
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da42
> 1 1020 999 22028 0.8 19 42 0.7 0 0 0.0 84.8 da43
> 0 1050 1029 23479 0.8 19 47 0.7 0 0 0.0 83.3 da44
> 1 1006 984 22758 0.8 21 46 0.6 0 0 0.0 84.8 da45
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da46
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da47
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da48
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da49
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da50
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 cd0
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da4/da4
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da3/da3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da5/da5
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da6/da6
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da7/da7
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da8/da8
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da9/da9
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da10/da10
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da11/da11
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da12/da12
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da13/da13
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da14/da14
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da15/da15
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da16/da16
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da17/da17
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da18/da18
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da19/da19
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da20/da20
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da21/da21
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da22/da22
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da23/da23
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da24/da24
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da25/da25
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26/da26
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 PART/da26/da26
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26p1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26p2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26p3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da27/da27
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da28/da28
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da29/da29
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da30/da30
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da31/da31
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da32/da32
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da33/da33
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da34/da34
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da35/da35
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da36/da36
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da37/da37
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da38/da38
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da39/da39
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da40/da40
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da41/da41
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da42/da42
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da43/da43
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da44/da44
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da45/da45
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da46/da46
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da47/da47
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da48/da48
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da49/da49
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da50/da50
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/cd0/cd0
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26p1/da26p1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26p2/da26p2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 LABEL/da26p1/da26p1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26p3/da26p3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 LABEL/da26p2/da26p2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da25
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da40
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da41
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da29
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da30
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da24
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da6
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da7
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da16
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da17
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da20
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da21
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da37
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da23
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da4
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da43
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da44
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da22
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da33
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da45
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da3
>
> Full output of gstat just after the hang (at most 100s after the hang):
>
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da0
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da2/da2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da0/da0
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da1/da1
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da4
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da5
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da6
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da7
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da8
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da9
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da10
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da11
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da12
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da13
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da14
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da15
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da16
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da17
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da18
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da19
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da20
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da21
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da22
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da23
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da24
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da25
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da27
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da28
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da29
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da30
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da31
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da32
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da33
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da34
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da35
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da36
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da37
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da38
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da39
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da40
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da41
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da42
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da43
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da44
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da45
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da46
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da47
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da48
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da49
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da50
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 cd0
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da4/da4
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da3/da3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da5/da5
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da6/da6
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da7/da7
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da8/da8
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da9/da9
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da10/da10
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da11/da11
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da12/da12
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da13/da13
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da14/da14
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da15/da15
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da16/da16
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da17/da17
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da18/da18
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da19/da19
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da20/da20
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da21/da21
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da22/da22
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da23/da23
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da24/da24
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da25/da25
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26/da26
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 PART/da26/da26
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26p1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26p2
> 1 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 da26p3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da27/da27
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da28/da28
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da29/da29
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da30/da30
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da31/da31
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da32/da32
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da33/da33
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da34/da34
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da35/da35
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da36/da36
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da37/da37
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da38/da38
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da39/da39
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da40/da40
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da41/da41
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da42/da42
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da43/da43
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da44/da44
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da45/da45
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da46/da46
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da47/da47
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da48/da48
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da49/da49
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da50/da50
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/cd0/cd0
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26p1/da26p1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26p2/da26p2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 LABEL/da26p1/da26p1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/da26p3/da26p3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 LABEL/da26p2/da26p2
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da25
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da40
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da41
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da29
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da30
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da24
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da6
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da7
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da16
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da17
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da20
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da21
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da37
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da23
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da1
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da4
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da43
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da44
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da22
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da33
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da45
> 0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0 ZFS::VDEV/zfs::vdev/da3
>
>
> On Thu, Dec 13, 2012 at 10:14 PM, olivier wrote:
>
>> For what it's worth, I think I might have solved my problem by reverting
>> to an older version of the mps driver. I checked out a recent version of
>> 9-STABLE and reversed the changes in
>> http://svnweb.freebsd.org/base?view=revision&revision=230592 (perhaps
>> there was a simpler way of reverting to the older mps driver). So far so
>> good: no hang even when hammering the file system.
>>
>> This does not conclusively prove that the new LSI mps driver is at
>> fault, but that seems to be a likely explanation.
>>
>> Thanks to everybody who pointed me in the right direction. I hope this
>> helps others who run into similar problems with 9.1.
>> Olivier
>>
>> On Thu, Dec 13, 2012 at 10:14 AM, olivier wrote:
>>
>>> On Thu, Dec 13, 2012 at 9:54 AM, Andriy Gapon wrote:
>>>
>>>> Google for "zfs deadman". This is already committed upstream and I
>>>> think that it is imported into FreeBSD, but I am not sure... Maybe
>>>> it's imported just into the vendor area and is not merged yet.
>>>
>>> Yes, that's exactly what I had in mind. The logic for panicking makes
>>> sense. As far as I can tell you're correct that deadman is in the
>>> vendor area but not merged. Any idea when it might make it into
>>> 9-STABLE?
>>> Thanks
>>> Olivier
>>>
>>>> So, when enabled, this logic would panic a system as a way of letting
>>>> you know that something is wrong.
>>>> You can read in the links why panic was selected for this job.
>>>>
>>>> And speaking FreeBSD-centric: I think that our CAM layer would be a
>>>> perfect place to detect such issues in a non-ZFS-specific way.
>>>>
>>>> --
>>>> Andriy Gapon
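For reference, when the deadman code does get merged, I would expect it to be
controlled by loader tunables along the following lines. The names below are
guessed from the upstream zfs_deadman_* variables and do not exist in 9-STABLE
yet, so treat this purely as a sketch:

    # Hypothetical /boot/loader.conf knobs for the ZFS deadman once it lands;
    # the exact names will depend on how the vendor code is merged.
    cat >> /boot/loader.conf <<'EOF'
    vfs.zfs.deadman_enabled="1"       # act when an outstanding zio appears hung
    vfs.zfs.deadman_synctime="1000"   # seconds before an outstanding zio is considered hung
    EOF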