From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 00:07:44 2013
Date: Tue, 15 Jan 2013 17:07:38 -0700
From: "Reed A. Cartwright" <rcartwri@asu.edu>
To: olivier
Cc: freebsd-fs@freebsd.org, ken@freebsd.org, freebsd-stable@freebsd.org,
    Andriy Gapon
Subject: Re: CAM hangs in 9-STABLE? [Was: NFS/ZFS hangs after upgrading from
    9.0-RELEASE to -STABLE]

I don't know if this is relevant or not, but a deadlock was recently fixed
in the VFS code:

http://svnweb.freebsd.org/base?view=revision&revision=244795

On Tue, Jan 15, 2013 at 12:55 PM, olivier wrote:
> Dear All,
> Still experiencing the same hangs I reported earlier with 9.1. I've been
> running a kernel with WITNESS enabled to provide more information.
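(For context: a WITNESS-enabled debugging kernel like the one mentioned above
is usually a small delta on top of GENERIC. The config below is only a sketch
of such a setup; the ident and exact option set are illustrative and not
necessarily Olivier's actual configuration, which is posted further down this
thread.)

include GENERIC
ident   DEBUGKERNEL
options KDB                # kernel debugger framework
options DDB                # interactive kernel debugger (show alllocks, alltrace, ...)
options WITNESS            # lock tracking; needed for "show alllocks" and "show witness"
options WITNESS_SKIPSPIN   # skip spin locks to keep the WITNESS overhead tolerable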
>
> During an occurrence of the hang, running show alllocks gave
>
> Process 25777 (sysctl) thread 0xfffffe014c5b2920 (102567)
> exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff811e34c0) locked @
> /usr/src/sys/dev/usb/usb_transfer.c:3171
> Process 25750 (sshd) thread 0xfffffe015a688000 (104313)
> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0bb98) locked @
> /usr/src/sys/kern/uipc_sockbuf.c:148
> Process 24922 (cnid_dbd) thread 0xfffffe0187ac4920 (103597)
> shared lockmgr zfs (zfs) r = 0 (0xfffffe0973062488) locked @
> /usr/src/sys/kern/vfs_syscalls.c:3591
> Process 24117 (sshd) thread 0xfffffe07bd914490 (104195)
> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0a8f0) locked @
> /usr/src/sys/kern/uipc_sockbuf.c:148
> Process 1243 (java) thread 0xfffffe01ca85d000 (102704)
> exclusive sleep mutex pmap (pmap) r = 0 (0xfffffe015aec1440) locked @
> /usr/src/sys/amd64/amd64/pmap.c:4840
> exclusive rw pmap pv global (pmap pv global) r = 0 (0xffffffff81409780)
> locked @ /usr/src/sys/amd64/amd64/pmap.c:4802
> exclusive sleep mutex vm page (vm page) r = 0 (0xffffffff813f0a80) locked @
> /usr/src/sys/vm/vm_object.c:1128
> exclusive sleep mutex vm object (standard object) r = 0
> (0xfffffe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076
> shared sx vm map (user) (vm map (user)) r = 0 (0xfffffe015aec1388) locked @
> /usr/src/sys/vm/vm_map.c:2045
> Process 994 (nfsd) thread 0xfffffe015a0df000 (102426)
> shared lockmgr zfs (zfs) r = 0 (0xfffffe0c3b505878) locked @
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
> Process 994 (nfsd) thread 0xfffffe015a0f8490 (102422)
> exclusive lockmgr zfs (zfs) r = 0 (0xfffffe02db3b3e60) locked @
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
> Process 931 (syslogd) thread 0xfffffe015af18920 (102365)
> shared lockmgr zfs (zfs) r = 0 (0xfffffe0141dd6680) locked @
> /usr/src/sys/kern/vfs_syscalls.c:3591
> Process 22 (syncer) thread 0xfffffe0125077000 (100279)
> exclusive lockmgr syncer (syncer) r = 0 (0xfffffe015a2ff680) locked @
> /usr/src/sys/kern/vfs_subr.c:1809
>
> I don't have full "show lockedvnods" output because the output does not get
> captured by ddb after using "capture on", it doesn't fit on a single
> screen, and doesn't get piped into a "more" equivalent.
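(A note on the capture problem: on a machine with a dump device configured,
one way to get complete DDB output is to enlarge the capture buffer and to
pre-script the debugger so that everything it prints is written out as a
textdump on the next debugger entry. This is only a sketch; the script hook
name and buffer size are illustrative, see ddb(8), ddb(4) and textdump(4).)

# run from a shell before the next hang
sysctl debug.ddb.capture.bufsize=1048576   # enlarge DDB's output capture buffer
ddb script kdb.enter.default="textdump set; capture on; show alllocks; show lockedvnods; alltrace; capture off; call doadump; reset"
# the script can also be run by hand from the db> prompt with: run kdb.enter.default
# after the reboot, savecore(8) stores the captured output in /var/crash as textdump.tar.N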
> What I did manage to get (copied by hand, typos possible) is:
>
> 0xfffffe0c3b5057e0: tag zfs, type VREG
> usecount 1, writecount 0, refcount 1 mountedhere 0
> flags (VI_ACTIVE)
> v_object 0xfffffe089bc1b828 ref 0 pages 0
> lock type zfs: SHARED (count 1)
>
> 0xfffffe02db3b3dc8: tag zfs, type VREG
> usecount 6, writecount 0, refcount 6 mountedhere 0
> flags (VI_ACTIVE)
> v_object 0xfffffe0b79583ae0 ref 0 pages 0
> lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994)
> with exclusive waiters pending
>
> The output of show witness is at http://pastebin.com/eSRb3FEu
>
> The output of alltrace is at http://pastebin.com/X1LruNrf (a number of
> threads are stuck in zio_wait, none that I can find in zio_interrupt, and
> judging by gstat and by the disks eventually going to sleep, all disk IO
> seems to be stuck for good; I think Andriy explained earlier that these
> criteria might indicate this is a ZFS hang).
>
> The output of show geom is at http://pastebin.com/6nwQbKr4
>
> The output of vmstat -i is at http://pastebin.com/9LcZ7Mi0 . Interrupts are
> occurring at a normal rate during the hang, as far as I can tell.
>
> Any help would be greatly appreciated.
> Thanks
> Olivier
>
> PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci
> from 9.0 (in the hope that this would fix the hangs I was experiencing in
> plain 9-STABLE; obviously the hangs are still occurring). The rest of my
> configuration is the same as posted earlier.
>
> On Mon, Dec 24, 2012 at 9:42 PM, olivier wrote:
>
>> Dear All,
>> It turns out that reverting to an older version of the mps driver did not
>> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after all
>> (they just took a bit longer to occur again, possibly just by chance). I
>> followed steps along the lines suggested by Andriy to collect more
>> information when the problem occurs. Hopefully this will help figure out
>> what's going on.
>>
>> As far as I can tell, what happens is that at some point IO operations to
>> a bunch of drives that belong to different pools get stuck. For these
>> drives, gstat shows no activity but 1 pending operation, as such:
>>
>> L(q)  ops/s   r/s  kBps  ms/r   w/s  kBps  ms/w   d/s  kBps  ms/d  %busy Name
>>    1      0     0     0   0.0     0     0   0.0     0     0   0.0    0.0 da1
>>
>> I've been running gstat in a loop (every 100s) to monitor the machine.
>> Just before the hang occurs, everything seems fine (see full gstat output
>> below). Right after the hang occurs, a number of drives seem stuck (see
>> full gstat output below). Notably, some of the stuck drives are seen
>> through the mps driver and others through the mpt driver, so the problem
>> doesn't seem to be driver-specific. I have had the problem occur (at a
>> lower frequency) on similar machines that don't use the mpt driver (and
>> only have 1 disk provided through mps), so the problem doesn't seem to be
>> caused by the mpt driver (and is likely not caused by defective hardware).
>> Since, based on the information I provided earlier, Andriy thinks the
>> problem might not originate in ZFS, perhaps that means that the problem is
>> in the CAM layer?
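(If CAM is suspected, one more source of visibility, beyond the camcontrol
tags output quoted below, is CAM's own debug tracing. This is a sketch under
two assumptions: the kernel has to be built with "options CAMDEBUG", and the
bus:target:lun numbers are placeholders that would be read off camcontrol
devlist -v for one of the stuck drives.)

camcontrol devlist -v            # map each daNN peripheral to its bus:target:lun
camcontrol debug -I -T -c 2:8:0  # trace CCBs and CDBs for that device; output goes to the console/dmesg
camcontrol debug off             # turn tracing back off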
>> >> camcontrol tags -v (as suggested by Andriy) in the hung state shows for >> example >> >> (pass56:mpt1:0:8:20): dev_openings 254 >> (pass56:mpt1:0:8:20): dev_active 1 >> (pass56:mpt1:0:8:20): devq_openings 254 >> (pass56:mpt1:0:8:20): devq_queued 0 >> (pass56:mpt1:0:8:20): held 0 >> (pass56:mpt1:0:8:20): mintags 2 >> (pass56:mpt1:0:8:20): maxtags 255 >> (I'm not providing full camcontrol tags output below because I couldn't >> get it to run during the specific hang I documented most thoroughly; the >> example above is from a different occurrence of the hang). >> >> The buses don't seem completely frozen: if I manually remove drives while >> the machine is hanging, that's picked up by the mpt driver, which prints >> out corresponding messages to the console. But camcontrol reset all or >> rescan all don't seem to do anything. >> >> I've tried reducing vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending >> to 1, to no avail. >> >> Any suggestions to resolve this problem, work around it, or further >> investigate it would be greatly appreciated! >> Thanks a lot >> Olivier >> >> Detailed information: >> >> Output of procstat -a -kk when the machine is hanging is available at >> http://pastebin.com/7D2KtT35 (not putting it here because it's pretty >> long) >> >> dmesg is available at http://pastebin.com/9zJQwWJG . Note that I'm using >> LUN masking, so the "illegal requests" reported aren't really errors. Maybe >> one day if I get my problems sorted out I'll use geom multipathing instead. >> >> My kernel config is >> include GENERIC >> ident MYKERNEL >> >> options IPSEC >> device crypto >> >> options OFED # Infiniband protocol >> >> device mlx4ib # ConnectX Infiniband support >> device mlxen # ConnectX Ethernet support >> device mthca # Infinihost cards >> device ipoib # IP over IB devices >> >> options ATA_CAM # Handle legacy controllers with CAM >> options ATA_STATIC_ID # Static device numbering >> >> options KDB >> options DDB >> >> >> >> Full output of gstat just before the hang (at most 100s before the hang): >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps >> ms/d %busy Name >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da2/da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da0/da0 >> 1 85 48 79 4.7 35 84 0.5 0 0 >> 0.0 24.3 da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da1/da1 >> 1 83 47 77 4.3 34 79 0.5 0 0 >> 0.0 22.1 da4 >> 1 1324 1303 21433 0.6 19 42 0.7 0 0 >> 0.0 79.8 da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da5 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da15 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da19 >> 0 97 57 93 3.5 38 84 0.3 0 0 >> 0.0 21.3 da20 >> 0 85 47 69 3.3 36 86 0.4 0 0 >> 0.0 16.8 da21 >> 0 1666 1641 18992 0.3 23 43 0.4 0 0 >> 0.0 57.9 da22 >> 0 93 55 98 3.5 36 87 0.4 0 0 >> 0.0 20.6 da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 
0.0 0.0 da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da32 >> 0 1200 0 0 0.0 1198 11751 0.6 0 0 >> 0.0 67.3 da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da36 >> 0 81 44 67 2.0 35 84 0.3 0 0 >> 0.0 10.1 da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da42 >> 1 1020 999 22028 0.8 19 42 0.7 0 0 >> 0.0 84.8 da43 >> 0 1050 1029 23479 0.8 19 47 0.7 0 0 >> 0.0 83.3 da44 >> 1 1006 984 22758 0.8 21 46 0.6 0 0 >> 0.0 84.8 da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da4/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da3/da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da5/da5 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da6/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da7/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da8/da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da9/da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da10/da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da11/da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da12/da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da13/da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da14/da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da15/da15 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da16/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da17/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da18/da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da19/da19 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da20/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da21/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da22/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da23/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da24/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da25/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 PART/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da27/da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da28/da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da29/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da30/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da31/da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da32/da32 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da33/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da34/da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da35/da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da36/da36 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da37/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da38/da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da39/da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da40/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da41/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da42/da42 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da43/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da44/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da45/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da46/da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 
DEV/da47/da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da48/da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da49/da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da50/da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/cd0/cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p3/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 >> >> >> Full output of gstat just after the hang (at most 100s after the hang): >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps >> ms/d %busy Name >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da2/da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da0/da0 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da1/da1 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da4 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da5 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da15 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da16 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da19 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 
0.0 0.0 da20 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da22 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da23 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da25 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da32 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da36 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da42 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da43 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da44 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da4/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da3/da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da5/da5 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da6/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da7/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da8/da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da9/da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da10/da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da11/da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da12/da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da13/da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da14/da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da15/da15 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da16/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da17/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da18/da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da19/da19 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da20/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da21/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da22/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da23/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da24/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da25/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 PART/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p2 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da27/da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da28/da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da29/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da30/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da31/da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da32/da32 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da33/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da34/da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da35/da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da36/da36 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da37/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da38/da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da39/da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da40/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da41/da41 >> 0 0 
0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da42/da42 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da43/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da44/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da45/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da46/da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da47/da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da48/da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da49/da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da50/da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/cd0/cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p3/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 >> >> >> On Thu, Dec 13, 2012 at 10:14 PM, olivier wrote: >> >>> For what it's worth, I think I might have solved my problem by reverting >>> to an older version of the mps driver. I checked out a recent version of >>> 9-STABLE and reversed the changes in >>> http://svnweb.freebsd.org/base?view=revision&revision=230592 (perhaps >>> there was a simpler way of reverting to the older mps driver). So far so >>> good, no hang even when hammering the file system. >>> >>> This does not conclusively prove that the new LSI mps driver is at fault, >>> but that seems to be a likely explanation. >>> >>> Thanks to everybody who pointed me in the right direction. 
>>> Hope this helps others who run into similar problems with 9.1.
>>> Olivier
>>>
>>>
>>> On Thu, Dec 13, 2012 at 10:14 AM, olivier wrote:
>>>
>>>>
>>>> On Thu, Dec 13, 2012 at 9:54 AM, Andriy Gapon wrote:
>>>>
>>>>> Google for "zfs deadman". This is already committed upstream and I
>>>>> think that it is imported into FreeBSD, but I am not sure... Maybe it's
>>>>> imported just into the vendor area and is not merged yet.
>>>>>
>>>>
>>>> Yes, that's exactly what I had in mind. The logic for panicking makes
>>>> sense.
>>>> As far as I can tell you're correct that deadman is in the vendor area
>>>> but not merged. Any idea when it might make it into 9-STABLE?
>>>> Thanks
>>>> Olivier
>>>>
>>>>
>>>>> So, when enabled, this logic would panic the system as a way of letting
>>>>> you know that something is wrong. You can read in the links why panic
>>>>> was selected for this job.
>>>>>
>>>>> And speaking FreeBSD-centric - I think that our CAM layer would be a
>>>>> perfect place to detect such issues in a non-ZFS-specific way.
>>>>>
>>>>> --
>>>>> Andriy Gapon
>>>>>
>>>>
>>>
>>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
Reed A. Cartwright, PhD
Assistant Professor of Genomics, Evolution, and Bioinformatics
School of Life Sciences
Center for Evolutionary Medicine and Informatics
The Biodesign Institute
Arizona State University