From: Dennis Glatting
To: freebsd-fs@freebsd.org
Date: Mon, 09 Jul 2012 13:13:07 -0700
Subject: ZFS hanging

I have a ZFS array of disks where the system simply stops as if forever blocked by some IO mutex.
This happens often, and the following is the output of top:

last pid:  6075;  load averages:  0.00,  0.00,  0.00   up 0+16:54:41  13:04:10
135 processes: 1 running, 134 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 47M Active, 24M Inact, 18G Wired, 120M Buf, 44G Free
Swap: 32G Total, 32G Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2410 root        1  33    0 11992K  2820K zio->i  7 331:25  0.00% bzip2
 2621 root        1  52    4 28640K  5544K tx->tx 24 245:33  0.00% john
 2624 root        1  48    4 28640K  5544K tx->tx  4 239:08  0.00% john
 2623 root        1  49    4 28640K  5544K tx->tx  7 238:44  0.00% john
 2640 root        1  42    4 28640K  5420K tx->tx 23 206:51  0.00% john
 2638 root        1  42    4 28640K  5420K tx->tx 28 206:34  0.00% john
 2639 root        1  42    4 28640K  5420K tx->tx  9 206:30  0.00% john
 2637 root        1  42    4 28640K  5420K tx->tx 18 206:24  0.00% john

This system is presently resilvering a disk, but these stops have happened before.

iirc# zpool status disk-1
  pool: disk-1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jul  8 13:07:46 2012
        104G scanned out of 12.4T at 1.73M/s, (scan is slow, no estimated time)
        10.3G resilvered, 0.82% done
config:

        NAME                        STATE     READ WRITE CKSUM
        disk-1                      DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            da1                     ONLINE       0     0     0
            da2                     ONLINE       0     0     0
            da10                    ONLINE       0     0     0
            da9                     ONLINE       0     0     0
            da5                     ONLINE       0     0     0
            da6                     ONLINE       0     0     0
            da7                     ONLINE       0     0     0
            replacing-7             DEGRADED     0     0     0
              17938531774236227186  UNAVAIL      0     0     0  was /dev/da8
              da3                   ONLINE       0     0     0  (resilvering)
            da8                     ONLINE       0     0     0
            da4                     ONLINE       0     0     0
        logs
          ada2p1                    ONLINE       0     0     0
        cache
          ada1                      ONLINE       0     0     0

errors: No known data errors

This system has dissimilar disks, which I understand should not be a problem, but the stops also happened before I started the slow disk upgrade process. The disks are served by:

 * An LSI 9211 flashed to IT, and
 * An LSI 2008 controller on the motherboard, also flashed to IT.

The 2008 BIOS and firmware are the most recent from LSI. The motherboard is a Supermicro H8DG6-F.

My question is: what should I be looking at, and how should I look at it? There is nothing in the logs or on the console. The system is simply paused forever, and entering commands gets no response (it's as if everything is deadlocked).
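
For a sense of scale, the resilver figures in the zpool status output above can be turned into a rough time estimate. This is only a sketch under stated assumptions: the scan rate stays at 1.73M/s, the G/T/M suffixes printed by zpool are binary multiples (powers of 1024), and the helper names to_bytes and resilver_estimate are made up for illustration.

```python
# Rough resilver time estimate from the zpool status figures above.
# Assumptions (not something zpool reports): the scan rate stays
# constant, and G/T/M suffixes are binary multiples (powers of 1024).

UNITS = {"M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(value: float, unit: str) -> float:
    """Convert a zpool-style size such as (12.4, "T") to bytes."""
    return value * UNITS[unit]


def resilver_estimate(scanned: float, total: float, rate_per_s: float):
    """Return (percent scanned, days remaining) at a constant rate."""
    pct = 100.0 * scanned / total
    days = (total - scanned) / rate_per_s / 86400
    return pct, days


scanned = to_bytes(104, "G")   # "104G scanned"
total = to_bytes(12.4, "T")    # "out of 12.4T"
rate = to_bytes(1.73, "M")     # "at 1.73M/s"

pct, days = resilver_estimate(scanned, total, rate)
print(f"{pct:.2f}% scanned, roughly {days:.0f} days remaining")
```

With the numbers above this gives about 0.82% scanned, which agrees with the "0.82% done" that zpool reports, and on the order of 86 days to finish if the rate does not improve.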