From: Dennis Glatting <dg@pki2.com>
To: Fabian Keil
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS hanging
Date: Thu, 12 Jul 2012 22:33:59 -0700
Message-ID: <1342157639.60708.11.camel@btw.pki2.com>
In-Reply-To: <20120712151541.7f3a6886@fabiankeil.de>
References: <1341864787.32803.43.camel@btw.pki2.com> <20120712151541.7f3a6886@fabiankeil.de>

On Thu, 2012-07-12 at 15:15 +0200, Fabian Keil wrote:
> Dennis Glatting wrote:
>
> > I have a ZFS array of disks where the system simply stops as if forever
> > blocked by some IO mutex. This happens often and the following is the
> > output of top:
> >
> > last pid:  6075;  load averages:  0.00,  0.00,  0.00   up 0+16:54:41  13:04:10
> > 135 processes: 1 running, 134 sleeping
> > CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > Mem: 47M Active, 24M Inact, 18G Wired, 120M Buf, 44G Free
> > Swap: 32G Total, 32G Free
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >  2410 root        1  33    0 11992K  2820K zio->i  7 331:25  0.00% bzip2
> >  2621 root        1  52    4 28640K  5544K tx->tx 24 245:33  0.00% john
> >  2624 root        1  48    4 28640K  5544K tx->tx  4 239:08  0.00% john
> >  2623 root        1  49    4 28640K  5544K tx->tx  7 238:44  0.00% john
>
> Does top continue to run or does it hang as well?

It continues to run.

> I believe the locks shown above shouldn't affect already
> running applications that don't cause disk traffic.
>
> > My question is what should I be looking at and how should I look at it?
> > There is nothing in the logs or the console, rather the system is
> > forever paused and entering commands results in no response (it's as if
> > everything is deadlocked).
>
> If the entered commands actually start you can try sending
> SIGINFO with CTRL+T to request a status:
>
> fk@r500 ~ $zpool status
> load: 0.15  cmd: zpool 2698 [spa_namespace_lock] 543.23r 0.00u 0.12s 0% 2908k
>
> If you can run procstat you can try getting kernel stack traces
> for some (or all) processes that should give you a rough idea
> of how the lock is reached:
>
> fk@r500 ~ $procstat -kk $(pgrep zpool)
>   PID    TID COMM             TDNAME           KSTACK
>  2698 100922 zpool            -                mi_switch+0x196 sleepq_wait+0x42 _sx_xlock_hard+0x525 _sx_xlock+0x75 spa_all_configs+0x5e zfs_ioc_pool_configs+0x29 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x5f9 Xfast_syscall+0xf7
>  2388 100431 zpool            -                mi_switch+0x196 sleepq_wait+0x42 _sx_xlock_hard+0x525 _sx_xlock+0x75 spa_open_common+0x7a spa_get_stats+0x5b zfs_ioc_pool_stats+0x2c zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x5f9 Xfast_syscall+0xf7
>
> If procstat hangs as well you could try executing it in a loop
> before the problem occurs. If it stays in the cache it may keep
> running after most of the other processes stop responding.
>
> DTrace can be useful for analyzing locking issues as well:
>
> fk@r500 ~ $sudo ~/scripts/flowtrace.d zfsdev_ioctl
> 0 101057 2012 Jul 12 14:59:42.155 00: Trace in progress. Waiting to enter zfsdev_ioctl. Hit CTRL-C to exit.
> 0 100979 2012 Jul 12 14:59:43.933 04: --> zfsdev_ioctl:entry
> 0 100979 2012 Jul 12 14:59:43.933 05: --> zfs_secpolicy_none:entry
> 0 100979 2012 Jul 12 14:59:43.933 05: <-- zfs_secpolicy_none:return
> 0 100979 2012 Jul 12 14:59:43.933 05: --> zfs_ioc_pool_configs:entry
> 0 100979 2012 Jul 12 14:59:43.933 06: --> spa_all_configs:entry
> 0 100979 2012 Jul 12 14:59:43.933 07: --> nvlist_alloc:entry
> 0 100979 2012 Jul 12 14:59:43.933 07: [...]
> 0 100979 2012 Jul 12 14:59:43.933 07: <-- nvlist_alloc:return
> 0 100979 2012 Jul 12 14:59:43.933 07: --> _sx_xlock:entry
> 0 100979 2012 Jul 12 14:59:43.933 08: --> _sx_xlock_hard:entry
> 0 100979 2012 Jul 12 14:59:43.933 09: --> sleepq_lock:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: --> spinlock_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: --> critical_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: <-- critical_enter:return
> 0 100979 2012 Jul 12 14:59:43.933 09: <-- sleepq_lock:return
> 0 100979 2012 Jul 12 14:59:43.933 09: --> lockstat_nsecs:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: --> binuptime:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> hpet_get_timecount:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: <-- hpet_get_timecount:return
> 0 100979 2012 Jul 12 14:59:43.933 10: <-- binuptime:return
> 0 100979 2012 Jul 12 14:59:43.933 09: <-- lockstat_nsecs:return
> 0 100979 2012 Jul 12 14:59:43.933 09: --> sleepq_add:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: --> sleepq_lookup:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: <-- sleepq_lookup:return
> 0 100979 2012 Jul 12 14:59:43.933 10: --> thread_lock_flags_:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> spinlock_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> critical_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: <-- critical_enter:return
> 0 100979 2012 Jul 12 14:59:43.933 10: <-- thread_lock_flags_:return
> 0 100979 2012 Jul 12 14:59:43.933 09: --> spinlock_exit:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: --> critical_exit:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: <-- critical_exit:return
> 0 100979 2012 Jul 12 14:59:43.933 09: <-- spinlock_exit:return
> 0 100979 2012 Jul 12 14:59:43.933 09: --> sleepq_wait:entry
> 0 100979 2012 Jul 12 14:59:43.933 10: --> thread_lock_flags_:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> spinlock_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> critical_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: <-- critical_enter:return
> 0 100979 2012 Jul 12 14:59:43.933 10: <-- thread_lock_flags_:return
> 0 100979 2012 Jul 12 14:59:43.933 10: --> sleepq_switch:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> sched_sleep:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: <-- sched_sleep:return
> 0 100979 2012 Jul 12 14:59:43.933 11: --> thread_lock_set:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> spinlock_exit:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: --> critical_exit:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- critical_exit:return
> 0 100979 2012 Jul 12 14:59:43.933 11: <-- spinlock_exit:return
> 0 100979 2012 Jul 12 14:59:43.933 10: --> mi_switch:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: --> rdtsc:entry
> 0 100979 2012 Jul 12 14:59:43.933 11: <-- rdtsc:return
> 0 100979 2012 Jul 12 14:59:43.933 11: --> sched_switch:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: --> sched_pctcpu_update:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- sched_pctcpu_update:return
> 0 100979 2012 Jul 12 14:59:43.933 12: --> spinlock_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: --> critical_enter:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- critical_enter:return
> 0 100979 2012 Jul 12 14:59:43.933 12: --> thread_lock_block:entry
> 0 100979 2012 Jul 12 14:59:43.933 13: --> spinlock_exit:entry
> 0 100979 2012 Jul 12 14:59:43.933 14: --> critical_exit:entry
> 0 100979 2012 Jul 12 14:59:43.933 14: <-- critical_exit:return
> 0 100979 2012 Jul 12 14:59:43.933 13: <-- spinlock_exit:return
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- thread_lock_block:return
> 0 100979 2012 Jul 12 14:59:43.933 12: --> tdq_load_rem:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- tdq_load_rem:return
> 0 100979 2012 Jul 12 14:59:43.933 12: --> choosethread:entry
> 0 100979 2012 Jul 12 14:59:43.933 13: --> sched_choose:entry
> 0 100979 2012 Jul 12 14:59:43.933 14: --> tdq_choose:entry
> 0 100979 2012 Jul 12 14:59:43.933 15: --> runq_choose:entry
> 0 100979 2012 Jul 12 14:59:43.933 15: <-- runq_choose:return
> 0 100979 2012 Jul 12 14:59:43.933 15: --> runq_choose_from:entry
> 0 100979 2012 Jul 12 14:59:43.933 15: <-- runq_choose_from:return
> 0 100979 2012 Jul 12 14:59:43.933 14: --> runq_choose:entry
> 0 100979 2012 Jul 12 14:59:43.933 14: <-- runq_choose:return
> 0 100979 2012 Jul 12 14:59:43.933 13: <-- sched_choose:return
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- choosethread:return
> 0 100979 2012 Jul 12 14:59:43.933 12: --> sched_pctcpu_update:entry
> 0 100979 2012 Jul 12 14:59:43.933 12: <-- sched_pctcpu_update:return
>
> Of course once you know where ZFS is hanging you still have to figure
> out the why ...

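As for the suggestion above to run procstat in a loop before the problem
occurs, a minimal sketch of such a loop might look like the one below. The
log directory, interval, and use of "-kk -a" are arbitrary choices, not
something specified in the thread:

#!/bin/sh
# Periodically capture kernel stack traces for all processes so a
# recent snapshot is already on disk if the box wedges.
LOGDIR=/var/log/procstat        # arbitrary location, ideally not on the ZFS pool
mkdir -p "$LOGDIR"
while :; do
        procstat -kk -a > "$LOGDIR/procstat.$(date +%Y%m%d-%H%M%S)" 2>&1
        sleep 10                # arbitrary interval
done

Similarly, short of Fabian's flowtrace.d script, a generic DTrace
one-liner can dump a kernel stack each time a thread enters zfsdev_ioctl:

dtrace -n 'fbt::zfsdev_ioctl:entry { stack(); }'
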
Thanks! I've made several changes to the volumes and am rerunning my
workload now (started an hour ago).

Rather than a single volume of hodgepodge disks, the pool now spans two
raidz1 vdevs of similar disks, partly because I was curious and partly
because I thought it'd be fun. I also forced ashift=12 and have
replaced/upgraded some of the disks.

I have found, but not run, a bit of Seagate code that forces 4k sectors
on the ST32000542AS disks -- they report 512-byte sectors but are really
4k. I also ran SeaTools against some of my disks.

The array is now:

iirc# zpool status disk-1
  pool: disk-1
 state: ONLINE
  scan: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        disk-1           ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            da2          ONLINE       0     0     0
            da3          ONLINE       0     0     0
            da4          ONLINE       0     0     0
            da8          ONLINE       0     0     0
          raidz1-1       ONLINE       0     0     0
            da1          ONLINE       0     0     0
            da5          ONLINE       0     0     0
            da6          ONLINE       0     0     0
            da7          ONLINE       0     0     0
            da9          ONLINE       0     0     0
        logs
          gpt/zil-disk1  ONLINE       0     0     0
        cache
          ada1           ONLINE       0     0     0

errors: No known data errors

-- 
Dennis Glatting
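A note on forcing ashift=12 above: the message doesn't say how it was
done, but a common approach on FreeBSD at the time was to create
temporary 4k-sector gnop(8) providers and build the pool on top of them.
The sketch below is only illustrative; it covers just the first raidz1
vdev from the status output and omits the second vdev, log and cache:

# Create temporary 4k-sector providers so zpool picks ashift=12.
gnop create -S 4096 da2 da3 da4 da8
zpool create disk-1 raidz1 da2.nop da3.nop da4.nop da8.nop

# The .nop providers are not persistent; export the pool, destroy them,
# and re-import using the real devices.
zpool export disk-1
gnop destroy da2.nop da3.nop da4.nop da8.nop
zpool import disk-1

# Verify that the vdevs were created with ashift=12.
zdb -C disk-1 | grep ashift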