From owner-freebsd-fs@FreeBSD.ORG Wed Aug 27 21:24:54 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EE6E32E6; Wed, 27 Aug 2014 21:24:54 +0000 (UTC) Received: from caida.org (rommie.caida.org [192.172.226.78]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D77E83E94; Wed, 27 Aug 2014 21:24:54 +0000 (UTC) Message-ID: <53FE4C9F.7030406@caida.org> Date: Wed, 27 Aug 2014 14:24:47 -0700 From: Daniel Andersen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: John Baldwin , freebsd-fs@freebsd.org Subject: Re: Process enters unkillable state and somewhat wedges zfs References: <53F25402.1020907@caida.org> <53F4E3C0.6000406@FreeBSD.org> <53FB716D.1060007@caida.org> <201408271639.09352.jhb@freebsd.org> In-Reply-To: <201408271639.09352.jhb@freebsd.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Aug 2014 21:24:55 -0000 On 08/27/2014 01:39 PM, John Baldwin wrote: > On Monday, August 25, 2014 1:25:01 pm Daniel Andersen wrote: >> On 08/20/2014 11:06 AM, Bryan Drewery wrote: >>> On 8/18/2014 2:29 PM, Daniel Andersen wrote: >>>> We are currently experiencing a strange problem that sort of locks up one > of our zfs pools. This is on a FreeBSD 10 >>>> machine. Let me give a rough layout of our system to better describe > what is happening: >>> >>> When it happens get the output of 'procstat -kka|grep zfs' please. >>> >> >> I'd read something about that while attempting to debug the last few times, > so I happen to have that output, I believe. >> Here are the last two examples ( the latter having had more wedged processes > ): >> >> 0 100687 kernel zfs_vn_rele_task mi_switch+0xde > sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 >> fork_exit+0x9a fork_trampoline+0xe >> 0 101055 kernel zfs_vn_rele_task mi_switch+0xde > sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 >> fork_exit+0x9a fork_trampoline+0xe >> 38 100306 zfskern arc_reclaim_thre mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> arc_reclaim_thread+0x302 fork_exit+0x9a fork_trampoline+0xe >> 38 100307 zfskern l2arc_feed_threa mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> l2arc_feed_thread+0xb29 fork_exit+0x9a fork_trampoline+0xe >> 38 100672 zfskern trim tank mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe >> 38 100688 zfskern txg_thread_enter mi_switch+0xde > sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b >> fork_exit+0x9a fork_trampoline+0xe >> 38 100689 zfskern txg_thread_enter mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe >> 38 101054 zfskern trim work mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe >> 38 101056 zfskern txg_thread_enter mi_switch+0xde > sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b >> fork_exit+0x9a fork_trampoline+0xe >> 38 101057 zfskern txg_thread_enter mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe >> 16656 101610 ls - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statat_vnhook+0xa5 >> sys_lstat+0x30 amd64_syscall+0x357 Xfast_syscall+0xfb >> >> >> and >> >> 0 100688 kernel zfs_vn_rele_task mi_switch+0xde > sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 >> fork_exit+0x9a fork_trampoline+0xe >> 0 101056 kernel zfs_vn_rele_task mi_switch+0xde > sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 >> fork_exit+0x9a fork_trampoline+0xe >> 38 100306 zfskern arc_reclaim_thre mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> arc_reclaim_thread+0x302 fork_exit+0x9a fork_trampoline+0xe >> 38 100307 zfskern l2arc_feed_threa mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> l2arc_feed_thread+0xb29 fork_exit+0x9a fork_trampoline+0xe >> 38 100673 zfskern trim tank mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe >> 38 100689 zfskern txg_thread_enter mi_switch+0xde > sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b >> fork_exit+0x9a fork_trampoline+0xe >> 38 100690 zfskern txg_thread_enter mi_switch+0xde > sleepq_wait+0x3a _cv_wait+0x16d zio_wait+0x5b >> vdev_uberblock_sync_list+0xad vdev_config_sync+0x118 spa_sync+0x827 > txg_sync_thread+0x375 fork_exit+0x9a >> fork_trampoline+0xe >> 38 101055 zfskern trim work mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe >> 38 101057 zfskern txg_thread_enter mi_switch+0xde > sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b >> fork_exit+0x9a fork_trampoline+0xe >> 38 101058 zfskern txg_thread_enter mi_switch+0xde > sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d >> txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe >> 2638 101160 nfsd nfsd: service mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_vget+0xfe > nullfs_vget+0x54 nfsrvd_readdirplus+0x905 >> nfsrvd_dorpc+0x773 nfssvc_program+0x4f6 svc_run_internal+0x1f9 > svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe >> 3891 101641 collectd - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statfs+0x6f >> sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb >> 11808 101802 sendsize - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statat_vnhook+0xa5 >> sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb >> 13900 101899 sendsize - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statat_vnhook+0xa5 >> sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb >> 17961 102536 df - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statfs+0x6f >> sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb >> 38964 102279 ls - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statat_vnhook+0xa5 >> sys_lstat+0x30 amd64_syscall+0x357 Xfast_syscall+0xfb >> 42315 102539 df - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statfs+0x6f >> sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb >> 42335 101782 sendsize - mi_switch+0xde > sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 >> vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 > namei+0x504 kern_statat_vnhook+0xa5 >> sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb > > These are all blocked in "zfs" then. (For future reference, the 'mwchan' > field that you see as 'STATE' in top or via 'ps O mwchan' is more detailed > than the 'D' state.) > > To diagnose this further, you would need to see which thread holds the > ZFS vnode lock these threads need. I have some gdb scripts you can use to > do that at www.freebsd.org/~jhb/gdb/. You would want to download 'gdb6*' > files from there and then do this as root: > > # cd /path/to/gdb/files > # kgdb > (kgdb) source gdb6 > (kgdb) sleepchain 42335 > > Where '42335' is the pid of some process stuck in "zfs". > I will keep this in mind the next time the machine wedges. Another data point: the second procstat output I sent was the most recent. All the processes listed there were after the fact. The process that started the entire problem ( this time ) was sudo, and it only has this one entry in procstat: 38003 102797 sudo - Of note, this does not appear to be blocked on zfs in anyway. 'ps' showed it in 'R' state instead of 'D' ( I will be sure to use mwchan in the future. ) It appeared to be pegging an entire CPU core at 100% usage, as well, and was only single threaded. Processes that try to access /tank after that wedge as shown above, while processes that access the same filesystem through the /data/ nullfs mount do not. nfsd appears to be an exception, as we only export nfs mounts via a /tank/foo -> /export/foo nullfs mount. It's possible nfsd is doing something else there, though, I guess. :/ Many thanks, Dan