Date: Wed, 27 Aug 2014 14:24:47 -0700
From: Daniel Andersen <dea@caida.org>
To: John Baldwin <jhb@freebsd.org>, freebsd-fs@freebsd.org
Subject: Re: Process enters unkillable state and somewhat wedges zfs
Message-ID: <53FE4C9F.7030406@caida.org>
In-Reply-To: <201408271639.09352.jhb@freebsd.org>
References: <53F25402.1020907@caida.org> <53F4E3C0.6000406@FreeBSD.org> <53FB716D.1060007@caida.org> <201408271639.09352.jhb@freebsd.org>
On 08/27/2014 01:39 PM, John Baldwin wrote:
> On Monday, August 25, 2014 1:25:01 pm Daniel Andersen wrote:
>> On 08/20/2014 11:06 AM, Bryan Drewery wrote:
>>> On 8/18/2014 2:29 PM, Daniel Andersen wrote:
>>>> We are currently experiencing a strange problem that sort of locks up
>>>> one of our zfs pools. This is on a FreeBSD 10 machine. Let me give a
>>>> rough layout of our system to better describe what is happening:
>>>
>>> When it happens get the output of 'procstat -kka|grep zfs' please.
>>>
>>
>> I'd read something about that while attempting to debug the last few
>> times, so I happen to have that output, I believe. Here are the last
>> two examples (the latter having had more wedged processes):
>>
>>     0 100687 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>>     0 101055 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>>    38 100306 zfskern          arc_reclaim_thre mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d arc_reclaim_thread+0x302 fork_exit+0x9a fork_trampoline+0xe
>>    38 100307 zfskern          l2arc_feed_threa mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d l2arc_feed_thread+0xb29 fork_exit+0x9a fork_trampoline+0xe
>>    38 100672 zfskern          trim tank        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>>    38 100688 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>>    38 100689 zfskern          txg_thread_enter mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe
>>    38 101054 zfskern          trim work        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>>    38 101056 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>>    38 101057 zfskern          txg_thread_enter mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe
>> 16656 101610 ls               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_lstat+0x30 amd64_syscall+0x357 Xfast_syscall+0xfb
>>
>> and
>>
>>     0 100688 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>>     0 101056 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>>    38 100306 zfskern          arc_reclaim_thre mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d arc_reclaim_thread+0x302 fork_exit+0x9a fork_trampoline+0xe
>>    38 100307 zfskern          l2arc_feed_threa mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d l2arc_feed_thread+0xb29 fork_exit+0x9a fork_trampoline+0xe
>>    38 100673 zfskern          trim tank        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>>    38 100689 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>>    38 100690 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d zio_wait+0x5b vdev_uberblock_sync_list+0xad vdev_config_sync+0x118 spa_sync+0x827 txg_sync_thread+0x375 fork_exit+0x9a fork_trampoline+0xe
>>    38 101055 zfskern          trim work        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>>    38 101057 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>>    38 101058 zfskern          txg_thread_enter mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe
>>  2638 101160 nfsd             nfsd: service    mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_vget+0xfe nullfs_vget+0x54 nfsrvd_readdirplus+0x905 nfsrvd_dorpc+0x773 nfssvc_program+0x4f6 svc_run_internal+0x1f9 svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe
>>  3891 101641 collectd         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statfs+0x6f sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb
>> 11808 101802 sendsize         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb
>> 13900 101899 sendsize         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb
>> 17961 102536 df               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statfs+0x6f sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb
>> 38964 102279 ls               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_lstat+0x30 amd64_syscall+0x357 Xfast_syscall+0xfb
>> 42315 102539 df               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statfs+0x6f sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb
>> 42335 101782 sendsize         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb
>
> These are all blocked in "zfs" then. (For future reference, the 'mwchan'
> field that you see as 'STATE' in top or via 'ps O mwchan' is more
> detailed than the 'D' state.)
>
> To diagnose this further, you would need to see which thread holds the
> ZFS vnode lock these threads need. I have some gdb scripts you can use
> to do that at www.freebsd.org/~jhb/gdb/. You would want to download
> 'gdb6*' files from there and then do this as root:
>
> # cd /path/to/gdb/files
> # kgdb
> (kgdb) source gdb6
> (kgdb) sleepchain 42335
>
> Where '42335' is the pid of some process stuck in "zfs".

I will keep this in mind the next time the machine wedges.
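For my own reference, here is the rough checklist I plan to run next time (a sketch only; '42335' again just stands in for whatever pid ends up stuck, and it assumes the gdb6 scripts are already downloaded):

  # ps -O mwchan -p 42335          (check the wait channel the process is actually sleeping on)
  # procstat -kk 42335             (grab its kernel stack)
  # cd /path/to/gdb/files
  # kgdb
  (kgdb) source gdb6
  (kgdb) sleepchain 42335          (walk the chain to find the lock holder)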
Another data point: the second procstat output I sent was the most recent, and all the processes listed there were blocked after the fact. The process that started the entire problem (this time) was sudo, and it has only this one entry in procstat:

38003 102797 sudo             -                <running>

Of note, this does not appear to be blocked on zfs in any way. 'ps' showed it in 'R' state instead of 'D' (I will be sure to use mwchan in the future), and it appeared to be pegging an entire CPU core at 100% while remaining single-threaded. Processes that try to access /tank after that point wedge as shown above, while processes that access the same filesystem through the /data/ nullfs mount do not. nfsd appears to be an exception, as we only export filesystems over NFS via /tank/foo -> /export/foo nullfs mounts. It's possible nfsd is doing something else there, though, I guess. :/

Many thanks,
Dan
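P.S. For context on the nullfs exports mentioned above: they are set up roughly like this ('foo' stands for the actual dataset names, as in the example above):

  # mount -t nullfs /tank/foo /export/foo

or the /etc/fstab equivalent:

  /tank/foo  /export/foo  nullfs  rw  0  0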