Date: Sun, 10 Oct 2010 12:47:50 +0300
From: Andriy Gapon <avg@freebsd.org>
To: Kai Gallasch <gallasch@free.de>
Cc: freebsd-fs@freebsd.org, Konstantin Belousov <kib@freebsd.org>
Subject: Re: Locked up processes after upgrade to ZFS v15
Message-ID: <4CB18BC6.70305@freebsd.org>
In-Reply-To: <4CAF45A8.3020401@icyb.net.ua>
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <201010061732.o96HW2Vi005945@higson.cam.lispworks.com> <E5332812-379B-4EC1-A134-12176C718B2E@free.de> <4CAF45A8.3020401@icyb.net.ua>
on 08/10/2010 19:24 Andriy Gapon said the following:
> on 06/10/2010 21:51 Kai Gallasch said the following:
>>
>> On 06.10.2010 at 19:32, Martin Simmons wrote:
>>
>>>>>>>> On Wed, 6 Oct 2010 14:28:31 +0200, Kai Gallasch said:
>>>>
>>>> How can I debug this and get further information?
>>>
>>> procstat -k -k $pid will generate a backtrace (or replace $pid by -a for all
>>> processes).
>>
>> procstat for process 12111 (state: zfs)
>> sonnenkraft:~ # procstat -k -k 12111
>>   PID    TID COMM             TDNAME           KSTACK
>> 12111 102385 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d __lockmgr_args+0x7ae vop_stdlock+0x39 VOP_LOCK1_APV+0x9b _vn_lock+0x57 vget+0x7b cache_lookup+0x4e0 vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0xb7 lookup+0x3d3 namei+0x457 vn_open_cred+0x1e3 kern_openat+0x181 syscall+0x102 Xfast_syscall+0xe2
>>
>> procstat for process 24731 (state: zfsmrb)
>> # procstat -k -k 24731
>>   PID    TID COMM             TDNAME           KSTACK
>> 24731 102273 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x369 zfs_freebsd_read+0x2a6 VOP_READ_APV+0xaf vnode_pager_generic_getpages+0x3ea VOP_GETPAGES_APV+0xb5 vnode_pager_getpages+0x8c vm_fault+0x685 trap_pfault+0x128 trap+0x52c calltrap+0x8

Hm, I think that we actually shouldn't see a stack like that.
vm_fault sets VPO_BUSY on a page before calling vnode_pager_generic_getpages,
so the thread gets stuck forever in zfs mappedread.
It seems like the page that was seen as invalid in vm_fault becomes valid by
the time the call flow reaches mappedread.

>> In my original post I wrote that only apache httpd processes would lock up..
>> This is wrong. Several other non-httpd processes also got stuck in state zfs or zfsmrb.
>
> Interesting.
> It's possible that TID 102385 might be waiting on a vnode lock held by TID 102273.
> But TID 102273 seems to be waiting on a vnode's page lock.
> It would be very interesting to learn what process has that page busy, for how
> long and why.
> Perhaps there is a code path that busies a page, but never un-busies it...
>

-- 
Andriy Gapon
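
For anyone collecting the same data with "procstat -k -k -a": a minimal sketch, in Python, of filtering that output for threads sleeping under a particular kernel function (for example zfs_freebsd_read or __lockmgr_args from the stacks quoted above). The helper names parse_kstack_line and threads_in are hypothetical, and the parser assumes that COMM and TDNAME contain no spaces, which holds for the two quoted stacks but not necessarily for every thread on a system.

```python
import re

# One stack frame as printed by procstat -k -k, e.g. "vm_fault+0x685".
FRAME_RE = re.compile(r"^([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+$")


def parse_kstack_line(line):
    """Parse one data line of `procstat -k -k` output.

    Returns a dict, or None for header, prompt, or blank lines.
    Assumption: COMM and TDNAME are single whitespace-free tokens.
    """
    fields = line.split()
    if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
        return None
    frames = []
    for token in fields[4:]:
        match = FRAME_RE.match(token)
        if match:
            frames.append(match.group(1))  # keep the function name only
    return {
        "pid": int(fields[0]),
        "tid": int(fields[1]),
        "comm": fields[2],
        "tdname": fields[3],
        "frames": frames,
    }


def threads_in(lines, func):
    """Return parsed threads whose kernel stack contains `func`."""
    parsed = (parse_kstack_line(line) for line in lines)
    return [t for t in parsed if t is not None and func in t["frames"]]


# The two stacks quoted in this thread, used here as sample input.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
12111 102385 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d __lockmgr_args+0x7ae vop_stdlock+0x39 VOP_LOCK1_APV+0x9b _vn_lock+0x57 vget+0x7b cache_lookup+0x4e0 vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0xb7 lookup+0x3d3 namei+0x457 vn_open_cred+0x1e3 kern_openat+0x181 syscall+0x102 Xfast_syscall+0xe2
24731 102273 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x369 zfs_freebsd_read+0x2a6 VOP_READ_APV+0xaf vnode_pager_generic_getpages+0x3ea VOP_GETPAGES_APV+0xb5 vnode_pager_getpages+0x8c vm_fault+0x685 trap_pfault+0x128 trap+0x52c calltrap+0x8
"""

if __name__ == "__main__":
    for thread in threads_in(SAMPLE.splitlines(), "zfs_freebsd_read"):
        print(thread["pid"], thread["tid"], thread["comm"])
```

On a live system the same functions could be fed the lines of "procstat -k -k -a" instead of SAMPLE. This only narrows down which threads share a sleep path; it does not show which thread holds the busy page.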