From owner-freebsd-current@FreeBSD.ORG Sun Dec 6 11:32:02 2009 Return-Path: Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CB8DC106566B for ; Sun, 6 Dec 2009 11:32:02 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 105188FC08 for ; Sun, 6 Dec 2009 11:32:01 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA12432 for ; Sun, 06 Dec 2009 13:32:00 +0200 (EET) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1NHFLM-0001Q5-A7 for freebsd-current@FreeBSD.org; Sun, 06 Dec 2009 13:32:00 +0200 Message-ID: <4B1B9600.4080709@icyb.net.ua> Date: Sun, 06 Dec 2009 13:31:12 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.23 (X11/20091128) MIME-Version: 1.0 To: freebsd-current@FreeBSD.org X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: process stuck in stat/../cache_lookup: ktorrent, zfs X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 06 Dec 2009 11:32:02 -0000 System is recent 9-current, amd64. I see that sometimes ktorrent gets stuck during heavy download (multiple files in parallel, high speed). It is completely unresponsive and not killable even with SIGKILL. Here's procstat output for it (many thanks to Robert again): $ procstat -k 1882 PID TID COMM TDNAME KSTACK 1882 100270 ktorrent - mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait poll syscall Xfast_syscall 1882 100277 ktorrent - mi_switch sleepq_switch sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV _vn_lock vget cache_lookup vfs_cache_lookup VOP_LOOKUP_APV lookup namei kern_statat_vnhook kern_statat kern_stat stat syscall 1882 100279 ktorrent - mi_switch sleepq_switch sleepq_catch_signals sleepq_timedwait_sig _cv_timedwait_sig seltdwait kern_select select syscall Xfast_syscall 1882 100283 ktorrent - mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _sleep do_cv_wait __umtx_op_cv_wait _umtx_op syscall Xfast_syscall 1882 100284 ktorrent - mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _sleep do_cv_wait __umtx_op_cv_wait _umtx_op syscall Xfast_syscall ktrace shows that threads 100270 and 100279 are still live. I think that it is a thread doing stat that is stuck and holding other threads. Here's info for that thread with -kk: 1882 100277 ktorrent - mi_switch+0x1cf sleepq_switch+0xdb sleepq_wait+0x45 __lockmgr_args+0x3d4 vop_stdlock+0x51 VOP_LOCK1_APV+0x82 _vn_lock+0x48 vget+0x88 cache_lookup+0x8e3 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0x7c lookup+0x64a namei+0x68f kern_statat_vnhook+0x97 kern_statat+0x15 kern_stat+0x1c stat+0x2d syscall+0x3f6 Even better, kgdb on /dev/mem: (kgdb) thr 255 [Switching to thread 255 (Thread 100277)]#0 sched_switch (td=0xffffff012a6c5700, newtd=0xffffff0001533380, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1865 1865 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xffffff012a6c5700, newtd=0xffffff0001533380, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1865 #1 0xffffffff80374baf in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:449 #2 0xffffffff803a795b in sleepq_switch (wchan=Variable "wchan" is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:509 #3 0xffffffff803a8645 in sleepq_wait (wchan=0xffffff0105b457f8, pri=80) at /usr/src/sys/kern/subr_sleepqueue.c:588 #4 0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408, ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:216 #5 0xffffffff803ea171 in vop_stdfsync (ap=0x0) at /usr/src/sys/kern/vfs_default.c:587 #6 0xffffffff8058b582 in VOP_LOCK1_APV (vop=Variable "vop" is not available. ) at vnode_if.c:1994 #7 0xffffffff80408318 in _vn_lock (vp=0xffffff0105b45760, flags=2097408, file=0xffffff0000000823 "", line=-2141062687) at vnode_if.h:880 #8 0xffffffff803fb798 in vget (vp=0xffffff0105b45760, flags=2097408, td=0xffffff012a6c5700) at /usr/src/sys/kern/vfs_subr.c:2089 #9 0xffffffff803e7953 in cache_lookup (dvp=0xffffff0055877b10, vpp=0xffffff8085155960, cnp=0xffffff8085155988) at /usr/src/sys/kern/vfs_cache.c:568 #10 0xffffffff803e7b3d in vfs_cache_lookup (ap=Variable "ap" is not available. ) at vnode_if.h:80 #11 0xffffffff8058c08c in VOP_LOOKUP_APV (vop=Variable "vop" is not available. ) at vnode_if.c:129 #12 0xffffffff803eeeaa in lookup (ndp=0xffffff8085155930) at /usr/src/sys/kern/vfs_lookup.c:697 #13 0xffffffff803efeaf in namei (ndp=0xffffff8085155930) at /usr/src/sys/kern/vfs_lookup.c:276 #14 0xffffffff803ffa47 in kern_statat_vnhook (td=0xffffff012a6c5700, flag=Variable "flag" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2348 #15 0xffffffff803ffc05 in kern_lstat (td=0x0, path=0x0, pathseg=UIO_USERSPACE, sbp=0x0) at /usr/src/sys/kern/vfs_syscalls.c:2399 #16 0xffffffff803ffd0c in nstat (td=0x0, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:2448 #17 0xffffffff803ffdad in stat () at /usr/src/sys/kern/vfs_syscalls.c:2291 #18 0xffffff8085155ad0 in ?? () #19 0xffffffff803731de in runq_remove (rq=Variable "rq" is not available. ) at /usr/src/sys/kern/kern_switch.c:486 #20 0xffffffff805363a6 in syscall (frame=0xffffff8085155c80) at /usr/src/sys/amd64/amd64/trap.c:994 #21 0xffffffff8051c121 in Xfast_syscall32 () at /usr/src/sys/amd64/amd64/exception.S:385 How can I inspect what is causing this situation, presumably dead-lock? -- Andriy Gapon