From owner-freebsd-bugs@FreeBSD.ORG Wed Feb 1 23:50:09 2012
Date: Wed, 1 Feb 2012 23:50:09 GMT
Message-Id: <201202012350.q11No9C3035509@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Jilles Tjoelker
Subject: Re: bin/164526: kill(1) can not kill process despite on -KILL

The following reply was made to PR bin/164526; it has been noted by GNATS.

From: Jilles Tjoelker
To: Коньков Евгений
Cc: bug-followup@FreeBSD.org, freeradius-users@lists.freeradius.org,
    firebird-devel@lists.sourceforge.net
Subject: Re: bin/164526: kill(1) can not kill process despite on -KILL
Date: Thu, 2 Feb 2012 00:46:47 +0100

On Thu, Feb 02, 2012 at 12:16:39AM +0200, Коньков Евгений wrote:
> repeated again:
> bug is repeateable:
> 1. radiusd + mod_perl + example.pl(it is connects to FireBird) +
> FireBIrd
> 2. restart firebird
> 3. try to restart radiusd
> 4. process in fall into STOP state
> # ps awx | grep radi
>  9438  ??  TLs    5:10.12 /usr/local/sbin/radiusd
> 27603   2  S+     0:00.00 grep radi
> # procstat -k 9438
>   PID    TID COMM     TDNAME  KSTACK
>  9438 100080 radiusd  -       mi_switch sleepq_switch sleepq_wait _sx_xlock_hard _sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault trap_pfault trap calltrap
>  9438 100195 radiusd  -       mi_switch sleepq_switch sleepq_wait __lockmgr_args ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate vm_fault_hold vm_fault trap_pfault trap calltrap
>  9438 101144 radiusd  -       mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
> # ps wHl9438
>  UID  PID PPID CPU PRI NI    VSZ    RSS MWCHAN STAT TT     TIME COMMAND
>  133 9438    1   0  20  0 351124 322000 user m TLs  ??  0:03.65 /usr/local/sbin/radiusd
>  133 9438    1   0  20  0 351124 322000 ufs    TLs  ??  0:00.00 /usr/local/sbin/radiusd
>  133 9438    1   0  20  0 351124 322000 -      TLs  ??  0:05.28 /usr/local/sbin/radiusd

> if I can supply another usefull debug info, answer as fast as you can, I can
> not wait too long. Thank you.

OK, this looks like it may be useful for someone who knows more about
the VM system than I do. It is very likely a FreeBSD kernel bug though,
so building freeradius and/or firebird with debug information is
unlikely to be useful (apart from perturbing a race condition, if the
problem is related to a race condition).

My analysis: thread 101144 is attempting to shut down the process in
response to a signal, but needs to wait for 100080 and 100195 to finish
page fault processing. For thread 100195, page fault processing resulted
in deallocating a VM object based on some sort of file, and it is
blocked waiting on the vnode lock for the file. It may or may not hold a
lock on a user map. Thread 100080 needs to lock a user map to continue
processing (this means the fault is either a copy-on-write fault or the
first write to anonymous memory). It seems that 100080 is not holding
the vnode lock that 100195 needs.
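
To make the reading of the stacks above easier to repeat on other
procstat -k output, here is a minimal Python sketch. It is only an
illustration: the function names parse_procstat_k and classify, and the
WAIT_HINTS table mapping kernel function names to a wait reason, are
hypothetical and reflect just the three stacks quoted in this PR, not a
general rule about the kernel.

```python
# Sample rows taken from the procstat -k output quoted in this PR.
SAMPLE = """\
  PID    TID COMM     TDNAME  KSTACK
 9438 100080 radiusd  -       mi_switch sleepq_switch sleepq_wait _sx_xlock_hard _sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault trap_pfault trap calltrap
 9438 100195 radiusd  -       mi_switch sleepq_switch sleepq_wait __lockmgr_args ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate vm_fault_hold vm_fault trap_pfault trap calltrap
 9438 101144 radiusd  -       mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
"""

# Hypothetical hints (an assumption for illustration): the first kernel
# function found in a stack decides the label, so order matters.
WAIT_HINTS = [
    ("thread_single", "exiting; waiting for the other threads to suspend"),
    ("_vn_lock", "waiting for a vnode lock"),
    ("_vm_map_lock_upgrade", "waiting to upgrade a VM map lock"),
]


def parse_procstat_k(text):
    """Return one dict per thread row; the header line is skipped.

    Assumes the thread name column contains no spaces.
    """
    threads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        threads.append({
            "pid": int(fields[0]),
            "tid": int(fields[1]),
            "comm": fields[2],
            "stack": fields[4:],
        })
    return threads


def classify(stack):
    """Label a stack using the first matching entry in WAIT_HINTS."""
    for func, reason in WAIT_HINTS:
        if func in stack:
            return reason
    return "unknown"


def in_page_fault(stack):
    """True if the thread is inside page fault handling."""
    return "vm_fault" in stack


if __name__ == "__main__":
    for t in parse_procstat_k(SAMPLE):
        fault = " (in page fault)" if in_page_fault(t["stack"]) else ""
        print(t["tid"], classify(t["stack"]) + fault)
```

The labels only restate what the stacks show; they do not say which
thread owns a given lock.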

If you have DDB (kernel debugger) and witness compiled in, the DDB
command show locks will show who owns these locks. This is probably

The output of procstat -kka may be useful (like the previous procstat
command but for all threads in the system and with offsets from each
function).

The output of procstat -v 9438 is the memory mappings of the process. It
could be that this command gets stuck because of the locks.

-- 
Jilles Tjoelker