Date:      Wed, 1 Feb 2012 23:50:09 GMT
From:      Jilles Tjoelker <jilles@stack.nl>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: bin/164526: kill(1) can not kill process despite on -KILL
Message-ID:  <201202012350.q11No9C3035509@freefall.freebsd.org>

The following reply was made to PR bin/164526; it has been noted by GNATS.

From: Jilles Tjoelker <jilles@stack.nl>
To: =?utf-8?B?0JrQvtC90YzQutC+0LIg0JXQstCz0LXQvdC40Lk=?= <kes-kes@yandex.ru>
Cc: bug-followup@FreeBSD.org, freeradius-users@lists.freeradius.org,
	firebird-devel@lists.sourceforge.net
Subject: Re: bin/164526: kill(1) can not kill process despite on -KILL
Date: Thu, 2 Feb 2012 00:46:47 +0100

 On Thu, Feb 02, 2012 at 12:16:39AM +0200, Коньков Евгений wrote:
 > Reproduced again.
 > The bug is repeatable:
 > 1. radiusd + mod_perl + example.pl (it connects to Firebird) +
 > Firebird
 > 2. restart firebird
 > 3. try to restart radiusd
 > 4. the process falls into the STOP state
 
 > # ps awx | grep radi
 >  9438  ??  TLs     5:10.12 /usr/local/sbin/radiusd
 > 27603   2  S+      0:00.00 grep radi
 > # procstat -k 9438
 >   PID    TID COMM             TDNAME           KSTACK
 >  9438 100080 radiusd          -                mi_switch sleepq_switch sleepq_wait _sx_xlock_hard _sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault trap_pfault trap calltrap
 >  9438 100195 radiusd          -                mi_switch sleepq_switch sleepq_wait __lockmgr_args ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate vm_fault_hold vm_fault trap_pfault trap calltrap
 >  9438 101144 radiusd          -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
 > # ps wHl9438
 >   UID   PID  PPID CPU PRI NI    VSZ    RSS MWCHAN STAT  TT     TIME COMMAND
 >   133  9438     1   0  20  0 351124 322000 user m TLs   ??  0:03.65 /usr/local/sbin/radiusd
 >   133  9438     1   0  20  0 351124 322000 ufs    TLs   ??  0:00.00 /usr/local/sbin/radiusd
 >   133  9438     1   0  20  0 351124 322000 -      TLs   ??  0:05.28 /usr/local/sbin/radiusd
 
 > If I can supply any other useful debug info, please answer as soon as you
 > can; I cannot wait too long. Thank you.
 
 OK, this looks like it may be useful for someone who knows more about
 the VM system than I do. It is very likely a FreeBSD kernel bug though,
 so building freeradius and/or firebird with debug information is
 unlikely to be useful (apart from perturbing a race condition, if the
 problem is related to a race condition).
 
 My analysis: thread 101144 is attempting to shut down the process in
 response to a signal, but needs to wait for 100080 and 100195 to finish
 page fault processing. For thread 100195, page fault processing resulted
 in deallocating a VM object based on some sort of file, and it is
 blocked waiting on the vnode lock for the file. It may or may not hold a
 lock on a user map. Thread 100080 needs to lock a user map to continue
 processing (this means the fault is either a copy-on-write fault or the
 first write to anonymous memory). It seems that 100080 is not holding
 the vnode lock that 100195 needs.
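 
 As a rough illustration of the reading above, the sketch below parses the
 quoted `procstat -k` output and reports what each thread appears to be
 waiting on. The WAIT_HINTS table and the function names are hypothetical,
 chosen only for this example; the frame-to-lock mapping encodes my
 interpretation of these three stacks, not a general rule. It also assumes
 that COMM and TDNAME contain no spaces, which holds for this output.

```python
# Sample input: the `procstat -k 9438` output quoted earlier in this message.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
 9438 100080 radiusd          -                mi_switch sleepq_switch sleepq_wait _sx_xlock_hard _sx_xlock _vm_map_lock_upgrade vm_map_lookup vm_fault_hold vm_fault trap_pfault trap calltrap
 9438 100195 radiusd          -                mi_switch sleepq_switch sleepq_wait __lockmgr_args ffs_lock VOP_LOCK1_APV _vn_lock vm_object_deallocate unlock_and_deallocate vm_fault_hold vm_fault trap_pfault trap calltrap
 9438 101144 radiusd          -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
"""

# Hypothetical hints: a frame name and what its presence suggests here.
WAIT_HINTS = [
    ("_vm_map_lock_upgrade", "VM map lock"),
    ("_vn_lock", "vnode lock"),
    ("thread_single", "other threads in the process to stop"),
]


def parse_kstacks(text):
    """Return {tid: [frames...]}, innermost frame first, as procstat prints it."""
    threads = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not "PID TID COMM TDNAME frames...".
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        threads[int(fields[1])] = fields[4:]
    return threads


def waiting_on(frames):
    """Guess what a thread is blocked on from the frames in its kernel stack."""
    for marker, description in WAIT_HINTS:
        if marker in frames:
            return description
    return "unknown"


if __name__ == "__main__":
    for tid, frames in sorted(parse_kstacks(SAMPLE).items()):
        print(tid, "waits for:", waiting_on(frames))
```

 This only summarizes where each thread sleeps; it cannot tell who holds
 the locks.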
 
 If you have DDB (the kernel debugger) and WITNESS compiled in, the DDB
 command
   show locks
 will show who owns these locks. This is probably
 
 The output of
   procstat -kka
 may be useful (like the previous procstat command but for all threads in
 the system and with offsets from each function).
 
 The output of
   procstat -v 9438
 shows the memory mappings of the process. It could be that this command
 gets stuck because of the locks.
 
 -- 
 Jilles Tjoelker


