Date: Sun, 13 Mar 2011 21:41:21 -0700
From: Ravi Murty <ravi.murty@gmail.com>
To: freebsd-hackers@freebsd.org
Subject: SIGSTOP and SIGKILL
Message-ID: <AANLkTi=70WWQLz7iT413tGXq_dhp_jBX7hu0qHWT=ar6@mail.gmail.com>
Hi everybody,

I'm using FreeBSD 8.0 and I seem to have a race condition that is fairly reproducible. Let me try to describe it.

The basic idea is that we use SIGSTOP and SIGCONT to stop and restart the threads of a process, call it p1. A caller (call it c1) repeatedly SIGSTOPs and SIGCONTs p1 until another caller (call it c2) comes along and kills the process. Both callers use pfind(...) to find the process and hold p1's proc_lock before subjecting p1 to any of these signals. What I see is that SIGKILL is somehow ignored in favor of SIGSTOP, and the process (and all of its threads) ends up suspended.

As a side note, we changed our implementation to "post" SIGKILL to all threads of p1 because of another race we discovered. In that case the thread selected by psignal/tdsignal happened to be in thr_exit() on its way to dying. Because it was still on the list of available threads for the process, it was picked (FIRST_TD_IN_PROC), but because it was in thr_exit it died, taking SIGKILL with it.

What I see in this new race is the following. We post SIGKILL on every thread of the process, and c2 leaves, releasing p1's proc_lock. As each thread returns to ring3 via the trap handler, it sees that it has a signal to deal with and calls cursig and postsig. In the code, postsig eventually calls sigexit (the default behavior), which via exit1 calls thread_suspend_check. Normally this causes the threads to kill themselves, as long as the first thread to get there calls thread_single(SINGLE_EXIT). In our case, the process (which is still on the global allproc list) is subjected to SIGSTOP, which sets the P_STOPPED_SIG flag on p1. As each thread makes its way through thread_suspend_check, it suspends itself because P_SHOULDSTOP ends up being true.

In the end I have a process whose threads have all taken SIGKILL (I can dump each thread's state and look at its siglist to see no signals), but the process hasn't died. This seems odd.
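To make the interleaving described above concrete, here is a toy userland model of it. This is only a sketch under stated assumptions: it is not FreeBSD kernel code, every `model_`/`MODEL_` name is hypothetical, and the suspend decision is reduced to a single flag test standing in for the P_SHOULDSTOP check that the message says happens in thread_suspend_check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel state; NOT the real FreeBSD definitions. */
#define MODEL_P_STOPPED_SIG 0x1   /* process was hit by a stop signal */
#define MODEL_NTHREADS      4

enum model_td_state { MODEL_TD_RUNNING, MODEL_TD_SUSPENDED, MODEL_TD_EXITED };

struct model_thread {
    bool sigkill_pending;         /* SIGKILL still sits in the thread's siglist */
    enum model_td_state state;
};

struct model_proc {
    int flags;
    struct model_thread td[MODEL_NTHREADS];
};

/* c2: post SIGKILL to every thread of the process. */
static void model_post_sigkill_all(struct model_proc *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < MODEL_NTHREADS; i++)
        p->td[i].sigkill_pending = true;
}

/* c1: SIGSTOP only marks the process as stopped. */
static void model_post_sigstop(struct model_proc *p)
{
    assert(p != NULL);
    p->flags |= MODEL_P_STOPPED_SIG;
}

/*
 * A thread heads back to user mode: it takes the pending SIGKILL (which
 * removes it from the siglist) and then reaches the suspend check.
 */
static void model_return_to_user(struct model_proc *p, struct model_thread *td)
{
    if (!td->sigkill_pending)
        return;
    td->sigkill_pending = false;  /* nothing remembers the kill after this */
    td->state = (p->flags & MODEL_P_STOPPED_SIG)
        ? MODEL_TD_SUSPENDED      /* the stop flag wins: thread parks itself */
        : MODEL_TD_EXITED;        /* no stop flag: thread exits as intended */
}

/* Replays the sequence and returns how many threads end up suspended. */
int model_replay(bool stop_after_kill)
{
    struct model_proc p = { 0 };
    int suspended = 0;

    model_post_sigkill_all(&p);
    if (stop_after_kill)
        model_post_sigstop(&p);   /* lands before any thread takes SIGKILL */
    for (size_t i = 0; i < MODEL_NTHREADS; i++) {
        model_return_to_user(&p, &p.td[i]);
        assert(!p.td[i].sigkill_pending);
        if (p.td[i].state == MODEL_TD_SUSPENDED)
            suspended++;
    }
    return suspended;
}
```

In this model, `model_replay(false)` returns 0 and `model_replay(true)` returns `MODEL_NTHREADS`: every thread has consumed its SIGKILL, no signal is pending anywhere, and yet nothing has exited, which matches the state observed above.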
It would seem that any signals posted after the process receives a SIGKILL should be ignored, but how do we detect that, especially after SIGKILL has been cleared from the siglist because the thread is in the middle of taking the signal? Alternatively, if the signal being taken is SIGKILL, the kernel needs to avoid saying "I'll stop the process now because I've been asked to". Any good solutions to this problem?

Thanks
Ravi Murty
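One way to picture the first idea (ignoring signals that arrive after SIGKILL) is a sticky per-process flag that outlives the siglist entry. The following is a minimal userland sketch of that idea only, not a kernel patch: `FIX_P_KILLED`, `struct fix_proc`, and all `fix_` functions are hypothetical names invented for illustration, and the real locking and thread bookkeeping are left out.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for illustration; NOT the real FreeBSD definitions. */
#define FIX_P_STOPPED_SIG 0x1     /* process was hit by a stop signal */
#define FIX_P_KILLED      0x2     /* sticky: set once SIGKILL is posted */

struct fix_proc {
    int  flags;
    bool sigkill_queued;          /* SIGKILL still sits in the siglist */
};

/* Posts a signal; returns false when the signal is dropped. */
bool fix_post_signal(struct fix_proc *p, int sig)
{
    assert(p != NULL);
    if (p->flags & FIX_P_KILLED)
        return false;             /* process is already dying: drop it */
    if (sig == SIGKILL) {
        p->flags |= FIX_P_KILLED;
        p->flags &= ~FIX_P_STOPPED_SIG; /* a kill overrides an earlier stop */
        p->sigkill_queued = true;
    } else if (sig == SIGSTOP) {
        p->flags |= FIX_P_STOPPED_SIG;
    }
    return true;
}

/* Taking SIGKILL empties the siglist entry but leaves the sticky flag set. */
void fix_take_sigkill(struct fix_proc *p)
{
    assert(p != NULL);
    p->sigkill_queued = false;
}

/* Suspend check: a stop request never parks a process that is being killed. */
bool fix_should_suspend(const struct fix_proc *p)
{
    assert(p != NULL);
    return (p->flags & FIX_P_STOPPED_SIG) && !(p->flags & FIX_P_KILLED);
}
```

With this arrangement, a SIGSTOP posted after the kill is rejected even once `fix_take_sigkill` has cleared the queued signal, because the decision depends on the sticky flag rather than on the siglist. Whether such a flag fits the actual kernel code paths is an open question in this sketch.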