From owner-freebsd-current Tue Jul 23 16:46:15 2002 Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8773337B400 for ; Tue, 23 Jul 2002 16:46:05 -0700 (PDT) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id ABFED43E3B for ; Tue, 23 Jul 2002 16:44:52 -0700 (PDT) (envelope-from julian@elischer.org) Received: from InterJet.elischer.org ([12.232.206.8]) by sccrmhc02.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020723234440.FDEV19639.sccrmhc02.attbi.com@InterJet.elischer.org>; Tue, 23 Jul 2002 23:44:40 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id SAA06228; Mon, 22 Jul 2002 18:14:32 -0700 (PDT) Date: Mon, 22 Jul 2002 18:14:31 -0700 (PDT) From: Julian Elischer To: David Xu Cc: freebsd-current@freebsd.org Subject: Re: signal handling bug in KSE MIII In-Reply-To: <20020722000211.55864.qmail@web20902.mail.yahoo.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, 21 Jul 2002, David Xu wrote: > > --- David Xu wrote: > > I found signal handling is still broken in CURRENT source. > > the following program demostrates the bug is still in kernel: > > > > #include > > #include > > > > void handler(int sig) > > { > > signal(SIGTSTP, SIG_DFL); > > kill(getpid(), SIGTSTP); > > } > > > > int main() > > { > > char buf[64]; > > > > signal(SIGTSTP, handler); > > kill(getpid(), SIGTSTP); > > printf("input foo:"); > > scanf("%63s", buf); > > printf("you input:%s\n", buf); > > return 0; > > } > > > > when I press CTRL+Z, the program does not suspend and directly exits. > > ktrace indicates that the program directly exits in kernel without > > calling exit() from program. > > I found SA_STOP handling is disabled in issignal(), delayed SIGTSTP > > is forgotten by kernel, this is the reason why SIGTSTP signal handling > > is broken. at least, one program is affected --- ftp, run ftp > > client program, when 'ftp>' prompt appears, pressing CTRL+Z, causes ftp > > to exit and do not suspend. patch: Ok, I see it.. now there is a problem when we start to consider a multi threaded process. This is why I defined out the code (and forgot that fact in teh 6 months following). The problem is that if you send a suspend signal (STOP) to a process, all the threads must be accounted for before we can say that the process is 'stopped'. There are three basic sources of STOP conditions being asserted, one of which is new. 1/ Signals 2/ Ptrace 3/ a single-threading funnel. The third is new and is also where the problem is. A single threading funnel is needed when a process does some system calls. for example, if it calls fork(). The thread calling fork() needs to know that all the other threads are in a safe suspended state before it can proceed into fork1(). To do this, it suspends itself, and sets a STOPPED condition on the process. When the last thread is suspended, then the initial thread is allowed to proceed, safe in teh knowledge that no other thread is doing anything in userland that would produce an inconsistant state for the child. (or in the kernel for that matter). Once the fork has been completed, the STOPPED condition is lefted and the other threads are allowed to continue. Teh question is whether it is safe to suspend the other threads at palces other than the user boundary.. e.g. most of the calls to issignal() are in msleep and the condvar code. The picture gets more messy when we realise that SOMETIMES the funnel doesn't want the other threads to be suspended, but rather, to EXIT. this happens at the moment at the start of exit1(), and in exec(). In both of these cases, threads other than that doing the exit() or exec() calls must be trapped in a safe place and aborted. The only safe place to do this is at the user boundary. You have to force any such threads that may be about to sleep, to abort and return to the user boundary. The thread doing the exit() can not proceed until all other threads have been accounted for. Because these two cases are similar, I combined them into a single facility. Thread_single() and friends.. (it's a posibility that this was a mistake but I'm not convinced of that yet. Basically if the STOPPED_SNGLE flag is set in the process, any thread trying to go to userland will drop into the funnel code. If it gets there and discovers that the SNGLE_EXIT flag is also set it will abort rather than suspending. The difficulty is that issignal is catching suspensions and stopping the threads at the point of the sleep(), in teh middle of whatever operation it is doing. As a result of this a suspended thread cannot proceed to the user boundary, and we are deadlocked as teh exiting process is waiting for all the other threads to exit and the STOPPED threads a re waiting for the STOP state to be lifted. My theory was that we need to allow the restarting thread (from the msleep() to contiue on to the user boundary and stop it THERE.) rather than stopping it in the middle of whatever calculations it was doing, holding whatever resources it holds (vnodes etc.). I guess that if this is the case then we need to do the extra code only in issignal at the user boundary. Possibly the answer is to allow the code again but to reverse the calls to issignal() and thread_suspend_check() in msleep() below.. 538 sig = cursig(td); 539 if (sig == 0) { 540 if (thread_suspend_check(1)) { 541 sig = SIGSTOP; 542 } 543 } so that would make it: if (thread_suspend_check(1)) { sig = SIGSTOP; } else { sig = cursig(td); } then at the user boundary issignal could do the signalling of the parent (ONLY IF THIS IS THE LAST THREAD TO SUSPEND) and the thread_suspend_check(0) at teh user boundary could be entgrusted with stopping the thread (puting it on the 'suspended' queue.) For now we can put the original code back because we don't want to break the non-threaded case, but it will break the threaded case. We need to go over the stop/sleep code anyway. It needs some rewritng. (ANyone seeing a cleaner solution is welcome to suggest it.) > > > > --- kern_sig.c.old Sun Jul 21 15:38:00 2002 > > +++ kern_sig.c Sun Jul 21 16:31:02 2002 > > @@ -1657,7 +1657,7 @@ > > #endif > > break; /* == ignore */ > > } > > -#if 0 > > + > > /* > > * If there is a pending stop signal to process > > * with default action, stop here, > > @@ -1679,16 +1679,10 @@ > > PROC_UNLOCK(p->p_pptr); > > mtx_lock_spin(&sched_lock); > > stop(p); > > - PROC_UNLOCK(p); > > - DROP_GIANT(); > > - p->p_stats->p_ru.ru_nivcsw++; > > - mi_switch(); > > mtx_unlock_spin(&sched_lock); > > - PICKUP_GIANT(); > > - PROC_LOCK(p); > > break; > > } else > > -#endif > > + > > if (prop & SA_IGNORE) { > > /* > > * Except for SIGCONT, shouldn't get here. > > > > David Xu > > > > OOPS, the extra kill() calling in main() should be removed > in demo program, correct source code is: > > #include > #include > > void handler(int sig) > { > signal(SIGTSTP, SIG_DFL); > kill(getpid(), SIGTSTP); > } > > int main() > { > char buf[64]; > > signal(SIGTSTP, handler); > printf("input foo:"); > scanf("%63s", buf); > printf("you input:%s\n", buf); > return 0; > } > > David Xu > > > > __________________________________________________ > Do You Yahoo!? > Yahoo! Health - Feel better, live better > http://health.yahoo.com > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message