Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 22 Jul 2002 18:14:31 -0700 (PDT)
From:      Julian Elischer <julian@elischer.org>
To:        David Xu <bsddiy@yahoo.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: signal handling bug in KSE MIII
Message-ID:  <Pine.BSF.4.21.0207221735120.5392-100000@InterJet.elischer.org>
In-Reply-To: <20020722000211.55864.qmail@web20902.mail.yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help



On Sun, 21 Jul 2002, David Xu wrote:

> 
> --- David Xu <bsddiy@yahoo.com> wrote:
> > I found signal handling is still broken in CURRENT source.
> > the following program demostrates the bug is still in kernel:
> > 
> > #include <stdio.h>
> > #include <signal.h>
> > 
> > void handler(int sig)
> > {
> >      signal(SIGTSTP, SIG_DFL);
> >      kill(getpid(), SIGTSTP);
> > }
> > 
> > int main()
> > {
> >     char buf[64];
> > 
> >     signal(SIGTSTP, handler);
> >     kill(getpid(), SIGTSTP);
> >     printf("input foo:");
> >     scanf("%63s", buf);
> >     printf("you input:%s\n", buf);
> >     return 0;
> > }
> > 
> > when I press CTRL+Z, the program does not suspend and directly exits.
> > ktrace indicates that the program directly exits in kernel without 
> > calling exit() from program.
> > I found SA_STOP handling is disabled in issignal(), delayed SIGTSTP
> > is forgotten by kernel, this is the reason why SIGTSTP signal handling
> > is broken. at least, one program is affected --- ftp, run ftp
> > client program, when 'ftp>' prompt appears, pressing CTRL+Z, causes ftp 
> > to exit and do not suspend. patch:

Ok, I see it.. now there is a problem when we start to consider a multi
threaded process. This is why I defined out the code  (and forgot that
fact in teh 6 months following).
The problem is that if you send a suspend signal (STOP) to a process, all
the threads must be accounted for before we can say that the 
process is 'stopped'. There are three basic sources of STOP conditions 
being asserted, one of which is new.
1/ Signals 
2/ Ptrace 
3/ a single-threading funnel.

The third is new and is also where the problem is.

A single threading funnel is needed when a process does some system calls.
for example, if it calls fork(). The thread calling fork() needs to know
that all the other threads are in a safe suspended state before it can
proceed into  fork1(). To do this, it suspends itself, and sets a
STOPPED condition on the process. When the last thread is suspended, then
the initial thread is allowed to proceed, safe in teh knowledge that 
no other thread is doing anything in userland that would produce an
inconsistant state for the child. (or in the kernel for
that matter). Once the fork has been completed, the STOPPED
condition is lefted and the other threads are allowed to continue.
Teh question is whether it is safe to suspend the other threads
at palces other than the user boundary.. e.g. most of the calls
to issignal() are in msleep and the condvar code.

The picture gets more messy when we realise that SOMETIMES the funnel
doesn't want the other threads to be suspended, but rather, to EXIT.
this happens at the moment at the start of exit1(), and in exec().

In both of these cases, threads other than that doing the exit() or exec() 
calls must be trapped in a safe place and aborted. The only safe place 
to do this is at the user boundary. You have to force any such threads
that may be about to sleep, to abort and return to the user boundary.

The thread doing the exit() can not proceed until
all other threads have been accounted for. Because these two cases are
similar, I combined them into a single facility. Thread_single()
and friends.. (it's a posibility that this was a mistake but 
I'm not convinced of that yet. Basically if the STOPPED_SNGLE
flag is set in the process, any thread trying to go to userland 
will drop into the funnel code. If it gets there and discovers that the
SNGLE_EXIT flag is also set it will abort rather than suspending.

The difficulty is that issignal is catching suspensions and stopping the 
threads at the point of the sleep(), in teh middle of whatever operation
it is doing. As a result of this a suspended thread cannot proceed to 
the user boundary, and we are deadlocked as teh exiting process is 
waiting for all the other threads to exit and the STOPPED
threads a re waiting for the STOP state to be lifted.

My theory was that  we need to allow the restarting  thread
(from the msleep() to contiue on to the user boundary and stop it THERE.)
rather than stopping it in the middle of whatever calculations it was
doing, holding whatever resources it holds (vnodes etc.). I guess that if
this is the case then we need to do the extra code only in issignal
at the user boundary. Possibly the answer is to allow the  code again
but to reverse the calls to issignal() and thread_suspend_check()
in msleep() below..

 538                 sig = cursig(td);
 539                 if (sig == 0) {
 540                         if (thread_suspend_check(1)) {
 541                                 sig = SIGSTOP;
 542                         }
 543                 }

so that would make it:
if (thread_suspend_check(1)) {
	sig = SIGSTOP;
} else {
	sig = cursig(td);
}

then at the user boundary issignal could do the signalling of the parent
(ONLY IF THIS IS THE LAST THREAD TO SUSPEND) and the
thread_suspend_check(0) at teh user boundary could be entgrusted with
stopping the thread (puting it on the 'suspended' queue.)

For now we can put the original code back because we don't want to break
the non-threaded case, but it will break the threaded case.
We need to go over the stop/sleep code anyway.
It needs some rewritng.
(ANyone seeing a cleaner solution is welcome to suggest it.)


> > 
> > --- kern_sig.c.old	Sun Jul 21 15:38:00 2002
> > +++ kern_sig.c	Sun Jul 21 16:31:02 2002
> > @@ -1657,7 +1657,7 @@
> >  #endif
> >  				break;		/* == ignore */
> >  			}
> > -#if 0
> > +
> >  			/*
> >  			 * If there is a pending stop signal to process
> >  			 * with default action, stop here,
> > @@ -1679,16 +1679,10 @@
> >  				PROC_UNLOCK(p->p_pptr);
> >  				mtx_lock_spin(&sched_lock);
> >  				stop(p);
> > -				PROC_UNLOCK(p);
> > -				DROP_GIANT();
> > -				p->p_stats->p_ru.ru_nivcsw++;
> > -				mi_switch();
> >  				mtx_unlock_spin(&sched_lock);
> > -				PICKUP_GIANT();
> > -				PROC_LOCK(p);
> >  				break;
> >  			} else
> > -#endif
> > +
> >  			     if (prop & SA_IGNORE) {
> >  				/*
> >  				 * Except for SIGCONT, shouldn't get here.
> > 
> > David Xu
> > 
> 
> OOPS, the extra kill() calling in main() should be removed 
> in demo program, correct source code is:
>  
> #include <stdio.h>
> #include <signal.h>
> 
> void handler(int sig)
> {
>      signal(SIGTSTP, SIG_DFL);
>      kill(getpid(), SIGTSTP);
> }
> 
> int main()
> {
>     char buf[64];
> 
>     signal(SIGTSTP, handler);
>     printf("input foo:");
>     scanf("%63s", buf);
>     printf("you input:%s\n", buf);
>     return 0;
> }
> 
> David Xu
> 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Health - Feel better, live better
> http://health.yahoo.com
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.21.0207221735120.5392-100000>