Date:      Wed, 11 Jul 2007 19:28:55 -0700 (PDT)
From:      Doug White <dwhite@gumbysoft.com>
To:        current@freebsd.org, tcsh-bugs@mx.gw.com
Subject:   tcsh backtick hang info
Message-ID:  <20070711191310.M90716@carver.gumbysoft.com>

(note: freebsd-current@freebsd.org and tcsh-bugs@mx.gw.com are in the To: 
on this message. Restrict replies accordingly.)

Hey folks,

I spent several hours today pawing through the tcsh source in an effort to
figure out what's going on with the hangs tcsh 6.15.00 exhibits on
backticked commands.

The canonical example is something like:

kill `ps ax | grep foo | awk '{print $1}'`

where a builtin gets its arguments from a backticked expression composed 
of non-builtins.

tcsh 6.15.00 introduced a new reference-counted signal management facility.
Instead of manipulating the signal mask directly, functions increment a
per-signal disabled count, which is checked at poll time to decide whether
to perform the action associated with SIGINT, SIGCHLD, SIGALRM, or SIGHUP.
The signal handler function itself only sets a pending flag for the signal
it received and returns, so very few instructions execute in signal
context. At some later point the pending flags are polled by a call to
handle_pending_signals(), usually in a loop where the shell sleeps waiting
for an external event. When a function no longer needs the signal blocked,
it decrements the count via a stack of cleanup handlers. When a count drops
back to zero, a poll is triggered immediately.
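For readers who haven't looked at the new code, here is a minimal sketch of
the deferred, reference-counted scheme described above. All names
(pending[], disabled[], disable_sig(), enable_sig(), the DSIG_* slots) are
hypothetical and are not the identifiers tcsh 6.15.00 actually uses; only
handle_pending_signals() borrows the real function's name. The handled[]
counter stands in for the real per-signal action.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical per-signal slots; not tcsh's real identifiers. */
enum { DSIG_INT, DSIG_CHLD, DSIG_ALRM, DSIG_HUP, DSIG_COUNT };

static volatile sig_atomic_t pending[DSIG_COUNT]; /* written in signal context */
static int disabled[DSIG_COUNT];                  /* reference-counted "blocks" */
static int handled[DSIG_COUNT];                   /* stand-in for the real actions */

static int slot_for(int signo)
{
    switch (signo) {
    case SIGINT:  return DSIG_INT;
    case SIGCHLD: return DSIG_CHLD;
    case SIGALRM: return DSIG_ALRM;
    case SIGHUP:  return DSIG_HUP;
    default:      return -1;
    }
}

/* The handler only records that the signal arrived, then returns. */
static void deferred_handler(int signo)
{
    int s = slot_for(signo);
    if (s >= 0)
        pending[s] = 1;
}

/* Poll: run the action for each pending signal whose disabled count is 0. */
static void handle_pending_signals(void)
{
    for (int s = 0; s < DSIG_COUNT; s++) {
        if (pending[s] && disabled[s] == 0) {
            pending[s] = 0;
            handled[s]++;
        }
    }
}

static void disable_sig(int s) { disabled[s]++; }

/* Dropping the last reference triggers an immediate poll. */
static void enable_sig(int s)
{
    assert(disabled[s] > 0);
    if (--disabled[s] == 0)
        handle_pending_signals();
}

static void install_deferred_handlers(void)
{
    static const int sigs[] = { SIGINT, SIGCHLD, SIGALRM, SIGHUP };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = deferred_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        sigaction(sigs[i], &sa, NULL);
}
```

With SIGCHLD disabled once, a raise(SIGCHLD) only sets the pending flag and
a poll does nothing; the action runs when enable_sig() brings the count back
to zero.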

If the disabled count for a signal is nonzero (1 or more) when a
handle_pending_signals() poll occurs, then the signal is not "handled".

In the case above, the disabled count for SIGCHLD is 1 when SIGCHLD fires
from the completion of the backticked commands. The sigsuspend() in
pjwait() is correctly woken up by the kernel, but because the disabled
count is 1 the SIGCHLD action never runs, so the shell goes back into
sigsuspend() and appears to hang.
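The hang can be modelled without real signals. The sketch below is a
hypothetical simplification of a pjwait()-style loop, not tcsh source: the
names chld_disabled, chld_pending, job_done, signals_to_deliver,
fake_sigsuspend() and wait_for_job() are invented for illustration.
fake_sigsuspend() returns false where the real sigsuspend() would sleep
forever because no further SIGCHLD is coming.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not tcsh's real variables. */
static int  chld_disabled;       /* SIGCHLD disabled count */
static bool chld_pending;        /* set by the (simulated) handler */
static bool job_done;            /* what the SIGCHLD action would record */
static int  signals_to_deliver;  /* SIGCHLDs the "kernel" still has to send */

/* Stand-in for sigsuspend(): returns false when the real call never returns. */
static bool fake_sigsuspend(void)
{
    if (signals_to_deliver == 0)
        return false;
    signals_to_deliver--;
    chld_pending = true;         /* the handler ran and set the pending flag */
    return true;
}

/* Stand-in for handle_pending_signals(), SIGCHLD only. */
static void poll_signals(void)
{
    assert(chld_disabled >= 0);
    if (chld_pending && chld_disabled == 0) {
        chld_pending = false;
        job_done = true;         /* reap children, mark the job finished */
    }
}

/* Model of the wait loop: returns wakeups consumed, or -1 for the hang. */
static int wait_for_job(void)
{
    int wakeups = 0;

    job_done = false;
    while (!job_done) {
        if (!fake_sigsuspend())
            return -1;           /* real shell: asleep forever */
        wakeups++;
        poll_signals();
    }
    return wakeups;
}
```

With a disabled count of 0 the single SIGCHLD finishes the wait after one
wakeup; with a count of 1 the same SIGCHLD is recorded as pending but never
acted on, and the loop goes back to sleep with nothing left to wake it.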

In this case the cause appears to be an improperly placed increment of the
SIGCHLD disabled count that is held across a call to pjwait(). I haven't
yet determined the call stack (and gdb cannot debug tcsh at the moment), so
I need to continue instrumenting the code to figure out which higher-level
function disables SIGCHLD and then calls something that eventually calls
pjwait().
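One way to do that instrumentation without a debugger is to record the call
site of every SIGCHLD disable and dump the outstanding holders on entry to
the wait loop. The sketch below is hypothetical: CHLD_DISABLE(),
chld_disable_at(), chld_enable(), report_chld_holders() and the holders[]
stack are not part of tcsh. It relies only on the standard __FILE__,
__LINE__ and __func__ identifiers, and it assumes enables happen in LIFO
order, which is what a stack of cleanup handlers would give you.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_HOLDERS 16

/* One entry per outstanding SIGCHLD disable (hypothetical instrumentation). */
struct holder {
    const char *file;
    int         line;
    const char *func;
};

static struct holder holders[MAX_HOLDERS];
static int chld_disabled;

static void chld_disable_at(const char *file, int line, const char *func)
{
    assert(chld_disabled < MAX_HOLDERS);
    holders[chld_disabled].file = file;
    holders[chld_disabled].line = line;
    holders[chld_disabled].func = func;
    chld_disabled++;
}

/* Assumes LIFO release, as with a cleanup-handler stack. */
static void chld_enable(void)
{
    assert(chld_disabled > 0);
    chld_disabled--;
}

/* Record the caller's location at each disable site. */
#define CHLD_DISABLE() chld_disable_at(__FILE__, __LINE__, __func__)

/* Call at the top of the wait loop; prints each holder, returns the count. */
static int report_chld_holders(const char *where)
{
    for (int i = 0; i < chld_disabled; i++)
        fprintf(stderr, "%s: SIGCHLD disabled by %s() at %s:%d\n",
                where, holders[i].func, holders[i].file, holders[i].line);
    return chld_disabled;
}
```

A nonzero return from report_chld_holders("pjwait") would name the function
that is still holding SIGCHLD disabled when the shell goes to sleep.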

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org


