Date: Wed, 11 Jul 2007 20:56:26 -0700
From: Mark Peek <mp@FreeBSD.org>
To: Doug White <dwhite@gumbysoft.com>
Cc: tcsh-bugs@mx.gw.com, current@freebsd.org
Subject: Re: tcsh backtick hang info
Message-ID: <4695A66A.8030903@FreeBSD.org>
In-Reply-To: <20070711191310.M90716@carver.gumbysoft.com>
References: <20070711191310.M90716@carver.gumbysoft.com>
On 7/11/07 7:28 PM, Doug White wrote:
> (note: freebsd-current@freebsd.org and tcsh-bugs@mx.gw.com are in the
> To: on this message. Restrict replies accordingly.)
>
> Hey folks,
>
> I spent several hours today pawing through the tcsh source in an effort
> to figure out what's going on with tcsh hangs with backticked commands in
> tcsh 6.15.00.
>
> The canonical example is something like:
>
> kill `ps ax | grep foo | awk '{print $1}'`
>
> where a builtin gets its arguments from a backticked expression composed
> of non-builtins.
>
> tcsh 6.15.00 introduced a new reference-counted signal management
> facility where, instead of manipulating the signal mask directly,
> functions increment a variable that is polled to see whether to perform
> the action associated with SIGINT, SIGCHLD, SIGALRM, or SIGHUP. The
> signal handler function itself sets a pending flag for each named signal
> and returns, so only a few instructions are executed in signal context.
> At some future point the pending flags are polled by a call to
> handle_pending_signals(), usually in a loop where the shell goes to
> sleep waiting for an external action to occur. When the function no
> longer needs the signal to be blocked it decrements the count via a
> stack of cleanup handlers. When a count reaches zero then a poll is
> immediately triggered.
>
> If the disabled count is >1 for a signal when a handle_pending_signals()
> poll occurs, then the signal is not "handled".
>
> In the case above, the disabled count for SIGCHLD is 1 when SIGCHLD
> fires from the completion of the backticked commands. The sigsuspend()
> in pjwait() is correctly woken up by the kernel but, because the
> disabled count is 1, the shell goes back into sigsuspend() and appears
> to hang.
>
> In this case it appears to be an improperly placed bump to the SIGCHLD
> disable count that is held over a call to pjwait(). I haven't yet
> determined the call stack (and gdb cannot debug tcsh at the moment) so I
> need to continue instrumenting the code to figure out what higher level
> function is disabling SIGCHLD and then calling something that eventually
> calls pjwait().
>

There appear to be two different issues: one with the builtin kill and
the other with gdb. I sent a tentative patch to the reporter of the
builtin kill issue and am awaiting confirmation. The patch is here:

http://people.freebsd.org/~mp/tcsh_kill.patch

The gdb issue, much to my dismay, is still eluding my debugging skills,
since the interaction with gdb makes it hard to debug what is actually
happening.

Mark
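
For readers following the thread, the deferred-signal scheme described in
the quoted message can be sketched roughly as below. This is only an
illustration, not tcsh source: the names on_sigchld, poll_pending,
disable_child, enable_child and sketch_demo are hypothetical, and it
assumes that any nonzero disable count defers handling, which is what the
hang description (count of 1 blocks SIGCHLD) implies.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical names throughout: an illustration, not tcsh's identifiers. */
static volatile sig_atomic_t pending_child; /* set in signal context only */
static int disabled_child;                  /* nonzero: defer the action  */
static int handled_child;                   /* times the action has run   */

/* The handler only records that the signal arrived, then returns. */
static void on_sigchld(int signo)
{
    (void)signo;
    pending_child = 1;
}

/* Stand-in for handle_pending_signals(): act only if nothing disables it. */
static void poll_pending(void)
{
    if (pending_child && disabled_child == 0) {
        pending_child = 0;
        handled_child++;   /* the real shell would reap children here */
    }
}

/* Bump the disable count instead of touching the signal mask. */
static void disable_child(void)
{
    disabled_child++;
}

/* Drop the count; reaching zero triggers an immediate poll. */
static void enable_child(void)
{
    assert(disabled_child > 0);
    if (--disabled_child == 0)
        poll_pending();
}

/* Returns 1 if the signal was deferred while disabled and handled after. */
int sketch_demo(void)
{
    int handled_while_disabled;

    signal(SIGCHLD, on_sigchld);
    disable_child();
    raise(SIGCHLD);          /* handler runs, sets the pending flag      */
    poll_pending();          /* count is 1, so nothing is handled yet    */
    handled_while_disabled = handled_child;
    enable_child();          /* count hits 0, pending signal is handled  */
    return handled_while_disabled == 0 && handled_child == 1;
}
```

In this sketch, a caller that holds the count at 1 and then waits in a loop
of sigsuspend() plus poll_pending() would never see the pending flag
cleared, which matches the apparent hang in pjwait() described above.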