From owner-freebsd-current@FreeBSD.ORG Thu Jul 12 04:23:30 2007
Return-Path:
X-Original-To: current@freebsd.org
Delivered-To: freebsd-current@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 5D15F16A469
    for ; Thu, 12 Jul 2007 04:23:30 +0000 (UTC)
    (envelope-from mp@FreeBSD.org)
Received: from relay02.pair.com (relay02.pair.com [209.68.5.16])
    by mx1.freebsd.org (Postfix) with SMTP id 0E9AB13C46E
    for ; Thu, 12 Jul 2007 04:23:29 +0000 (UTC)
    (envelope-from mp@FreeBSD.org)
Received: (qmail 75081 invoked by uid 0); 12 Jul 2007 03:56:48 -0000
Received: from 24.4.239.7 (HELO mp.peek.org) (24.4.239.7)
    by relay02.pair.com with SMTP; 12 Jul 2007 03:56:48 -0000
X-pair-Authenticated: 24.4.239.7
Message-ID: <4695A66A.8030903@FreeBSD.org>
Date: Wed, 11 Jul 2007 20:56:26 -0700
From: Mark Peek
User-Agent: Thunderbird 2.0.0.0pre (Macintosh/20070419)
MIME-Version: 1.0
To: Doug White
References: <20070711191310.M90716@carver.gumbysoft.com>
In-Reply-To: <20070711191310.M90716@carver.gumbysoft.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: tcsh-bugs@mx.gw.com, current@freebsd.org
Subject: Re: tcsh backtick hang info
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 12 Jul 2007 04:23:30 -0000

On 7/11/07 7:28 PM, Doug White wrote:
> (note: freebsd-current@freebsd.org and tcsh-bugs@mx.gw.com are in the
> To: on this message. Restrict replies accordingly.)
>
> Hey folks,
>
> I spent several hours today pawing through the tcsh source in an effort
> to figure out whats going on with tcsh hangs with backticked commands in
> tcsh 6.15.00.
>
> The canonical example is something like:
>
> kill `ps ax | grep foo | awk '{print $1}'`
>
> where a builtin gets its arguments from a backticked expression composed
> of non-builtins.
>
> tcsh 6.15.00 introduced a new reference-counted signal management
> facility where, instead of manipulating the signal mask directly,
> functions increment a variable that is polled to see whether to perform
> the action associated with SIGINT, SIGCHLD, SIGALRM, or SIGHUP. The
> signal handler function itself sets a pending flag for each named signal
> and returns, so only a few instructions are executed in signal context.
> At some future point the pending flags are polled by a call to
> handle_pending_signals(), usually in a loop where the shell goes to
> sleep waiting for an external action to occur. When the function no
> longer needs the signal to be blocked it decrements the count via a
> stack of cleanup handlers. When a count reaches zero then a poll is
> immediately triggered.
>
> If the disabled count is >1 for a signal when a handle_pending_signals()
> poll occurs, then the signal is not "handled".
>
> In the case above, the disabled count for SIGCHLD is 1 when SIGCHLD
> fires from the completion of the backticked commands. The sigsuspend()
> in pjwait() is correctly woken up by the kernel but, because the
> disabled count is 1, the shell goes back into sigsuspend() and appears
> to hang.
>
> In this case it appears to be an improperly placed bump to the SIGCHLD
> disable count that is held over a call to pjwait(). I haven't yet
> determined the call stack (and gdb cannot debug tcsh at the moment) so I
> need to continue instrumenting the code to figure out what higher level
> function is disabling SIGCHLD and then calling something that eventually
> calls pjwait().
>

There appear to be two different issues. One is with the builtin kill
and the other is the gdb issue.
I sent off a tentative patch to the reporter of the builtin kill issue
and am awaiting confirmation. The patch is here:

http://people.freebsd.org/~mp/tcsh_kill.patch

The gdb issue, much to my dismay, is still eluding my debugging skills,
given the interaction with gdb and the difficulty of actually debugging
what is happening.

Mark
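
For anyone following the thread, the deferred-signal scheme Doug describes can be sketched roughly as below. This is only an illustration, not code from tcsh: every identifier (chld_pending, chld_disabled, hold_chld, release_chld, poll_pending and so on) is made up for the example, only SIGCHLD is modelled, and it assumes a signal is held whenever its disable count is nonzero, which is what the count-of-1 hang in the quoted text implies.

```c
#include <assert.h>
#include <signal.h>

/* All names below are hypothetical; none are copied from the tcsh source. */

static volatile sig_atomic_t chld_pending;  /* set from signal context */
static int chld_disabled;                   /* >0: a caller wants SIGCHLD held */
static int chld_handled;                    /* how many times the action ran */

/* Signal handler: note that the signal arrived and return at once. */
static void on_sigchld(int signo)
{
    (void)signo;
    chld_pending = 1;
}

/* Poll: run the deferred action only if nobody is holding the signal.
 * Returns 1 if the action ran, 0 otherwise. */
int poll_pending(void)
{
    if (chld_disabled > 0 || !chld_pending)
        return 0;
    chld_pending = 0;
    chld_handled++;         /* stand-in for the real SIGCHLD work */
    return 1;
}

/* Bump the disable count; the signal is only recorded from now on. */
void hold_chld(void)
{
    chld_disabled++;
}

/* Drop one hold; releasing the last one triggers an immediate poll. */
void release_chld(void)
{
    assert(chld_disabled > 0);
    if (--chld_disabled == 0)
        poll_pending();
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_chld_handler(void)
{
    return signal(SIGCHLD, on_sigchld) == SIG_ERR ? -1 : 0;
}
```

In this model the reported hang corresponds to a wait loop that runs between hold_chld() and release_chld(): each wakeup calls poll_pending(), gets 0 back because the count is still 1, and goes back to sleep, so the pending SIGCHLD is never acted on until the hold is dropped.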