From: John Baldwin
To: pluknet
Cc: freebsd-stable@freebsd.org
Date: Mon, 11 May 2009 09:49:30 -0400
Subject: Re: lock up in 6.2 (procs massively stuck in Giant)
Message-Id: <200905110949.31142.jhb@freebsd.org>
References: <200905010949.45927.jhb@freebsd.org>

On Monday 04 May 2009 11:41:35 pm pluknet wrote:
> 2009/5/1 John Baldwin:
> > On Thursday 30 April 2009 2:36:34 am pluknet wrote:
> >> Hi folks.
> >>
> >> Today I got a new locking issue.
> >> This is the first time I got it, and it's merely reproduced.
> >>
> >> The box has lost both remote connection and local access.
> >> No SIGINFO output on the local console even.
> >> Jumping in ddb> shows the next:
> >>
> >> 1) first, this is a 8-way web server. No processes on runqueue except one httpd
> >> (i.e. ps shows R in its state):
> >
> > You need to find who owns Giant and what that thread is doing.  You can try
> > using 'show lock Giant' as well as 'show lockchain 11568'.
> >
>
> Hi, John!
>
> Just reproduced now on another box.
> Hmm.. stack of the process owing Giant looks garbled.
>
> db> show lock Giant
> class: sleep mutex
> name: Giant
> flags: {DEF, RECURSE}
> state: {OWNED, CONTESTED}
> owner: 0xd0d79320 (tid 102754, pid 34594, "httpd")
>
> db> show lockchain 34594
> thread 102754 (pid 34594, httpd) running on CPU 7
> db> show lockchain 102754
> thread 102754 (pid 34594, httpd) running on CPU 7

The thread is running, so we don't know what its top of stack is, and you
can't get a good stack trace in that case.  None of your CPUs are idle, so I
don't think you have any sort of deadlock.  You might have a livelock.

-- 
John Baldwin