Date: Tue, 25 Oct 2005 11:07:01 +0100 (BST)
From: Robert Watson
To: Philip Kizer
Cc: freebsd-current@freebsd.org
Subject: Re: Problem remains with FreeBSD 6.0-RC1 as seen in RELENG_5
Message-ID: <20051025110453.L6720@fledge.watson.org>

On Tue, 25 Oct 2005, Philip Kizer wrote:

> On Oct 19, 2005, at 11:54, Robert Watson wrote:
>> This appears to be a problem with file descriptor passing and garbage
>> collection.  I've seen one report of a lock order reversal along these
>> lines, but it was not believed to be symptomatic of an actual hang, just an
>> architectural issue.  This could be a sign that we need to address the
>> source of the reversal, although it sounds like you don't get a reversal
>> warning?
>>
>> Could I have you try the following DDB commands also:
>>
>> show alllocks
>> traceall
>
> OK, this is a new hang with almost all I had before (looks like I did
> forgot the "print sysctllock", will I need to be sure and include it for
> a complete diagnosis?) and those two as well:
>
> http://www.nostrum.com/hang/hang.trace-2005-10-25-0.txt
>
> [Or would you prefer I include it directly rather than sparing the list
> the output?]

URL is fine, and useful, thanks!

There are a couple of possible sources, so if this is reproducible and you
don't mind trying some diagnostic patches, I've attached a first one below.
This checks for the case where the looping in the unp_gc() routine becomes
unbounded due to a possible lack of synchronization in the handling of
marking and counting of marking.  It needs INVARIANTS to be compiled into
the kernel to work; if it fires, that will suggest an avenue to explore.

Robert N M Watson

Index: uipc_usrreq.c
===================================================================
RCS file: /data/fbsd-cvs/ncvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.156
diff -u -r1.156 uipc_usrreq.c
--- uipc_usrreq.c	23 Sep 2005 12:41:06 -0000	1.156
+++ uipc_usrreq.c	25 Oct 2005 10:04:36 -0000
@@ -1613,6 +1613,7 @@
 	LIST_FOREACH(fp, &filehead, f_list)
 		fp->f_gcflag &= ~(FMARK|FDEFER);
 	do {
+		KASSERT(unp_defer >= 0, ("unp_gc: unp_defer %d", unp_defer));
 		LIST_FOREACH(fp, &filehead, f_list) {
 			FILE_LOCK(fp);
 			/*
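
To illustrate what the assertion is meant to catch, here is a rough
userland sketch, not kernel code.  It assumes the loop ends only when the
deferred counter reaches exactly zero; the patch context above does not
show the loop condition, so that is an assumption.  The names model_gc,
gc_result, per_pass and max_passes are made up for the illustration, and
the negative-counter test stands in for the KASSERT.

```c
/* Outcome of one simulated run of the loop (hypothetical type). */
struct gc_result {
	int passes;	/* passes completed before stopping */
	int violated;	/* 1 if the counter was seen negative */
};

/*
 * Hypothetical model: each pass retires per_pass deferred items.  With
 * consistent accounting the counter lands exactly on zero and the loop
 * ends.  If a pass retires more than was deferred, the counter skips
 * past zero and a "loop until zero" test alone would keep going; the
 * check at the top of the loop notices that instead.  max_passes only
 * bounds the simulation.
 */
struct gc_result
model_gc(int defer, int per_pass, int max_passes)
{
	struct gc_result r = { 0, 0 };

	do {
		/* Userland stand-in for KASSERT(unp_defer >= 0, ...). */
		if (defer < 0) {
			r.violated = 1;
			break;
		}
		defer -= per_pass;
		r.passes++;
	} while (defer != 0 && r.passes < max_passes);

	return (r);
}
```

With a starting count of 4 and 2 items retired per pass, the model stops
cleanly after two passes.  With a starting count of 3, the counter goes
from 1 to -1, and the check trips at the top of the third pass, which is
the situation the diagnostic patch would report.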