From: Peter Holm
To: Robert Watson
Cc: current@freebsd.org
Date: Sat, 9 Sep 2006 17:25:45 +0200
Subject: Re: Page fault in uipc_usrreq.c:997
Message-ID: <20060909152545.GA21958@peter.osted.lan>
References: <20060908072830.GA63071@peter.osted.lan> <20060909132826.K84834@fledge.watson.org>
In-Reply-To: <20060909132826.K84834@fledge.watson.org>
List-Id: Discussions about the use of FreeBSD-current

On Sat, Sep 09, 2006 at 01:33:33PM +0100, Robert Watson wrote:
>
> On Fri, 8 Sep 2006, Peter Holm wrote:
>
> >During boot of GENERIC HEAD from Sep 7 07:29 UTC I got this
> >page fault:
> >
> >Kernel page fault with the following non-sleepable locks held:
> >exclusive sleep mutex unp r = 0 (0xc0a5520c) locked @ kern/uipc_usrreq.c:987
> >KDB: stack backtrace:
> >kdb_backtrace(1,c410b000,c,c3f77a20,e43f7a28,...) at kdb_backtrace+0x29
> >witness_warn(5,0,c0941302) at witness_warn+0x192
> >trap(8,28,c4190028,c413a7a8,c4195690,...) at trap+0x108
> >calltrap() at calltrap+0x5
> >--- trap 0xc, eip = 0xc06e01e6, esp = 0xe43f7a70, ebp = 0xe43f7bfc ---
> >unp_connect(c41ce000,c3f797e0,c3f77a20,c0a5520c,0,...) at unp_connect+0x292
> >uipc_connect(c41ce000,c3f797e0,c3f77a20) at uipc_connect+0x3e
> >soconnect(c41ce000,c3f797e0,c3f77a20) at soconnect+0x4e
> >kern_connect(c3f77a20,3,c3f797e0,c3f797e0,0,...) at kern_connect+0x76
> >connect(c3f77a20,e43f7d04) at connect+0x30
> >syscall(3b,3b,3b,1,8270000,...) at syscall+0x256
> >
> >http://people.freebsd.org/~pho/stress/log/cons207.html.
> >
> >The core file is toast and I missed a back trace of pid 678 :-(
>
> This is likely one of the remaining race conditions in UNIX domain sockets
> having to do with simultaneous connect and close, which occur due to
> dropping locks for either a blocking name lookup or a recursion via the
> socket layer into the protocol a second time.  When the UNIX domain socket
> global lock is dropped and re-acquired, the UNIX domain socket code needs
> to re-evaluate its assumptions regarding any references it has to other
> UNIX domain sockets, which may have "gone away" while the lock was
> released.  Interestingly, many of these races also existed in 4.x and
> before, but they are more exposed with greater kernel parallelism.  I
> recently closed a spate of them, but it looks like a few remain.  In this
> case, the listen socket has possibly been closed (although possibly not)
> while sonewconn() is called.  It could be a reference needs to be added to
> so2 before dropping the unp lock.
> I saw John's follow-up, but if ups/he
> don't have a fix in a few days once I get back to the UK, I can
> investigate.  Send me a ping next week if I appear to forget :-).
>

OK. I'll keep this panic on my list until it's fixed.

- Peter

> Robert N M Watson
> Computer Laboratory
> University of Cambridge
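
For readers following the thread, the pattern Robert describes above (take a
reference on so2 before dropping the unp lock, then re-evaluate its state once
the lock is re-acquired) can be sketched in userspace C with pthreads. This is
only an illustration under stated assumptions, not the code in
kern/uipc_usrreq.c: struct sock, global_lock, sock_connect() and the other
names are hypothetical stand-ins, and the "blocking step" callback merely
plays the role of work such as a name lookup or sonewconn() that runs with the
lock dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a listening socket (not the kernel's struct socket). */
struct sock {
    int  refs;    /* reference count, protected by global_lock */
    bool closed;  /* set on close, protected by global_lock */
};

/* Plays the role of the global UNIX domain socket ("unp") lock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a socket holding one reference for its owner. */
static struct sock *sock_new(void)
{
    struct sock *so = calloc(1, sizeof(*so));
    if (so != NULL)
        so->refs = 1;
    return so;
}

/* Drop one reference; free on the last one. Caller holds global_lock. */
static void sock_rele_locked(struct sock *so)
{
    assert(so->refs > 0);
    if (--so->refs == 0)
        free(so);
}

/* Owner closes the socket: mark it closed and drop the owner's reference. */
static void sock_close(struct sock *so)
{
    pthread_mutex_lock(&global_lock);
    so->closed = true;
    sock_rele_locked(so);
    pthread_mutex_unlock(&global_lock);
}

/*
 * Connect to so2. blocking_step stands for work that must run without the
 * lock held. Returns 0 on success, -1 if so2 was closed before or during
 * the unlocked window.
 */
static int sock_connect(struct sock *so2, void (*blocking_step)(void *), void *arg)
{
    int error = 0;

    pthread_mutex_lock(&global_lock);
    if (so2->closed) {
        pthread_mutex_unlock(&global_lock);
        return -1;
    }
    so2->refs++;                        /* pin so2 before dropping the lock */
    pthread_mutex_unlock(&global_lock);

    if (blocking_step != NULL)
        blocking_step(arg);             /* lock not held: so2 may be closed here */

    pthread_mutex_lock(&global_lock);
    if (so2->closed)                    /* re-evaluate after re-acquiring */
        error = -1;
    sock_rele_locked(so2);              /* may free so2 if it was closed */
    pthread_mutex_unlock(&global_lock);
    return error;
}

/* Simulates a close that lands inside the unlocked window. */
static void simulate_close(void *arg)
{
    sock_close(arg);
}

/* Runs both scenarios; returns 0 if they behave as described. */
int demo_connect_close(void)
{
    struct sock *so2 = sock_new();

    if (so2 == NULL)
        return -1;
    if (sock_connect(so2, NULL, NULL) != 0)   /* no race: connect succeeds */
        return 1;
    if (so2->refs != 1)                       /* temporary reference was dropped */
        return 2;
    if (sock_connect(so2, simulate_close, so2) != -1)  /* close during window */
        return 3;
    return 0;  /* so2 was freed by the final release inside sock_connect() */
}
```

Calling demo_connect_close() from a main() exercises both paths. The point of
the extra reference is that the memory behind so2 stays valid across the
unlocked window even if the owner closes it, so the re-check after
re-acquiring the lock reads live memory rather than a freed structure.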