Date: Sat, 9 Sep 2006 13:33:33 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20060908072830.GA63071@peter.osted.lan>
Message-ID: <20060909132826.K84834@fledge.watson.org>
References: <20060908072830.GA63071@peter.osted.lan>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: current@freebsd.org
Subject: Re: Page fault in uipc_usrreq.c:997


On Fri, 8 Sep 2006, Peter Holm wrote:

> During boot of GENERIC HEAD from Sep 7 07:29 UTC I got this page
> fault:
>
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex unp r = 0 (0xc0a5520c) locked @ kern/uipc_usrreq.c:987
> KDB: stack backtrace:
> kdb_backtrace(1,c410b000,c,c3f77a20,e43f7a28,...) at kdb_backtrace+0x29
> witness_warn(5,0,c0941302) at witness_warn+0x192
> trap(8,28,c4190028,c413a7a8,c4195690,...) at trap+0x108
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc06e01e6, esp = 0xe43f7a70, ebp = 0xe43f7bfc ---
> unp_connect(c41ce000,c3f797e0,c3f77a20,c0a5520c,0,...) at unp_connect+0x292
> uipc_connect(c41ce000,c3f797e0,c3f77a20) at uipc_connect+0x3e
> soconnect(c41ce000,c3f797e0,c3f77a20) at soconnect+0x4e
> kern_connect(c3f77a20,3,c3f797e0,c3f797e0,0,...) at kern_connect+0x76
> connect(c3f77a20,e43f7d04) at connect+0x30
> syscall(3b,3b,3b,1,8270000,...) at syscall+0x256
>
> http://people.freebsd.org/~pho/stress/log/cons207.html.
>
> The core file is toast and I missed a back trace of pid 678 :-(

This is likely one of the remaining race conditions in UNIX domain sockets 
having to do with simultaneous connect and close, which occur due to dropping 
locks for either a blocking name lookup or a recursion via the socket layer 
into the protocol a second time.  When the UNIX domain socket global lock is 
dropped and re-acquired, the UNIX domain socket code needs to re-evaluate its 
assumptions regarding any references it has to other UNIX domain sockets, 
which may have "gone away" while the lock was released.  Interestingly, many 
of these races also existed in 4.x and before, but they are more exposed with 
greater kernel parallelism.  I recently closed a spate of them, but it looks 
like a few remain.  In this case, the listen socket may have been closed 
(although possibly not) while sonewconn() was being called.  It could be that 
a reference needs to be added to so2 before dropping the unp lock.  I saw 
John's follow-up, but if neither ups nor he has a fix within a few days, I can 
investigate once I get back to the UK.  Send me a ping next week if I appear 
to forget :-).
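
To illustrate the general pattern described above (take a reference before 
dropping the lock, then re-check assumptions after re-acquiring it), here is a 
minimal user-space sketch.  It is not the uipc_usrreq.c code and not a proposed 
fix: struct demo_sock, demo_lock, demo_hold(), demo_rele(), demo_close() and 
demo_connect() are all hypothetical names, and the while_unlocked callback 
merely simulates another thread running while the lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; NOT the kernel's struct socket. */
struct demo_sock {
	int	refs;	/* reference count, protected by demo_lock */
	bool	closed;	/* set by demo_close(), protected by demo_lock */
	bool	freed;	/* demo only: records that the last reference was dropped */
};

/* Plays the role of a single global lock (an assumption of this sketch). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Acquire a reference.  Caller must hold demo_lock. */
static void
demo_hold(struct demo_sock *so)
{
	so->refs++;
}

/* Release a reference.  Caller must hold demo_lock. */
static void
demo_rele(struct demo_sock *so)
{
	if (--so->refs == 0)
		so->freed = true;	/* real code would free the object here */
}

/* Mark the socket closed and drop the creator's reference. */
void
demo_close(struct demo_sock *so)
{
	pthread_mutex_lock(&demo_lock);
	so->closed = true;
	demo_rele(so);
	pthread_mutex_unlock(&demo_lock);
}

/*
 * Connect to a listener.  The lock is dropped in the middle, as it would be
 * for a blocking lookup; while_unlocked (may be NULL) simulates whatever
 * another thread does in that window.  Returns 0 on success, or -1 if the
 * listener was closed before or during the window.
 */
int
demo_connect(struct demo_sock *listener,
    void (*while_unlocked)(struct demo_sock *))
{
	int error = 0;

	pthread_mutex_lock(&demo_lock);
	if (listener->closed) {
		pthread_mutex_unlock(&demo_lock);
		return (-1);
	}
	demo_hold(listener);		/* keep the object alive across the gap */
	pthread_mutex_unlock(&demo_lock);

	if (while_unlocked != NULL)
		while_unlocked(listener);	/* e.g. a concurrent close */

	pthread_mutex_lock(&demo_lock);
	if (listener->closed)		/* re-evaluate after re-locking */
		error = -1;
	demo_rele(listener);
	pthread_mutex_unlock(&demo_lock);
	return (error);
}
```

With a socket initialised to one reference, demo_connect(&so, NULL) succeeds 
and leaves the count unchanged, while demo_connect(&so, demo_close) fails 
cleanly: the extra reference keeps the object valid until the connecting side 
has noticed the close and released it.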

Robert N M Watson
Computer Laboratory
University of Cambridge