Date:      Sat, 9 Sep 2006 13:33:33 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Peter Holm <peter@holm.cc>
Cc:        current@freebsd.org
Subject:   Re: Page fault in uipc_usrreq.c:997
Message-ID:  <20060909132826.K84834@fledge.watson.org>
In-Reply-To: <20060908072830.GA63071@peter.osted.lan>
References:  <20060908072830.GA63071@peter.osted.lan>


On Fri, 8 Sep 2006, Peter Holm wrote:

> During boot of GENERIC HEAD from Sep 7 07:29 UTC I got this page
> fault:
>
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex unp r = 0 (0xc0a5520c) locked @
> kern/uipc_usrreq.c:987
> KDB: stack backtrace:
> kdb_backtrace(1,c410b000,c,c3f77a20,e43f7a28,...) at
> kdb_backtrace+0x29
> witness_warn(5,0,c0941302) at witness_warn+0x192
> trap(8,28,c4190028,c413a7a8,c4195690,...) at trap+0x108
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc06e01e6, esp = 0xe43f7a70, ebp = 0xe43f7bfc ---
> unp_connect(c41ce000,c3f797e0,c3f77a20,c0a5520c,0,...) at
> unp_connect+0x292
> uipc_connect(c41ce000,c3f797e0,c3f77a20) at uipc_connect+0x3e
> soconnect(c41ce000,c3f797e0,c3f77a20) at soconnect+0x4e
> kern_connect(c3f77a20,3,c3f797e0,c3f797e0,0,...) at kern_connect+0x76
> connect(c3f77a20,e43f7d04) at connect+0x30
> syscall(3b,3b,3b,1,8270000,...) at syscall+0x256
>
> http://people.freebsd.org/~pho/stress/log/cons207.html.
>
> The core file is toast and I missed a back trace of pid 678 :-(

This is likely one of the remaining race conditions in UNIX domain sockets 
having to do with simultaneous connect and close, which occur due to dropping 
locks for either a blocking name lookup or a recursion via the socket layer 
into the protocol a second time.  When the UNIX domain socket global lock is 
dropped and re-acquired, the UNIX domain socket code needs to re-evaluate its 
assumptions regarding any references it has to other UNIX domain sockets, 
which may have "gone away" while the lock was released.  Interestingly, many 
of these races also existed in 4.x and before, but they are more exposed with 
greater kernel parallelism.  I recently closed a spate of them, but it looks 
like a few remain.  In this case, the listen socket may have been closed
(though not necessarily) while sonewconn() is being called.  It could be that
a reference needs to be added to so2 before dropping the unp lock.  I saw
John's follow-up, but if ups/he don't have a fix in a few days, once I get
back to the UK, I can investigate.  Send me a ping next week if I appear to
forget :-).
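
To illustrate the sort of fix I have in mind, here is a rough userland
sketch of the "take a reference, drop the lock, re-check after
re-acquiring" pattern.  It is not the uipc_usrreq.c code: struct sock,
sock_hold(), sock_rele(), sock_close(), connect_to() and racing_close() are
hypothetical names invented for this example, and a pthread mutex stands in
for the unp lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; not the kernel's struct socket. */
struct sock {
    int  refs;    /* reference count, protected by global_lock */
    bool closed;  /* set by sock_close(), protected by global_lock */
};

/* Stand-in for the UNIX domain socket global lock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a socket holding one reference owned by the caller. */
struct sock *sock_alloc(void)
{
    struct sock *so = calloc(1, sizeof(*so));
    assert(so != NULL);
    so->refs = 1;
    return so;
}

/* Take an extra reference.  Caller holds global_lock. */
static void sock_hold(struct sock *so)
{
    so->refs++;
}

/* Drop a reference, freeing on the last one.  Caller holds global_lock. */
static bool sock_rele(struct sock *so)
{
    assert(so->refs > 0);
    if (--so->refs == 0) {
        free(so);
        return true;
    }
    return false;
}

/* Mark the socket closed and drop the owner's reference. */
bool sock_close(struct sock *so)
{
    pthread_mutex_lock(&global_lock);
    so->closed = true;
    bool freed = sock_rele(so);
    pthread_mutex_unlock(&global_lock);
    return freed;
}

/*
 * Connect to so2.  The lock is dropped around blocking_step (think of a
 * name lookup), so a reference is taken first and the closed flag is
 * re-checked once the lock is held again.
 */
int connect_to(struct sock *so2, void (*blocking_step)(void *), void *arg)
{
    int error = 0;

    pthread_mutex_lock(&global_lock);
    if (so2->closed) {
        pthread_mutex_unlock(&global_lock);
        return ECONNREFUSED;
    }
    sock_hold(so2);                 /* keeps so2 allocated while unlocked */
    pthread_mutex_unlock(&global_lock);

    if (blocking_step != NULL)
        blocking_step(arg);         /* a close may race with us here */

    pthread_mutex_lock(&global_lock);
    if (so2->closed)                /* re-evaluate the earlier assumption */
        error = ECONNREFUSED;
    (void)sock_rele(so2);           /* may free so2 if it was closed */
    pthread_mutex_unlock(&global_lock);
    return error;
}

/* Test helper: simulates a close arriving while the lock is dropped. */
void racing_close(void *arg)
{
    (void)sock_close(arg);
}
```

Without the sock_hold() call, the racing close would free so2 during the
unlocked window and the re-check would touch freed memory, which is
roughly the class of fault in the trace above.  Whether the real fix has
this shape depends on what sonewconn() ends up needing from so2.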

Robert N M Watson
Computer Laboratory
University of Cambridge


