From owner-freebsd-bugs@FreeBSD.ORG Mon Feb 2 14:45:44 2009
Delivered-To: freebsd-bugs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 25B3D1065673;
	Mon, 2 Feb 2009 14:45:44 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id F3BF48FC17;
	Mon, 2 Feb 2009 14:45:43 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [65.122.17.41])
	by cyrus.watson.org (Postfix) with ESMTPS id 7ADD546B03;
	Mon, 2 Feb 2009 09:45:43 -0500 (EST)
Date: Mon, 2 Feb 2009 14:45:43 +0000 (GMT)
From: Robert Watson
X-X-Sender: robert@fledge.watson.org
To: Dan Nelson
In-Reply-To: <200812171829.mBHITWuA073418@dan.emsphone.com>
References: <200812171829.mBHITWuA073418@dan.emsphone.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@FreeBSD.org, freebsd-bugs@FreeBSD.org, FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/129719: Panic during shutdown, tcp_ctloutput: inp == NULL
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Mon, 02 Feb 2009 14:45:44 -0000

On Wed, 17 Dec 2008, Dan Nelson wrote:

> I've been trying to solve an intermittent connectivity problem where a
> server stops seeing incoming packets. It happened today, and when the
> system was shutting down, it paniced and rebooted. The gdb stack trace is a
> little mangled due to inlined functions, but the trap was in tcp_usrreq.c,
> line 1266. Looks like it was trying to reconnect a TCP NFS mount.

Hi Dan:

Thanks, as always, for your helpful bug report!
A NULL pointer dereference here suggests that a second thread has closed the
socket while it was in use by the first thread reconnecting it (the thread
shown in these traces)--possibly a race condition in the NFS client code,
given that the connection wasn't actually connected yet?

> 1255 int
> 1256 tcp_ctloutput(struct socket *so, struct sockopt *sopt)
> 1257 {
> 1258         int error, opt, optval;
> 1259         struct inpcb *inp;
> 1260         struct tcpcb *tp;
> 1261         struct tcp_info ti;
> 1262
> 1263         error = 0;
> 1264         inp = sotoinpcb(so);
> 1265         KASSERT(inp != NULL, ("tcp_ctloutput: inp == NULL"));
> 1266 *       INP_WLOCK(inp);
> 1267         if (sopt->sopt_level != IPPROTO_TCP) {
>
> I don't have INVARIANTS enabled, which would have triggered the KASSERT one
> line up. I've got the core dump if more info is needed.

The aftermath of panics like these is a bit hard to diagnose, unfortunately,
but a few kgdb requests below:

> #1 0xc06bd1e6 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
> #2 0xc06bd4e3 in panic (fmt=Variable "fmt" is not available) at ../../../kern/kern_shutdown.c:574
> #3 0xc091cb09 in trap_fatal (frame=0xef7fb8c8, eva=172) at ../../../i386/i386/trap.c:939
> #4 0xc091cd59 in trap_pfault (frame=0xef7fb8c8, usermode=0, eva=172) at ../../../i386/i386/trap.c:852
> #5 0xc091d6eb in trap (frame=0xef7fb8c8) at ../../../i386/i386/trap.c:530
> #6 0xc0904a2b in calltrap () at ../../../i386/i386/exception.s:159
> #7 0xc07f58fd in tcp_ctloutput (so=0xc71a0680, sopt=0xef7fbac8) at atomic.h:149

Could you print *so in this frame? I assume so_pcb is NULL, but if not,
*(struct inpcb *)so->so_pcb is also interesting.

> #8 0xc071024d in sosetopt (so=0xc71a0680, sopt=0xef7fbac8) at ../../../kern/uipc_socket.c:2339
> #9 0xc083ba5c in nfs_connect (nmp=0xc54e4d20, rep=0xc6208000) at ../../../nfsclient/nfs_socket.c:428

Probably useful to have *nmp here.

> #10 0xc083bf9a in nfs_reconnect (rep=0xc6208000) at ../../../nfsclient/nfs_socket.c:542

And probably, on general principle, *rep here.
Perhaps the race involves a shutdown-time unmount while NFS is reconnecting a
socket in another thread? It would be useful to see the stack trace of
whatever thread is performing the shutdown, if you can find it. Try "info
threads" and see if that shows up in an obvious manner -- perhaps the shutdown
thread is in the VFS tear-down from boot()?

Robert N M Watson
Computer Laboratory
University of Cambridge