Date: Mon, 2 Feb 2009 14:50:05 GMT
Message-Id: <200902021450.n12Eo5RZ067187@freefall.freebsd.org>
To: freebsd-net@FreeBSD.org
From: Robert Watson
Subject: Re: kern/129719: Panic during shutdown, tcp_ctloutput: inp == NULL

The following reply was made to PR kern/129719; it has been noted by GNATS.

From: Robert Watson
To: Dan Nelson
Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org,
    freebsd-net@FreeBSD.org
Subject: Re: kern/129719: Panic during shutdown, tcp_ctloutput: inp == NULL
Date: Mon, 2 Feb 2009 14:45:43 +0000 (GMT)

On Wed, 17 Dec 2008, Dan Nelson wrote:

> I've been trying to solve an intermittent connectivity problem where a
> server stops seeing incoming packets. It happened today, and when the
> system was shutting down, it panicked and rebooted.
> The gdb stack trace is a little mangled due to inlined functions, but
> the trap was in tcp_usrreq.c, line 1266. Looks like it was trying to
> reconnect a TCP NFS mount.

Hi Dan:

Thanks, as always, for your helpful bug report! A NULL pointer dereference
here suggests that a second thread closed the socket while the first thread
(the one shown in these traces) was still reconnecting it. Possibly a race
condition in the NFS client code, given that the socket wasn't actually
connected yet?

> 1255 int
> 1256 tcp_ctloutput(struct socket *so, struct sockopt *sopt)
> 1257 {
> 1258         int error, opt, optval;
> 1259         struct inpcb *inp;
> 1260         struct tcpcb *tp;
> 1261         struct tcp_info ti;
> 1262
> 1263         error = 0;
> 1264         inp = sotoinpcb(so);
> 1265         KASSERT(inp != NULL, ("tcp_ctloutput: inp == NULL"));
> 1266 *       INP_WLOCK(inp);
> 1267         if (sopt->sopt_level != IPPROTO_TCP) {
>
> I don't have INVARIANTS enabled, which would have triggered the KASSERT one
> line up. I've got the core dump if more info is needed.

The aftermath of panics like these is a bit hard to diagnose, unfortunately,
but a few kgdb requests below:

> #1 0xc06bd1e6 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
> #2 0xc06bd4e3 in panic (fmt=Variable "fmt" is not available) at ../../../kern/kern_shutdown.c:574
> #3 0xc091cb09 in trap_fatal (frame=0xef7fb8c8, eva=172) at ../../../i386/i386/trap.c:939
> #4 0xc091cd59 in trap_pfault (frame=0xef7fb8c8, usermode=0, eva=172) at ../../../i386/i386/trap.c:852
> #5 0xc091d6eb in trap (frame=0xef7fb8c8) at ../../../i386/i386/trap.c:530
> #6 0xc0904a2b in calltrap () at ../../../i386/i386/exception.s:159
> #7 0xc07f58fd in tcp_ctloutput (so=0xc71a0680, sopt=0xef7fbac8) at atomic.h:149

Could you print *so in this frame? I assume so_pcb is NULL, but if not,
*(struct inpcb *)so->so_pcb is also interesting.
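
For anyone following the PR, the failure mode discussed so far can be
modelled in userland. This is only a sketch, not the kernel code:
MODEL_KASSERT, struct model_inpcb, struct model_socket and
model_lock_address() are hypothetical names, and the 172 bytes of padding are
an assumption chosen so that the lock's offset matches the eva=172 in frame
#3. The real struct inpcb layout does not appear in this thread. The sketch
shows that without INVARIANTS the assertion compiles to nothing, so a NULL
so_pcb goes straight on to the lock, whose address is then just the field's
offset from zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts assertions that would have panicked (only touched with INVARIANTS). */
int model_assert_failures;

/* Stand-in for KASSERT: the condition is only evaluated under INVARIANTS. */
#ifdef INVARIANTS
#define MODEL_KASSERT(exp) do { if (!(exp)) model_assert_failures++; } while (0)
#else
#define MODEL_KASSERT(exp) do { } while (0)
#endif

/* Hypothetical layout: the padding is an assumption, not the real inpcb. */
struct model_inpcb {
	char other_fields[172];	/* assumed, so the lock lands at offset 172 */
	int  inp_lock;		/* placeholder for the real lock */
};

struct model_socket {
	void *so_pcb;		/* NULL once the pcb has been detached */
};

/* Address the lock operation would touch; computed, never dereferenced. */
uintptr_t
model_lock_address(const struct model_socket *so)
{
	const struct model_inpcb *inp = so->so_pcb;

	MODEL_KASSERT(inp != NULL);
	return ((uintptr_t)inp + offsetof(struct model_inpcb, inp_lock));
}
```

Built without -DINVARIANTS and called on a socket whose so_pcb is NULL,
model_lock_address() returns the bare offset of the lock field (172 with the
assumed padding on typical platforms). That is the kind of small fault
address a trap through a NULL structure pointer reports.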

> #8 0xc071024d in sosetopt (so=0xc71a0680, sopt=0xef7fbac8) at ../../../kern/uipc_socket.c:2339
> #9 0xc083ba5c in nfs_connect (nmp=0xc54e4d20, rep=0xc6208000) at ../../../nfsclient/nfs_socket.c:428

Probably useful to have *nmp here.

> #10 0xc083bf9a in nfs_reconnect (rep=0xc6208000) at ../../../nfsclient/nfs_socket.c:542

And probably, on general principle, *rep here.

Perhaps the race involves a shutdown-time unmount while NFS is reconnecting a
socket in another thread? It would be useful to see the stack trace of
whatever thread is performing the shutdown, if you can find it. Try "info
threads" and see if that shows up in an obvious manner; perhaps the shutdown
thread is in the VFS tear-down from boot()?

Robert N M Watson
Computer Laboratory
University of Cambridge