From owner-freebsd-bugs@FreeBSD.ORG  Mon Feb  2 14:45:44 2009
Date: Mon, 2 Feb 2009 14:45:43 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Dan Nelson <dnelson@allantgroup.com>
In-Reply-To: <200812171829.mBHITWuA073418@dan.emsphone.com>
Message-ID: <alpine.BSF.2.00.0902021438450.77103@fledge.watson.org>
References: <200812171829.mBHITWuA073418@dan.emsphone.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@FreeBSD.org, freebsd-bugs@FreeBSD.org,
	FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/129719: Panic during shutdown, tcp_ctloutput: inp == NULL

On Wed, 17 Dec 2008, Dan Nelson wrote:

> I've been trying to solve an intermittent connectivity problem where a 
> server stops seeing incoming packets.  It happened today, and when the 
> system was shutting down, it panicked and rebooted.  The gdb stack trace is a 
> little mangled due to inlined functions, but the trap was in tcp_usrreq.c, 
> line 1266.  Looks like it was trying to reconnect a TCP NFS mount.

Hi Dan:

Thanks, as always, for your helpful bug report!

A NULL pointer dereference here suggests that a second thread closed the 
socket while the first thread (the one shown in these traces) was still 
reconnecting it--possibly a race condition in the NFS client code, given 
that the socket wasn't actually connected yet?

> 1255    int
> 1256    tcp_ctloutput(struct socket *so, struct sockopt *sopt)
> 1257    {
> 1258            int     error, opt, optval;
> 1259            struct  inpcb *inp;
> 1260            struct  tcpcb *tp;
> 1261            struct  tcp_info ti;
> 1262
> 1263            error = 0;
> 1264            inp = sotoinpcb(so);
> 1265            KASSERT(inp != NULL, ("tcp_ctloutput: inp == NULL"));
> 1266 *          INP_WLOCK(inp);
> 1267            if (sopt->sopt_level != IPPROTO_TCP) {
>
> I don't have INVARIANTS enabled, which would have triggered the KASSERT one 
> line up.  I've got the core dump if more info is needed.
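
For anyone following along without the kernel source to hand, here is a 
minimal userland sketch of why the assertion stays silent on a kernel built 
without INVARIANTS.  The macro below is a stand-in written for illustration, 
not a copy of the kernel's definition, and reaches_lock_with_null() is a 
hypothetical helper: with INVARIANTS undefined the check expands to nothing, 
so a NULL inp goes straight on to the lock operation on the following line.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Userland stand-in for the kernel's KASSERT (illustration only).
 * Build with -DINVARIANTS to compile the check in.
 */
#ifdef INVARIANTS
#define KASSERT(exp, msg) do {                  \
        if (!(exp)) {                           \
                printf msg;                     \
                putchar('\n');                  \
                abort();                        \
        }                                       \
} while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

/*
 * Hypothetical helper: returns 1 if execution gets past the assertion
 * while inp is NULL, i.e. the point where the real code would go on to
 * lock a NULL inpcb.  Returns 0 for a non-NULL inp.
 */
static int
reaches_lock_with_null(const void *inp)
{
        KASSERT(inp != NULL, ("tcp_ctloutput: inp == NULL"));
        return (inp == NULL);
}
```

Built without -DINVARIANTS, reaches_lock_with_null(NULL) returns 1; built 
with it, the same call prints the message and aborts instead.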

The aftermath of panics like these is a bit hard to diagnose, unfortunately, 
but here are a few kgdb requests:

> #1  0xc06bd1e6 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
> #2  0xc06bd4e3 in panic (fmt=Variable "fmt" is not available) at ../../../kern/kern_shutdown.c:574
> #3  0xc091cb09 in trap_fatal (frame=0xef7fb8c8, eva=172) at ../../../i386/i386/trap.c:939
> #4  0xc091cd59 in trap_pfault (frame=0xef7fb8c8, usermode=0, eva=172) at ../../../i386/i386/trap.c:852
> #5  0xc091d6eb in trap (frame=0xef7fb8c8) at ../../../i386/i386/trap.c:530
> #6  0xc0904a2b in calltrap () at ../../../i386/i386/exception.s:159
> #7  0xc07f58fd in tcp_ctloutput (so=0xc71a0680, sopt=0xef7fbac8) at atomic.h:149

Could you print *so in this frame?  I assume so_pcb is NULL, but if not, 
*(struct inpcb *)so->so_pcb is also interesting.
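
As a rough sketch of why a NULL inp would fault at a small non-zero address 
such as the eva=172 in frame #3 rather than at 0: the lock is a field inside 
the inpcb, so locking through a NULL pointer touches address 0 plus that 
field's offset.  The struct below is a made-up layout, padded so that the lock 
lands at offset 172 purely to match the trace; I have not checked that figure 
against the real struct inpcb, and fake_inpcb, fake_lock and 
lock_fault_address() are names invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock: a single 32-bit word, as on a 32-bit platform. */
struct fake_lock {
        volatile uint32_t word;
};

/*
 * Hypothetical layout, NOT the real struct inpcb: 172 bytes of other
 * fields, then the lock.  The padding is an assumption chosen to match
 * eva=172 in the trace.
 */
struct fake_inpcb {
        unsigned char   other_fields[172];
        struct fake_lock lock;
};

/*
 * Address that a lock operation would touch given the pcb's base
 * address: base plus the offset of the lock field.
 */
static uintptr_t
lock_fault_address(uintptr_t base)
{
        return (base + offsetof(struct fake_inpcb, lock));
}
```

Under that assumed layout, lock_fault_address(0) gives 172, which is the kind 
of fault address to expect if so_pcb really was NULL when INP_WLOCK ran.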

> #8  0xc071024d in sosetopt (so=0xc71a0680, sopt=0xef7fbac8) at ../../../kern/uipc_socket.c:2339
> #9  0xc083ba5c in nfs_connect (nmp=0xc54e4d20, rep=0xc6208000) at ../../../nfsclient/nfs_socket.c:428

Probably useful to have *nmp here.

> #10 0xc083bf9a in nfs_reconnect (rep=0xc6208000) at ../../../nfsclient/nfs_socket.c:542

And probably, on general principle, *rep here.

Perhaps the race involves a shutdown-time unmount while NFS is reconnecting a 
socket in another thread?

It would be useful to see the stack trace of whatever thread is performing the 
shutdown, if you can find it.  Try "info threads" and see if that shows up in 
an obvious manner -- perhaps the shutdown thread is in the VFS tear-down from 
boot()?

Robert N M Watson
Computer Laboratory
University of Cambridge