Date: Thu, 9 Oct 2003 12:53:19 -0700
From: Skye Poier <skye@isilon.com>
To: freebsd-net@freebsd.org
Message-ID: <20031009195319.GE929@isilon.com>
Subject: Panic in NFS (tcp_output) on -current

Hi BSDers,

I'm running an older version of -current and saw a panic from the
NFS server socket upcall, which I'll describe in detail.  I had a close
look at the latest sources and there don't appear to be any changes
that would have prevented this panic.  I've only seen this once in about
a year, so it's a pretty rare but fatal case.  Not sure if it applies to
the 4.x branch, but anyway...

Here's the stack dump:

	panic
	tcp_output
	tcp_usr_rcvd
	soreceive
	nfsrv_rcv
	sowakeup
	soisdisconnected
	tcp_close
	tcp_drop
	tcp_timer_keep
	softclock

Here's what's happening:  at the end of tcp_close (tcp_discardcb in
-current) the code does this:

	inp->inp_ppcb = NULL;
	...
	soisdisconnected(so);

soisdisconnected does a sorwakeup (sowakeup in the trace above), which
calls the nfsrv_rcv upcall, which calls soreceive.  At the end of
soreceive we have this:

	if ((flags & MSG_PEEK) == 0) {
		...
		if (pr->pr_flags & PR_WANTRCVD && so->so_pcb)
			(*pr->pr_usrreqs->pru_rcvd)(so, flags);

Which calls tcp_usr_rcvd.  When it gets the TCP protocol control block
via intotcpcb(sotoinpcb(so)) it ends up with a NULL pointer because of
the assignment to inp_ppcb before calling soisdisconnected above.
Voila, panic in tcp_output on NULL deref.  This would also have happened
if any TCP socket upcall had tried to soreceive MSG_OOB under these
conditions.
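
The ordering problem described above can be reproduced in a small
user-space model.  This is only a sketch: the structs are stripped-down
stand-ins for the kernel ones, and every model_* function and the
upcall_saw_null_tcpcb field are hypothetical names made up for
illustration.  Instead of dereferencing the NULL pointer, the model just
records that the upcall would have handed a NULL tcpcb to tcp_output.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, not the real ones. */
struct tcpcb  { int t_state; };
struct inpcb  { struct tcpcb *inp_ppcb; };
struct socket {
	struct inpcb *so_pcb;
	void (*so_upcall)(struct socket *);
	int upcall_saw_null_tcpcb;	/* what the upcall observed */
};

/* Models the tail of soreceive(): the guard looks at so_pcb only. */
void
model_soreceive_tail(struct socket *so)
{
	assert(so != NULL);
	if (so->so_pcb != NULL) {
		/* Equivalent of intotcpcb(sotoinpcb(so)). */
		struct tcpcb *tp = so->so_pcb->inp_ppcb;

		/* The real tcp_usr_rcvd would pass tp on to tcp_output. */
		so->upcall_saw_null_tcpcb = (tp == NULL);
	}
}

/* Models soisdisconnected(): waking the reader runs the upcall. */
void
model_soisdisconnected(struct socket *so)
{
	if (so->so_upcall != NULL)
		so->so_upcall(so);
}

/* Models the ordering at the end of tcp_close as described above. */
int
model_tcp_close_current(struct socket *so)
{
	so->so_pcb->inp_ppcb = NULL;	/* tcpcb detached first */
	model_soisdisconnected(so);	/* so_pcb is still set here */
	return (so->upcall_saw_null_tcpcb);
}
```

With a socket whose so_upcall is model_soreceive_tail,
model_tcp_close_current returns 1: the so_pcb guard passes, but the
tcpcb pointer behind it is already gone.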

My question is this: why is the TCP pcb disconnected from the inpcb
before calling soisdisconnected?  I don't see any benefit to doing
this half-way teardown BEFORE calling soisdisconnected - the only
possible uses (that I can see) of that variable would result in a panic
in every case.  And right after the soisdisconnected, the pcb is
destroyed.  The only thing that is ever checked is so->so_pcb which is
still valid.

Thoughts?  Seems like either the inp->inp_ppcb = NULL (and t_inpcb =
NULL in -current) should be moved AFTER the soisdisconnected, or the
socket should be torn down further (invalidate so->so_pcb?) before
calling soisdisconnected.
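
To make the two options concrete, here is a user-space sketch of both
orderings.  It is not a kernel patch: the structs are simplified
stand-ins, all model_* names and the two counter fields are hypothetical,
and it says nothing about whether other kernel code relies on the
existing order or whether so_pcb can safely be cleared at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, not the real ones. */
struct tcpcb  { int t_state; };
struct inpcb  { struct tcpcb *inp_ppcb; };
struct socket {
	struct inpcb *so_pcb;
	int rcvd_calls;		/* times the pru_rcvd stand-in ran */
	int null_tcpcb_seen;	/* times it ran with a NULL tcpcb */
};

/* Stand-in for the upcall reaching the end of soreceive(). */
void
model_upcall(struct socket *so)
{
	assert(so != NULL);
	if (so->so_pcb == NULL)
		return;			/* the so_pcb guard in soreceive */
	so->rcvd_calls++;
	if (so->so_pcb->inp_ppcb == NULL)
		so->null_tcpcb_seen++;	/* would be the NULL deref */
}

/* Option 1: notify first, detach the tcpcb afterwards. */
void
model_close_detach_after(struct socket *so)
{
	struct inpcb *inp = so->so_pcb;

	model_upcall(so);		/* soisdisconnected stand-in */
	inp->inp_ppcb = NULL;
}

/* Option 2: also invalidate so_pcb, so the soreceive guard fails. */
void
model_close_full_teardown(struct socket *so)
{
	struct inpcb *inp = so->so_pcb;

	inp->inp_ppcb = NULL;
	so->so_pcb = NULL;
	model_upcall(so);		/* soisdisconnected stand-in */
}
```

In this model, option 1 lets the upcall run once against a still-valid
tcpcb, while option 2 keeps the pru_rcvd stand-in from running at all.
Neither path sees a NULL tcpcb.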

Thanks!
Skye