From owner-freebsd-current@FreeBSD.ORG  Tue Apr  4 13:27:31 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A955A16A41F
	for <freebsd-current@freebsd.org>; Tue,  4 Apr 2006 13:27:31 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8C75143D49
	for <freebsd-current@freebsd.org>; Tue,  4 Apr 2006 13:27:30 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 8D61E46BE4;
	Tue,  4 Apr 2006 09:27:29 -0400 (EDT)
Date: Tue, 4 Apr 2006 14:27:29 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Kazuaki Oda <kaakun@highway.ne.jp>
In-Reply-To: <44311AB5.2010407@highway.ne.jp>
Message-ID: <20060404141813.H22854@fledge.watson.org>
References: <4430FAAF.2040809@highway.ne.jp>
	<20060403133210.U36756@fledge.watson.org>
	<44311AB5.2010407@highway.ne.jp>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-current@freebsd.org
Subject: Re: kernel panic: page fault


On Mon, 3 Apr 2006, Kazuaki Oda wrote:

>> Also, are you running with INVARIANTS and/or WITNESS?
>
> Sorry, I compiled kernel without INVARIANTS and WITNESS.

No problem at all -- the debugging information you have sent me is enough to 
track down the source of the problem.  It looks like we have an inconsistency 
in how we handle (especially in my new world order) the recycling of timewait 
state for an inpcb that is still present.  I've committed a work-around which 
should prevent the panic you're seeing, but I need to investigate a bit more 
before I can commit a full solution.

For those interested, the problem is how to handle sockets with attached 
inpcbs that represent closed or time wait TCP connections.  This can happen if 
shutdown() is called on a socket, kicking the TCP state engine into a close 
cycle, rather than a reset.  In the current world order, the following sets of 
socket, pcb, and ppcb protocol state can occur:

fd -> socket <-> inpcb <-> tcpcb    Normal TCP socket in various states.
fd -> socket <-> inpcb <-> twtcp    Unclosed TCP socket in time wait.
fd -> socket <-> inpcb <-> NULL     Unclosed TCP socket after tw recycle.
      socket <-> inpcb <-> tcpcb    Socket closed, buffer still needed.
                 inpcb <-> twtcp    Socket closed, time wait.
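
As a rough illustration of the table above, the five combinations can be told
apart from three pieces of state: whether an fd still references the socket,
whether the inpcb is marked INP_TIMEWAIT, and whether inp_ppcb is NULL.  The
sketch below is a user-space model, not the kernel code: the struct layouts,
the has_fd field, the flag value, and classify_inpcb() are all assumptions
made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag value only; not taken from the kernel headers. */
#define INP_TIMEWAIT 0x1

/* Hypothetical stand-in for a socket: we only track the fd reference. */
struct socket_model {
	int has_fd;
};

/* Hypothetical, stripped-down inpcb. */
struct inpcb_model {
	struct socket_model *inp_socket;	/* NULL once the socket is gone */
	void *inp_ppcb;				/* tcpcb, twtcp, or NULL */
	int inp_flags;
};

/* One value per row of the table, plus a catch-all. */
enum pcb_case {
	PCB_NORMAL,		/* fd -> socket <-> inpcb <-> tcpcb */
	PCB_OPEN_TIMEWAIT,	/* fd -> socket <-> inpcb <-> twtcp */
	PCB_OPEN_RECYCLED,	/* fd -> socket <-> inpcb <-> NULL */
	PCB_CLOSED_BUFFERED,	/*       socket <-> inpcb <-> tcpcb */
	PCB_CLOSED_TIMEWAIT,	/*                  inpcb <-> twtcp */
	PCB_UNEXPECTED		/* anything the table does not list */
};

enum pcb_case
classify_inpcb(const struct inpcb_model *inp)
{
	const struct socket_model *so;

	assert(inp != NULL);
	so = inp->inp_socket;

	if (inp->inp_flags & INP_TIMEWAIT) {
		/* No socket at all: only the inpcb <-> twtcp pair remains. */
		if (so == NULL)
			return (inp->inp_ppcb != NULL ?
			    PCB_CLOSED_TIMEWAIT : PCB_UNEXPECTED);
		if (!so->has_fd)
			return (PCB_UNEXPECTED);
		/* Open socket: the twtcp may already have been recycled. */
		return (inp->inp_ppcb != NULL ?
		    PCB_OPEN_TIMEWAIT : PCB_OPEN_RECYCLED);
	}

	/* Outside time wait, the table always shows a socket and a tcpcb. */
	if (so == NULL || inp->inp_ppcb == NULL)
		return (PCB_UNEXPECTED);
	return (so->has_fd ? PCB_NORMAL : PCB_CLOSED_BUFFERED);
}
```

In this model the third row is the only listed case where INP_TIMEWAIT is set
and inp_ppcb is NULL, which is why code that assumes a non-NULL twtcp whenever
the flag is set can fault.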

The problem was that the middle case (an open socket whose inpcb has a NULL
inp_ppcb after the twtcp has been recycled) exists, but was not accounted for.
There's another problem that is still present in the new socket/pcb model, in 
which the inpcb of an open socket with a closed TCP connection continues to 
reserve the address/port combination.  This is related to the inpcb without 
twtcp case, where we recycle the twtcp, but can't recycle the inpcb 
immediately because there's still an fd reference to the socket, and hence a 
socket reference to the inpcb.

My current leaning is to do the following:

- Since we now keep inpcb's around for the lifetime of the socket, either
   teach the inpcb lookup routines to ignore inpcb's marked INP_DROPPED, or
   move those inpcb's to a separate list for open but dropped inpcb's.  This
   will avoid the address/port reservation hanging around.

- Either eliminate the inp_ppcb pointer NULL case by prohibiting recycling of
   the twtcp state of a socket that is still open, or support it more formally
   through the previous change.  The trick is to prevent the twtcp/inpcb
   pairs from turning up and being used during input and allocation collision
   processing.
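
To make the first option concrete, here is a minimal sketch of a lookup that
skips dropped entries, so that a dropped inpcb no longer holds a reservation
or turns up during input processing.  It is an assumption-heavy toy: the
real lookup routines are more involved than a walk over a linked list keyed
on local port, and the inpcb_entry struct, the flag value, and
lookup_local_port() are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value only; not taken from the kernel headers. */
#define INP_DROPPED 0x2

/* Hypothetical list entry standing in for an inpcb. */
struct inpcb_entry {
	struct inpcb_entry *next;
	uint16_t inp_lport;	/* local port, host byte order here */
	int inp_flags;
};

/*
 * Return the first live entry bound to lport, or NULL.  Entries marked
 * INP_DROPPED are treated as if they were not on the list at all.
 */
const struct inpcb_entry *
lookup_local_port(const struct inpcb_entry *head, uint16_t lport)
{
	const struct inpcb_entry *inp;

	for (inp = head; inp != NULL; inp = inp->next) {
		if (inp->inp_flags & INP_DROPPED)
			continue;	/* open socket, dead connection */
		if (inp->inp_lport == lport)
			return (inp);
	}
	return (NULL);
}
```

The alternative in the same bullet, moving dropped inpcb's to their own list,
would reach the same result without a flag test in the lookup loop, at the
cost of an extra list to maintain.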

The summary is that we're not quite there yet in terms of how it all should be
working, but the workarounds I committed (basically changes to handle the
inp_ppcb pointer being NULL for INP_TIMEWAIT sockets) should avoid the panic
for now.
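
The general shape of such a check, handling a NULL inp_ppcb on an
INP_TIMEWAIT inpcb instead of dereferencing it, might look like the sketch
below.  This is not the committed change: the struct layouts, the tw_time
field, the flag value, and both function names are assumptions chosen for
illustration.

```c
#include <stddef.h>

/* Illustrative flag value only; not taken from the kernel headers. */
#define INP_TIMEWAIT 0x1

/* Hypothetical time wait state with a single made-up field. */
struct twtcp_model {
	int tw_time;
};

/* Hypothetical, stripped-down inpcb. */
struct inpcb_tw {
	void *inp_ppcb;		/* twtcp when INP_TIMEWAIT, may be NULL */
	int inp_flags;
};

/*
 * Return the time wait state, or NULL if the inpcb is not in time wait
 * or its twtcp has already been recycled.
 */
struct twtcp_model *
timewait_state(const struct inpcb_tw *inp)
{
	if ((inp->inp_flags & INP_TIMEWAIT) == 0)
		return (NULL);
	return ((struct twtcp_model *)inp->inp_ppcb);
}

/* Example caller: check for NULL before touching the twtcp. */
int
timewait_remaining(const struct inpcb_tw *inp)
{
	struct twtcp_model *tw;

	tw = timewait_state(inp);
	if (tw == NULL)
		return (-1);	/* nothing to report, and nothing to crash on */
	return (tw->tw_time);
}
```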

Thanks!

Robert N M Watson