From owner-freebsd-net@FreeBSD.ORG  Wed Feb 16 12:15:19 2005
Date: Wed, 16 Feb 2005 12:13:56 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: net@FreeBSD.org
Message-ID: <Pine.NEB.3.96L.1050216115925.4054C-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: solisten() question: why do we check for completed connections?


uipc_socket.c:solisten() is responsible for transitioning a socket from
a non-listening state to a listening state.  It does this at two levels:
directly at the socket level, and at the protocol level by calling into
the protocol using pru_listen().  I'm currently working on fixing a race
between the two layers, but ran into the following question: a code
fragment exists in solisten() that checks whether any completed
connections are present when the protocol returns to solisten(): if no
completed connections are present, it flags the socket as SO_ACCEPTCONN. 
In one form or another (the data structures have changed over time), this
fragment has existed since revision 1.1, when the BSD code was imported
into our current CVS repository.  Stevens' TCP/IP Illustrated, Vol. II,
also makes fleeting reference to this logic.  However, the implied
semantics don't appear to be documented in the listen(2) man page.  Does
anyone know why we set SO_ACCEPTCONN only when the completed connection
queue is empty?
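For anyone less familiar with this code, here is a rough user-space model
of the sequence described above.  All names (struct model_socket,
model_pru_listen(), model_solisten(), MODEL_SO_ACCEPTCONN) are
hypothetical stand-ins, not the kernel's actual types or code, and
locking is omitted entirely; it only illustrates the ordering of the two
levels and the conditional in question.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's SO_ACCEPTCONN option bit. */
#define MODEL_SO_ACCEPTCONN 0x0002

/* Hypothetical, heavily simplified socket. */
struct model_socket {
    int so_options;       /* socket-level SO_* bits */
    int so_qlen;          /* completed connections (stand-in for so_comp) */
    int proto_listening;  /* set once the protocol accepts the listen */
};

/* Hypothetical stand-in for the protocol's pru_listen() hook. */
static int
model_pru_listen(struct model_socket *so)
{
    so->proto_listening = 1;  /* e.g. TCP moving its tcpcb to TCPS_LISTEN */
    return 0;
}

/* Model of the two-level transition: protocol first, then socket. */
static int
model_solisten(struct model_socket *so)
{
    int error;

    /* Level 1: protocol-level transition. */
    error = model_pru_listen(so);
    if (error != 0)
        return error;

    /*
     * Level 2: socket-level flag.  Between the call above and this
     * point the protocol is listening but the socket is not yet
     * flagged SO_ACCEPTCONN.  The flag is set only when no completed
     * connections are queued: the fragment being asked about.
     */
    if (so->so_qlen == 0)
        so->so_options |= MODEL_SO_ACCEPTCONN;
    return 0;
}
```

With an empty completed queue both levels end up listening; with a
non-empty queue the protocol is listening but the model leaves the
socket without MODEL_SO_ACCEPTCONN.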

The race I'd like to fix is that a TCP SYN can arrive during the
transition to a listening socket, which causes the TCP code to panic: it
doesn't expect a SYN to match a TCPS_LISTEN tcpcb if the socket isn't
SO_ACCEPTCONN.  This was presumably introduced as part of the SMPng work,
where preemption and parallelism are now "more possible".  The easiest
fix would be to push the socket state transition down a layer into the
protocol code, so that the socket locking and tests are performed while
holding the TCP state locks, making the multi-layer test-and-set atomic
(presumably via a helper function in the socket library functions that
support most protocols).  This would also close other potential races
between multiple threads consuming the same socket.  However, it would
simplify things considerably to drop the SO_ACCEPTCONN logic if it isn't
actually necessary.
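A rough user-space sketch of the proposed fix, assuming a single pthread
mutex can stand in for the TCP state locks: the protocol's listen routine
changes the protocol state and, via a hypothetical socket-library helper,
the socket flag while holding that one lock, and the SYN input path
checks both under the same lock.  Every name here (struct model_pcb,
model_solisten_proto(), model_tcp_usr_listen(), model_syn_input(),
model_race_trial()) is hypothetical; this models the locking idea only,
not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's SO_ACCEPTCONN option bit. */
#define MODEL_SO_ACCEPTCONN 0x0002

enum model_tcp_state { MODEL_TCPS_CLOSED, MODEL_TCPS_LISTEN };

/* Hypothetical combined protocol + socket state. */
struct model_pcb {
    pthread_mutex_t lock;          /* stands in for the TCP state locks */
    enum model_tcp_state t_state;  /* protocol-level state */
    int so_options;                /* socket-level SO_* bits */
    int so_qlen;                   /* completed-connection queue length */
};

/* Hypothetical socket-library helper; caller must hold pcb->lock. */
static void
model_solisten_proto(struct model_pcb *pcb)
{
    /* Retains the current conditional under discussion. */
    if (pcb->so_qlen == 0)
        pcb->so_options |= MODEL_SO_ACCEPTCONN;
}

/* Hypothetical protocol listen: both levels change under one lock. */
static int
model_tcp_usr_listen(struct model_pcb *pcb)
{
    pthread_mutex_lock(&pcb->lock);
    pcb->t_state = MODEL_TCPS_LISTEN;
    model_solisten_proto(pcb);
    pthread_mutex_unlock(&pcb->lock);
    return 0;
}

/* Hypothetical SYN input; returns false in the state that would panic. */
static bool
model_syn_input(struct model_pcb *pcb)
{
    bool ok = true;

    pthread_mutex_lock(&pcb->lock);
    if (pcb->t_state == MODEL_TCPS_LISTEN &&
        (pcb->so_options & MODEL_SO_ACCEPTCONN) == 0)
        ok = false;
    pthread_mutex_unlock(&pcb->lock);
    return ok;
}

struct model_trial {
    struct model_pcb *pcb;
    int iterations;
    int violations;
};

/* Thread body: deliver a stream of simulated SYNs. */
static void *
model_syn_flood(void *arg)
{
    struct model_trial *t = arg;

    for (int i = 0; i < t->iterations; i++)
        if (!model_syn_input(t->pcb))
            t->violations++;
    return NULL;
}

/* Run listen concurrently with SYNs; return the violations observed. */
static int
model_race_trial(int iterations)
{
    struct model_pcb pcb = { .t_state = MODEL_TCPS_CLOSED };
    struct model_trial t = { &pcb, iterations, 0 };
    pthread_t tid;

    pthread_mutex_init(&pcb.lock, NULL);
    pthread_create(&tid, NULL, model_syn_flood, &t);
    model_tcp_usr_listen(&pcb);
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&pcb.lock);
    return t.violations;
}
```

Because both updates happen under the lock the input path also takes,
model_race_trial() should report zero violations.  Moving the flag update
outside the locked region would reopen the window, although whether a
given run actually hits it depends on scheduling.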

Anyone know anything about this?

Robert N M Watson