From owner-freebsd-net  Tue Jul  4  6: 7:51 2000
Date: Tue, 04 Jul 2000 13:52:47 +0100
From: Kevin Bracey <kevin.bracey@pace.co.uk>
To: freebsd-net@freebsd.org
Subject: Race condition in TCP connection drops?
Message-ID: <282ed4d849%kbracey@kbracey.cam.pace.co.uk>
X-Organization: Pace Micro Technology plc, Cambridge, United Kingdom
X-Mailer: Messenger v1.40f for RISC OS
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Posting-Agent: RISC OS Newsbase 0.61b
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

I've just come across a nasty glitch in our FreeBSD-derived IP stack, and I'm
curious to know whether the problem is inherent in the BSD network code, or
is due to our implementation of its environment.

I'm describing this from a version of the source from about two years ago, so
some of the functions (e.g. xxx_usrreq) referred to will have changed, but as
far as I can tell the recent changes haven't affected this particular
problem.

The problem occurs when a connection is dropped: tcp_drop() calls
tcp_close(), which then does:

	free(tp, M_PCB);
	inp->inp_ppcb = 0;
	soisdisconnected(so);
	in_pcbdetach(inp);
	tcpstat.tcps_closed++;
	return ((struct tcpcb *)0);

soisdisconnected() calls sowakeup(), which, because SS_ASYNC is set,
calls psignal().

Now, on our system, psignal() sends round an immediate message, on receipt
of which an application detects the failure and calls close() on the socket.

Then soclose() calls tcp_usrreq(PRU_DETACH), which aborts because the inp_ppcb
pointer is 0.

This is totally reliable on our system, because the psignal mechanism is
synchronous. Are there interlocks to prevent this happening on FreeBSD, or is
it a race condition? I'm not as familiar as I perhaps should be with the
Unix kernel environment.

Is there a reason for soisdisconnected() to be called before in_pcbdetach()?
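
To make the sequence concrete, here is a rough user-space sketch of it. This
is not the kernel code: every model_* name is made up for illustration, the
structures are stripped down to the two pointers that matter, and the
notification is assumed to be fully synchronous, as it is on our system. It
only shows which state the re-entrant detach sees for each ordering of the
two calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for tcpcb, inpcb and socket. */
struct model_tcpcb { int unused; };
struct model_inpcb { struct model_tcpcb *inp_ppcb; };
struct model_socket {
	struct model_inpcb *so_pcb;
	int saw_null_ppcb;	/* set if detach ran against a half-torn-down pcb */
};

enum model_order { NOTIFY_THEN_DETACH, DETACH_THEN_NOTIFY };

/* Models the PRU_DETACH path, which expects inp_ppcb to be valid. */
static void
model_usr_detach(struct model_socket *so)
{
	struct model_inpcb *inp = so->so_pcb;

	if (inp == NULL)
		return;			/* pcb already detached: nothing to do */
	if (inp->inp_ppcb == NULL)
		so->saw_null_ppcb = 1;	/* the case described above */
}

/* Models a synchronous psignal(): the application closes the socket at once. */
static void
model_notify(struct model_socket *so)
{
	model_usr_detach(so);		/* close() -> soclose() -> PRU_DETACH */
}

/* Models in_pcbdetach(): unlink the pcb from the socket and free it. */
static void
model_pcbdetach(struct model_socket *so)
{
	free(so->so_pcb);
	so->so_pcb = NULL;
}

/* Models the tail of tcp_close() with the two calls in either order. */
static void
model_tcp_close(struct model_socket *so, enum model_order order)
{
	struct model_inpcb *inp = so->so_pcb;

	free(inp->inp_ppcb);
	inp->inp_ppcb = NULL;
	if (order == NOTIFY_THEN_DETACH) {
		model_notify(so);	/* stands in for soisdisconnected() */
		model_pcbdetach(so);	/* stands in for in_pcbdetach() */
	} else {
		model_pcbdetach(so);
		model_notify(so);
	}
}

/*
 * Returns 1 if the re-entrant detach saw inp_ppcb == 0, 0 if it did not,
 * and -1 if the model could not allocate its structures.
 */
static int
model_run(enum model_order order)
{
	struct model_socket so = { NULL, 0 };

	so.so_pcb = malloc(sizeof(*so.so_pcb));
	if (so.so_pcb == NULL)
		return -1;
	so.so_pcb->inp_ppcb = malloc(sizeof(*so.so_pcb->inp_ppcb));
	if (so.so_pcb->inp_ppcb == NULL) {
		free(so.so_pcb);
		return -1;
	}
	model_tcp_close(&so, order);
	assert(so.so_pcb == NULL);	/* the pcb is gone in both orderings */
	return so.saw_null_ppcb;
}
```

In this model, model_run(NOTIFY_THEN_DETACH) hits the zero inp_ppcb and
model_run(DETACH_THEN_NOTIFY) does not. That says nothing about whether the
real soisdisconnected() relies on the pcb still being attached, which is
exactly what I'm asking.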

-- 
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc                     Tel: +44 (0) 1223 518566
645 Newmarket Road                            Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom            WWW: http://www.acorn.co.uk/

