From owner-freebsd-net@FreeBSD.ORG Tue Jan 7 20:45:10 2014
Date: Tue, 7 Jan 2014 12:45:08 -0800
From: John-Mark Gurney
To: Peter Wemm
Subject: Re: TCP question: Is this simultaneous close handling broken?
Message-ID: <20140107204508.GS99167@funkthat.com>
References: <52CB3AE9.3030107@wemm.org> <52CC5F2E.5030201@wemm.org>
In-Reply-To: <52CC5F2E.5030201@wemm.org>
Cc: freebsd-net@freebsd.org

Peter Wemm wrote this message on Tue, Jan 07, 2014 at 12:10 -0800:
> On 1/6/14, 3:23 PM, Peter Wemm wrote:
> > We've hit a weird problem at work when dealing with simultaneous closes.
> > In this particular case, it's a FreeBSD-7.4 machine talking some random
> > Linux host.
> >
> > There is a client/server protocol in use, and both ends are doing a close
> > at the same time. It might be a shutdown, I haven't seen all the code yet.
> [..]
> > A packet capture, with relative timestamps:
> >
> > 000050 freebsd.28411 > linux.14001: F 6486:6486(0) ack 232
> > 000031 linux.14001 > freebsd.28411: F 232:232(0) ack 6486
> > 000333 linux.14001 > freebsd.28411: . ack 6487
> > [200ms retransmit timer fires on linux]
> > 200490 linux.14001 > freebsd.28411: F 232:232(0) ack 6487
> > 000011 freebsd.28411 > linux.14001: . ack 233
> [..]
> > What am I looking at? Who's at fault? It looks like we're failing to
> > recognize the ack for our fin.
>
> It definitely looks like FreeBSD at fault. We've simply not acked their FIN
> until they retransmitted it.
>
> I've looked at the commit logs and I don't see anything obvious that stands
> out to me for a fix for this. Most of the changes seem to be lock structure
> changes than protocol fixes. I see things like ECN and other protocol
> features being added as well.
>
> Where should I look in the code? I've been looking in tcp_input.c.

When we send the FIN, we are in FIN_WAIT_1, and then upon receiving the
FIN, we should transition to CLOSING.
This happens in tcp_do_segment when we receive a packet w/ the _FIN bit
set while in FIN_WAIT_1.

The next question is: if we are hitting this code (a printf would confirm
it), why isn't the packet being sent out? Only a page or so down from
this, you see:

	/*
	 * Return any desired output.
	 */
	if (needoutput || (tp->t_flags & TF_ACKNOW))
		(void) tcp_output(tp);

And the only way TF_ACKNOW isn't set is if for some reason the TF_NEEDSYN
flag is still set (from just above the previous code)...

So, maybe a printf on the transition to _CLOSING to make sure it's hit,
plus a print of t_flags at the same location to make sure _NEEDSYN isn't
set, would help us understand what is wrong... If we don't get the
printf, then there is other weird stuff going on...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
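
For illustration only, here is a small userland model of the check
described above. It is a sketch under stated assumptions, not the kernel
code: every name in it (model_state, M_ACKNOW, M_NEEDSYN, model_conn,
model_recv_fin) is hypothetical, the flag values are made up, and the
logic only mirrors what this message describes: on a FIN in FIN_WAIT_1
move to CLOSING, request an immediate ACK unless the NEEDSYN-like flag is
still set, and print the flags at the transition.

```c
#include <stdio.h>

/* Hypothetical model names; not the kernel's definitions or values. */
enum model_state {
	M_ESTABLISHED,
	M_FIN_WAIT_1,
	M_FIN_WAIT_2,
	M_CLOSING,
	M_TIME_WAIT
};

#define M_ACKNOW	0x01u	/* stands in for TF_ACKNOW: ACK right away */
#define M_NEEDSYN	0x02u	/* stands in for TF_NEEDSYN */

struct model_conn {
	enum model_state state;
	unsigned int flags;
};

/*
 * Model the arrival of a FIN.  Returns 1 if an immediate ACK would go
 * out (the tcp_output() call quoted above), 0 if it would not.
 */
static int
model_recv_fin(struct model_conn *c)
{
	/* As described above: ACKNOW is requested unless NEEDSYN is set. */
	if ((c->flags & M_NEEDSYN) == 0)
		c->flags |= M_ACKNOW;

	if (c->state == M_FIN_WAIT_1) {
		c->state = M_CLOSING;	/* simultaneous close */
		/* The suggested debugging print: transition plus flags. */
		printf("-> CLOSING, flags=%#x\n", c->flags);
	}

	/* "Return any desired output." */
	if (c->flags & M_ACKNOW) {
		c->flags &= ~M_ACKNOW;	/* model: ACK counted as sent */
		return 1;
	}
	return 0;
}
```

Calling model_recv_fin() on a connection in M_FIN_WAIT_1 with no flags
set returns 1 (the ACK goes out at once); with M_NEEDSYN set it returns 0,
which is the case the t_flags print is meant to rule in or out.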