From: Dan Nelson
Date: Mon, 25 Sep 2006 10:46:59 -0500
To: current@FreeBSD.org, net@FreeBSD.org, Andre Oppermann, mohans@FreeBSD.org
Subject: Re: odd TCP rtt/retransmit timeout issue...
Message-ID: <20060925154659.GE73717@dan.emsphone.com>
In-Reply-To: <20060925095745.GA80527@funkthat.com>

In the last episode (Sep 25), John-Mark Gurney said:
> I was bringing up another interface that I just added to /etc/rc.conf
> and ran the command /etc/rc.d/netif start to initialize it...  But
> then my connection never came back....  I found that the shell was
> still active, as I could type commands like sleep 5, and another
> session's w would see sleep 5 run on the session...  Even filling up
> the send-q w/ 32k of data didn't get the HEAD box to send any data to
> the client...
>
> With the help of silby, I managed to find that the t_rxtcur value in
> the tcpcb was getting a very large value.  The session that hung had
> a retransmit timeout of 19 days...  This led us to find that the
> TCPT_RANGESET macro was letting very large tvmin values override the
> more sane tvmax values due to an extra else.  I have added that so we
> shouldn't see any more multi-day timeouts, but we still apparently
> have a problem where the calculated rtt value is wildly incorrect...
>
> It appears that each connection gets a different "random" rtt
> value...  From a few connections to my machine:
> (kgdb) print ((struct tcpcb *)0xc3a34af8)->t_rxtcur
> $3 = 64000
> (kgdb) print ((struct tcpcb *)0xc3a3457c)->t_rxtcur
> $6 = 1662654093
> (kgdb) print ((struct tcpcb *)0xc3a343a8)->t_rxtcur
> $12 = 1358
> (kgdb) print ((struct tcpcb *)0xc3a9e1d4)->t_rxtcur
> $17 = 203
> (kgdb) print ((struct tcpcb *)0xc3a9e000)->t_rxtcur
> $19 = 284155863

Do you have net.inet.tcp.inflight.enable=1?  You might be hitting
something related to kern/75122.  You'll want to pull the raw GNATS
repository file to read it; the query-pr.cgi web interface doesn't
parse the file correctly and loses all the replies.

-- 
	Dan Nelson
	dnelson@allantgroup.com
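For readers following the TCPT_RANGESET part of the thread, here is a minimal C sketch of why an "else" between the lower-bound and upper-bound checks lets an oversized tvmin win over tvmax. It is not the FreeBSD macro: the function names rangeset_with_else, rangeset_no_else, ticks_to_days and show_clamp are hypothetical, tcp_rexmit_slop is left out, and hz = 1000 and a 64-second ceiling are assumptions. Under those assumptions the ceiling is 64000 ticks (the first t_rxtcur printed above), and the hung session's 1662654093 ticks works out to about 19.2 days, which lines up with the "19 days" mentioned.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: hz = 1000 ticks per second (not stated in the thread). */
#define HZ 1000UL

/* Assumed 64-second retransmit ceiling, expressed in ticks (64000). */
#define REXMT_MAX (64UL * HZ)

/* Shape with the extra "else": when tv is raised to tvmin, the
 * upper-bound check is skipped, so a tvmin larger than tvmax
 * passes straight through. */
unsigned long rangeset_with_else(unsigned long value, unsigned long tvmin,
                                 unsigned long tvmax)
{
    unsigned long tv = value;
    if (tv < tvmin)
        tv = tvmin;
    else if (tv > tvmax)
        tv = tvmax;
    return tv;
}

/* Shape without the "else": both checks always run and tvmax is
 * applied last, so the result never exceeds tvmax. */
unsigned long rangeset_no_else(unsigned long value, unsigned long tvmin,
                               unsigned long tvmax)
{
    unsigned long tv = value;
    if (tv < tvmin)
        tv = tvmin;
    if (tv > tvmax)
        tv = tvmax;
    assert(tv <= tvmax); /* invariant this ordering guarantees */
    return tv;
}

/* Convert a tick count to days under the HZ assumption above. */
double ticks_to_days(unsigned long ticks)
{
    return (double)ticks / (double)HZ / 86400.0;
}

/* Print both clamping results for a garbage lower bound. */
void show_clamp(unsigned long value, unsigned long bogus_min)
{
    unsigned long a = rangeset_with_else(value, bogus_min, REXMT_MAX);
    unsigned long b = rangeset_no_else(value, bogus_min, REXMT_MAX);
    printf("with else:    %lu ticks (%.1f days)\n", a, ticks_to_days(a));
    printf("without else: %lu ticks (%.1f days)\n", b, ticks_to_days(b));
}
```

Calling show_clamp(3000, 1662654093) with the hung session's value as the bogus lower bound reports about 19.2 days for the version with the else, while the version without it is capped at 64000 ticks. Note that removing the else only bounds the damage; a garbage rtt still produces a garbage tvmin, which is the remaining problem described in the thread.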