From owner-freebsd-current@FreeBSD.ORG  Tue Aug 30 03:05:57 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: current@freebsd.org
Delivered-To: freebsd-current@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 96A2116A41F;
	Tue, 30 Aug 2005 03:05:57 +0000 (GMT)
	(envelope-from demizu@dd.iij4u.or.jp)
Received: from r-dd.iij4u.or.jp (r-dd.iij4u.or.jp [210.130.0.70])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F3B9643D46;
	Tue, 30 Aug 2005 03:05:56 +0000 (GMT)
	(envelope-from demizu@dd.iij4u.or.jp)
Received: from localhost (h124.p049.iij4u.or.jp [210.130.49.124])
	by r-dd.iij4u.or.jp (4U-MR/r-dd) id j7U35oTn020072;
	Tue, 30 Aug 2005 12:05:51 +0900 (JST)
Date: Tue, 30 Aug 2005 12:06:04 +0900 (JST)
Message-Id: <20050830.120604.97293728.Noritoshi@Demizu.ORG>
From: Noritoshi Demizu <demizu@dd.iij4u.or.jp>
To: Paul Saab <ps@freebsd.org>
In-Reply-To: <431389C1.4080805@freebsd.org>
References: <20050829221700.GC1118@unixpages.org>
	<431389C1.4080805@freebsd.org>
X-Mailer: Mew version 4.1 on Emacs 21 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: current@freebsd.org
Subject: Re: panic: sackhint rexmit == 0
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Tue, 30 Aug 2005 03:05:57 -0000

> > #23 0xc063ae06 in tcp_input (m=0xc1e56400, off0=20)
> >     at /usr/home/build/src/sys/netinet/tcp_input.c:1915
> > 1915						KASSERT(tp->sackhint.

> Should already be fixed in -current.

The change in tcp_input.c rev 1.283 does not *fix* the problem.
It is only a workaround.

I believe the real problem is that tp->snd_cwnd, tp->snd_ssthresh,
tp->snd_recover, and tp->snd_nxt are incorrectly restored by the
algorithm that uses tp->t_badrxtwin in the following scenario.

  1. A Retransmission Timeout occurs.  Now, t_rxtshift is 1.
     (snd_cwnd, snd_ssthresh, and snd_recover are saved in
     tcp_timer_rexmt().)

  2. Before t_rxtshift is reset to zero, Fast Retransmit is triggered
     and TCP enters Fast Recovery.  (Note: t_rxtshift is reset to zero
     by tcp_xmit_timer() when a new RTT measurement is taken.  If my
     memory serves correctly, this behavior is specified in RFC 2988.)

  3. A partial ACK or a full ACK is received before "ticks" reaches
     tp->t_badrxtwin.

     In this case, lines 2047-2056 in tcp_input.c 1.283 restore
     snd_cwnd, snd_ssthresh, snd_recover, and snd_nxt.
     (Note: t_rxtshift has not been reset to zero yet.  It will be reset
     at line 2074 or 2076 of tcp_input.c 1.283.)

     I believe those variables must not be restored in this case.
     Since snd_recover is rolled back to the value saved in step 1, it
     becomes smaller than the value set on entering Fast Recovery.
     Hence, the condition at line 2134 of tcp_input.c 1.283 wrongly
     becomes true, and TCP wrongly exits Fast Recovery.
     This breaks the internal state of TCP SACK.
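
The three steps can be sketched as a toy model in C.  This is only an
illustration, not the kernel code: struct tcp_model, the model_*()
helpers, and all tick and sequence numbers are made up, and the exit
test (ack >= snd_recover, ignoring sequence wraparound) is an assumption
about what the condition at line 2134 checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the fields involved; NOT the kernel's struct tcpcb. */
struct tcp_model {
	uint32_t snd_recover;      /* recovery point */
	uint32_t snd_recover_prev; /* copy saved at the first RTO */
	int      t_rxtshift;       /* retransmit backoff count */
	int      t_badrxtwin;      /* tick at which the undo window closes */
	bool     in_fastrecovery;
};

/* Step 1: the first RTO saves state and opens the undo window. */
static void model_rto(struct tcp_model *tp, int ticks, int window)
{
	if (++tp->t_rxtshift == 1) {
		tp->snd_recover_prev = tp->snd_recover;
		tp->t_badrxtwin = ticks + window;
	}
}

/* Step 2: Fast Retransmit moves snd_recover up to snd_max. */
static void model_fast_retransmit(struct tcp_model *tp, uint32_t snd_max)
{
	tp->in_fastrecovery = true;
	tp->snd_recover = snd_max;
}

/* Step 3: an ACK inside the window rolls snd_recover back. */
static void model_ack(struct tcp_model *tp, int ticks)
{
	if (tp->t_rxtshift == 1 && ticks < tp->t_badrxtwin) {
		tp->snd_recover = tp->snd_recover_prev;
		tp->t_badrxtwin = 0;
	}
}

/* Assumed exit test: the ACK covers everything up to snd_recover. */
static bool model_exits_fastrecovery(const struct tcp_model *tp, uint32_t ack)
{
	return tp->in_fastrecovery && ack >= tp->snd_recover;
}

/* Runs steps 1-3 with made-up numbers; returns true if a partial ACK
 * (3000, below snd_max 5000) ends Fast Recovery. */
static bool scenario_exits_early(void)
{
	struct tcp_model tp = { .snd_recover = 1000 };

	model_rto(&tp, 100, 50);          /* RTO at tick 100, window to 150 */
	model_fast_retransmit(&tp, 5000); /* snd_recover becomes 5000 */
	assert(tp.snd_recover == 5000);
	model_ack(&tp, 120);              /* tick 120 < 150: rolled back */
	return model_exits_fastrecovery(&tp, 3000);
}
```

In this model scenario_exits_early() returns true: the partial ACK 3000
is compared against the rolled-back value 1000 instead of 5000.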

In tcp_input.c 1.282 and earlier, the possible corruption was detected
by the lines that were removed in tcp_input.c 1.283.  Since that check
was not very significant, it was removed as a workaround.

I believe the following change fixes the problem by preventing step 3
above.  Introducing a new flag that indicates whether t_badrxtwin is
valid would be a better solution, because a TCP connection may live
longer than 2^32 ticks.

  <<Quoted from tcp_input.c rev 1.283>>
  1915:					ENTER_FASTRECOVERY(tp);
  1916:					tp->snd_recover = tp->snd_max;
  1917:					callout_stop(tp->tt_rexmt);
  1918:					tp->t_rtttime = 0;
+					tp->t_badrxtwin = ticks;
  1919:					if (tp->sack_enable) {
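
The flag-based alternative could look roughly like the sketch below.
It is not a patch against tcp_input.c: struct badrxt_window and the
window_*() helpers are hypothetical names, and ticks is modeled as
uint32_t so that wraparound is well defined.  Closing the window when
Fast Retransmit starts plays the role of the one-line change above,
but without relying on a tick comparison that could presumably turn
true again once the counter wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical replacement for a bare t_badrxtwin value. */
struct badrxt_window {
	uint32_t deadline; /* tick at which the undo window closes */
	bool     valid;    /* false once the window must not be used */
};

/* Called at the first RTO: open the window for `len` ticks. */
static void window_open(struct badrxt_window *w, uint32_t now, uint32_t len)
{
	w->deadline = now + len; /* unsigned, so wraparound is well defined */
	w->valid = true;
}

/* Called when Fast Retransmit starts: the saved state is now stale. */
static void window_close(struct badrxt_window *w)
{
	w->valid = false;
}

/* May the saved variables still be restored at tick `now`? */
static bool window_allows_undo(const struct badrxt_window *w, uint32_t now)
{
	/* The signed difference stays correct across tick wraparound
	 * as long as the window is shorter than 2^31 ticks. */
	return w->valid && (int32_t)(now - w->deadline) < 0;
}

/* Small self-check with made-up tick values. */
static void window_selfcheck(void)
{
	struct badrxt_window w;

	window_open(&w, 100, 50);
	assert(window_allows_undo(&w, 120));  /* inside the window */
	window_close(&w);                     /* Fast Retransmit begins */
	assert(!window_allows_undo(&w, 120)); /* undo is now refused */
}
```

With an explicit flag, the decision no longer depends on what value
happens to be left in the deadline field after the window is closed.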

Regards,
Noritoshi Demizu