From owner-freebsd-transport@freebsd.org Tue Mar 8 01:29:24 2016 Return-Path: Delivered-To: freebsd-transport@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ADC72AC7E4D for ; Tue, 8 Mar 2016 01:29:24 +0000 (UTC) (envelope-from yongmincho82@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 8B8BE614 for ; Tue, 8 Mar 2016 01:29:24 +0000 (UTC) (envelope-from yongmincho82@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 8A57AAC7E4C; Tue, 8 Mar 2016 01:29:24 +0000 (UTC) Delivered-To: transport@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 71B49AC7E4B for ; Tue, 8 Mar 2016 01:29:24 +0000 (UTC) (envelope-from yongmincho82@gmail.com) Received: from mail-pf0-x233.google.com (mail-pf0-x233.google.com [IPv6:2607:f8b0:400e:c00::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4B692613 for ; Tue, 8 Mar 2016 01:29:24 +0000 (UTC) (envelope-from yongmincho82@gmail.com) Received: by mail-pf0-x233.google.com with SMTP id x188so771396pfb.2 for ; Mon, 07 Mar 2016 17:29:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=qdAOxx6HwVesR4mdhSb+AoW/7WKQQY80LwIjLmC0GtY=; b=o5wvKHpVY+zF31ccMSno5h3cyVydgAiWqmu0UI+gH0OwqapaDT9XZNagvoB2BxFc17 pPWFtJe9IayyfIJclYjbCfgf4K6YHBB0HqmtvBM94DnbEuOjvmD93TQ2dNt9PPFkQT4Y 04ygYwDXzoFJStpg/6WW4ie/O1B8iHpZ4/y9lN/q5MeMsyMmmgAMdK6YXdpIEgOzldKN uUtLquCaRCJ80RJzs2pc9u+ThVEZYeZsveFhEU2qz7lS7Da3VF/82F9Ok5e63XI8VOHW VZfcnS3bPTzLK2QrFq3s0JcSzd/ACP4G9cQrp8/JtFhqBwkO9htsQesP3fglPVGWzTGy Ob3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=qdAOxx6HwVesR4mdhSb+AoW/7WKQQY80LwIjLmC0GtY=; b=O1p34AwFs1pHV6KOFi20U5wtaVWKF2I7YMy8anZIXAz0ThgRuOGSjhvz7jCjZbL1bV qyQDo05dv+cPYsfyziGdMM5yS1gyRKb0tF2HO/4NLtZqhrKaf3kfY8JmfHz1YXJx2cWa 76bBEd5IOVEa95K7v10LRakQsWi/2ehv1wxs4QKi1fB8qzJBc0nzqW6jtZlxN0l6EazW TnwyGO6/4U0ifHlKW/sUIzmtQK+KOgjl0jkKXCJ7hpw7vqk2B9NnYL2qRvEP5eZW6hEc sdtQsK1MbfxaVl7UydOAjIF4/2RuJCodePuxQFg4Mvtpu68Fa+Bnq9BKGkNKc/ivFQpE vLjA== X-Gm-Message-State: AD7BkJKv5dJQY/wQi0f0+2Rwyfiqd3Oco6LJEWH7lO4e5UjmRZIH+Xw56NXKipnMMg0DRw== X-Received: by 10.98.72.193 with SMTP id q62mr38178701pfi.117.1457400563734; Mon, 07 Mar 2016 17:29:23 -0800 (PST) Received: from yongmincho-All-Series ([106.247.248.2]) by smtp.gmail.com with ESMTPSA id wq3sm315439pac.44.2016.03.07.17.29.21 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Mon, 07 Mar 2016 17:29:22 -0800 (PST) Date: Tue, 8 Mar 2016 10:29:32 +0900 From: Yongmin Cho To: hiren panchasara Cc: transport@freebsd.org Subject: Re: In TCP recovery state, problem of computing the pipe(amount of data in flight). Message-ID: <20160308012930.GA24199@yongmincho-All-Series> References: <20160225112625.GA5680@yongmincho-All-Series> <20160227031604.GP31665@strugglingcoder.info> <20160229084045.GA21079@yongmincho-All-Series> <20160229175453.GA82027@strugglingcoder.info> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="oyUTqETQ0mS9luUI" Content-Disposition: inline In-Reply-To: <20160229175453.GA82027@strugglingcoder.info> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-transport@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Discussions of transport level network protocols in FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Mar 2016 01:29:24 -0000 --oyUTqETQ0mS9luUI Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Mon, Feb 29, 2016 at 09:54:53AM -0800, hiren panchasara wrote: > On 02/29/16 at 05:40P, Yongmin Cho wrote: > > Thanks very much for the quick reply. > > > > I'm sorry, I didn't make patch file. > > First, I wanted to discuss my opinion is right. > > > > Let me give you example, please check this. > > > > <- ACK (cumack = 1, sack[3-4], sack[6-7], sack[9-10]) > > * here segment/byte 2, 5, and 8 are missing. > > <- ACK (cumack = 1, sack[12-13], sack[15-16], sack[18-19]) > > * here segment/bytes 11, 14 and 17 are also reported missing. > > <- ACK (cumack = 1, sack[18-19], sack[21-24], sack[27, 28]) > > * here segment/bytes 20 and 25, 26 are missing. (triggered fast > > retransmission) > > <- ACK.....(cumack = 1, sack[21-24], sack[27, 28], sack[32, 33]) > > <- ACK.....(cumack = 1, sack[34-35], sack[37, 40], sack[42, 44]) > > <- ACK.....(cumack = 1, sack[34-35], sack[37, 40], sack[42, 45]) > > <- ACKs.....(many duplication acks, and new sacked blocks) > > > > In the fast recovery phase, the pipe is caculated like below, If the > > net.inet.tcp.rfc6675_pipe is turned on. > > pipe = snd_max - snd_una + sack_bytes_rexmit(1 MSS size) - > > sacked_bytes(10 = 34-35, 37-40, 42-45 tcp_sack.c:390) > > One segment is sended(sack_bytes_rexmit), when triggered fast > > retransmission. > > Because the snd_cwnd was set 1 mss size. (tcp_input.c:2609) > > In the fast recovery phase, The sender can send data, > > If this condition is right(awnd < tp->snd_ssthresh tcp_input.c:2568). > > When in the network, It still has many in flight packets, > > snd_max and snd_una will not be changed, and sack_bytes_rexmit is one MSS > > size, and sacked_bytes is caculated by last ACK that has three SACK > > blocks(34-35, 37-40, 42-45). > > So, sometimes(In my test environment) the awnd(pipe) value can't go > > down less than the snd_ssthresh, while receiving each ACKs > > in fast recovery phase. > > You know, If the awnd value can't go down less than the snd_ssthresh, > > The sender can't send data that is included snd_holes. > > > > So, I think, the sacked_bytes should be caculated by all of sacked > > blocks that is greater than snd_una, like below. > > pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes(3-4, 6-7, > > 9-10, 12-13, 15-16, 18-19, 21-24, 27-28, ...). > > > > But on current implementation, the sacked_bytes is caculated by three(or four) > > sacked blocks that is in last ACK, like below. > > pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes(34-35, > > 37-40, 42-45 -> sacked blocks of last ACK). > > > > My opinion may not be right. Just I want to check implementation of > > computing pipe. > > Your opinion seems correct to me. If you can create a patch based on > this, that'd be great. As I cannot spend time on this until next week. > > Cheers, > Hiren I've created a patch based on this issue. You know, The sack_blocks array in tcp_sack_doack function(tcp_sack.c) have only 3 or 4 sacked blocks. But We need to know all the data that has been sacked. The snd_holes queue(in tcp_cb structure) is updated every time we get an ACK with sack blocks. So, the snd_holes queue know all of hole information. And, We can get the sacked_bytes from the snd_holes. So, I calculated the sacked_bytes every time when updating the snd_holes. I've tested this patch in my test environment, And I think, this implementation is working well. Please check this patch. any feedback will be welcome. Thanks. --oyUTqETQ0mS9luUI Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="tcp_sack.diff" Index: sys/netinet/tcp_sack.c =================================================================== --- sys/netinet/tcp_sack.c (revision 296480) +++ sys/netinet/tcp_sack.c (working copy) @@ -355,12 +355,13 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to { struct sackhole *cur, *temp; struct sackblk sack, sack_blocks[TCP_MAX_SACK + 1], *sblkp; - int i, j, num_sack_blks, sack_changed; + int i, j, num_sack_blks, sack_changed, sacked_bytes; INP_WLOCK_ASSERT(tp->t_inpcb); num_sack_blks = 0; sack_changed = 0; + sacked_bytes = 0; /* * If SND.UNA will be advanced by SEG.ACK, and if SACK holes exist, * treat [SND.UNA, SEG.ACK) as if it is a SACK block. @@ -374,7 +375,6 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to * received new blocks from the other side. */ if (to->to_flags & TOF_SACK) { - tp->sackhint.sacked_bytes = 0; /* reset */ for (i = 0; i < to->to_nsacks; i++) { bcopy((to->to_sacks + i * TCPOLEN_SACK), &sack, sizeof(sack)); @@ -387,8 +387,6 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to SEQ_GT(sack.end, tp->snd_una) && SEQ_LEQ(sack.end, tp->snd_max)) { sack_blocks[num_sack_blks++] = sack; - tp->sackhint.sacked_bytes += - (sack.end-sack.start); } } } @@ -446,6 +444,7 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to temp = tcp_sackhole_insert(tp, tp->snd_fack,sblkp->start,NULL); if (temp != NULL) { tp->snd_fack = sblkp->end; + sacked_bytes += (sblkp->end - sblkp->start); /* Go to the previous sack block. */ sblkp--; sack_changed = 1; @@ -462,11 +461,14 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to SEQ_LT(tp->snd_fack, sblkp->start)) sblkp--; if (sblkp >= sack_blocks && - SEQ_LT(tp->snd_fack, sblkp->end)) + SEQ_LT(tp->snd_fack, sblkp->end)) { + sacked_bytes += (sblkp->end - tp->snd_fack); tp->snd_fack = sblkp->end; + } } } else if (SEQ_LT(tp->snd_fack, sblkp->end)) { /* fack is advanced. */ + sacked_bytes += (sblkp->end - tp->snd_fack); tp->snd_fack = sblkp->end; sack_changed = 1; } @@ -503,6 +505,10 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to /* Data acks at least the beginning of hole. */ if (SEQ_GEQ(sblkp->end, cur->end)) { /* Acks entire hole, so delete hole. */ + if (SEQ_GT(cur->start, th_ack)) + sacked_bytes += (cur->end - cur->start); + else + sacked_bytes -= (sblkp->end - cur->end); temp = cur; cur = TAILQ_PREV(cur, sackhole_head, scblink); tcp_sackhole_remove(tp, temp); @@ -514,6 +520,8 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to continue; } else { /* Move start of hole forward. */ + if (SEQ_GT(sblkp->start, th_ack)) + sacked_bytes += (sblkp->end - cur->start); cur->start = sblkp->end; cur->rxmit = SEQ_MAX(cur->rxmit, cur->start); } @@ -521,6 +529,7 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to /* Data acks at least the end of hole. */ if (SEQ_GEQ(sblkp->end, cur->end)) { /* Move end of hole backward. */ + sacked_bytes += (cur->end - sblkp->start); cur->end = sblkp->start; cur->rxmit = SEQ_MIN(cur->rxmit, cur->end); } else { @@ -528,6 +537,7 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to * ACKs some data in middle of a hole; need * to split current hole */ + sacked_bytes += (sblkp->end - sblkp->start); temp = tcp_sackhole_insert(tp, sblkp->end, cur->end, cur); if (temp != NULL) { @@ -554,6 +564,7 @@ tcp_sack_doack(struct tcpcb *tp, struct tcpopt *to else sblkp--; } + tp->sackhint.sacked_bytes += sacked_bytes; return (sack_changed); } --oyUTqETQ0mS9luUI--