Date: Mon, 29 Feb 2016 17:40:47 +0900
From: Yongmin Cho
To: hiren panchasara
Cc: transport@freebsd.org
Subject: Re: In TCP recovery state, problem of computing the pipe(amount of data in flight).

Thanks very much for the quick reply.
I'm sorry, I haven't made a patch yet. First, I wanted to check whether my reasoning is right.

Here is an example; please check it.

<- ACK (cumack = 1, sack[3-4], sack[6-7], sack[9-10])
     * here segments/bytes 2, 5, and 8 are missing.
<- ACK (cumack = 1, sack[12-13], sack[15-16], sack[18-19])
     * here segments/bytes 11, 14, and 17 are also reported missing.
<- ACK (cumack = 1, sack[18-19], sack[21-24], sack[27-28])
     * here segments/bytes 20, 25, and 26 are missing (fast retransmission is triggered).
<- ACK ..... (cumack = 1, sack[21-24], sack[27-28], sack[32-33])
<- ACK ..... (cumack = 1, sack[34-35], sack[37-40], sack[42-44])
<- ACK ..... (cumack = 1, sack[34-35], sack[37-40], sack[42-45])
<- ACKs ..... (many duplicate ACKs carrying new SACK blocks)

In the fast recovery phase, if net.inet.tcp.rfc6675_pipe is turned on, the pipe is calculated as follows:

pipe = snd_max - snd_una + sack_bytes_rexmit (1 MSS) - sacked_bytes (10 segments: 34-35, 37-40, 42-45; tcp_sack.c:390)

When fast retransmission is triggered, one segment is sent (counted in sack_bytes_rexmit), because snd_cwnd is set to 1 MSS (tcp_input.c:2609). During fast recovery, the sender can send data only if awnd < tp->snd_ssthresh (tcp_input.c:2568).

While many packets are still in flight, snd_max and snd_una do not change, sack_bytes_rexmit stays at one MSS, and sacked_bytes is calculated only from the last ACK, which carries three SACK blocks (34-35, 37-40, 42-45). So sometimes (in my test environment) the awnd (pipe) value never drops below snd_ssthresh while the ACKs arrive during fast recovery. And if awnd cannot drop below snd_ssthresh, the sender cannot send the data in snd_holes.

So I think sacked_bytes should be calculated from all SACKed blocks above snd_una, like this:

pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes (3-4, 6-7, 9-10, 12-13, 15-16, 18-19, 21-24, 27-28, ...)
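To make the numbers concrete, here is a small C sketch that evaluates the pipe formula above for this example, once with only the last ACK's blocks and once with every block reported so far. Everything is counted in segments, not bytes, and the inputs are assumptions rather than values from a real trace: snd_max = 46 (segments 1-45 sent), snd_una = 2 (cumack = 1), and a hypothetical ssthresh of 22 (half of the 44 outstanding segments). The struct and helper names are illustrative only, not kernel functions.

```c
#include <stddef.h>

/* A SACK block in segment numbers, inclusive on both ends, as in the example. */
struct sack_blk {
	int start;
	int end;
};

/* Blocks carried by the last ACK only (what sackhint.sacked_bytes reflects). */
static const struct sack_blk last_ack_blks[] = {
	{34, 35}, {37, 40}, {42, 45}
};

/* Every block reported above snd_una so far (adjacent 32-33 and 34-35 merged). */
static const struct sack_blk all_blks[] = {
	{3, 4}, {6, 7}, {9, 10}, {12, 13}, {15, 16}, {18, 19},
	{21, 24}, {27, 28}, {32, 35}, {37, 40}, {42, 45}
};

/* Number of segments covered by a list of non-overlapping blocks. */
static int
sacked_segs(const struct sack_blk *blks, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++)
		total += blks[i].end - blks[i].start + 1;
	return (total);
}

/* pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes, in segments. */
static int
pipe_segs(int snd_max, int snd_una, int rexmit, int sacked)
{
	return (snd_max - snd_una + rexmit - sacked);
}
```

Under these assumptions the last-ACK version gives pipe = 44 + 1 - 10 = 35 segments, which stays above the assumed ssthresh of 22, so nothing can be sent; counting every SACKed block gives 44 + 1 - 30 = 15, which is below it.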
But in the current implementation, sacked_bytes is calculated only from the three (or four) SACK blocks in the last ACK, like this:

pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes (34-35, 37-40, 42-45 -> SACK blocks of the last ACK)

My opinion may not be right; I just want to check the implementation of the pipe computation.

Thank you.

On Fri, Feb 26, 2016 at 07:16:04PM -0800, hiren panchasara wrote:
> On 02/25/16 at 08:26P, Yongmin Cho wrote:
> > Hi, all.
> >
> > I have a question about net.inet.tcp.rfc6675_pipe in sysctl.
> > The bytes in flight were changed in r290122 to be computed as:
> > pipe = snd_max - snd_una - sackhint.sacked_bytes +
> > sackhint.sack_bytes_rexmit.
> > I think the implementation of sackhint.sack_bytes_rexmit is right,
> > but I don't think sackhint.sacked_bytes is.
> > sackhint.sacked_bytes is computed from the array of sack_blocks in
> > the tcp_sack_doack() function.
> > A TCP header can carry at most four SACK blocks
> > (three if TCP uses the timestamp option).
> > Even if the receiver holds more than three or four SACKed blocks,
> > each ACK it sends carries only the three or four most recent ones.
> > So if the receiver has many SACKed blocks, the sender computes
> > sacked_bytes from only three of them.
> > The snd_holes tail queue in struct tcpcb contains all of the SACK
> > holes above snd_una.
> > So I think sack_bytes_rexmit is correct,
> > because sack_bytes_rexmit is computed from the snd_holes tail queue in
> > struct tcpcb,
> > but sackhint.sacked_bytes is too small,
> > because it is computed only from the last ACK's three or
> > four SACK blocks.
> > So the return value of tcp_compute_pipe() is too big during the
> > recovery phase.
> > In the recovery state, the sender can send data only
> > if the return value of tcp_compute_pipe() is less than
> > snd_ssthresh.
> > Sometimes it takes a long time to send data if the sender knows of many
> > SACK holes.
> > Furthermore, sometimes the sender can't send data at all because of the
> > return value of tcp_compute_pipe(), and a retransmission timeout is
> > triggered.
>
> Your analysis is correct and we did think about this. Please look at
> https://reviews.freebsd.org/D3971 's summary section. Main reason for
> going with this approach was that it was at least on the conservative
> side i.e. would send less data (and not more) and would not bloat the
> network.
>
> BTW, have you run into this problem of this causing slower recovery?
>
> >
> > IMO, sackhint.sacked_bytes should be computed using the snd_holes tail
> > queue.
> > Because snd_holes has all of the SACK holes above snd_una,
> > sackhint.sacked_bytes can be computed using snd_holes.
>
> I thought snd_holes also gets populated by the info in SACKs and if for
> some reason other end has more than 3 or 4 holes and can't send it,
> snd_holes would also have incorrect info. I'd have to look at the code
> again to see if it's possible to do this more correctly with snd_holes.
> Though, I do see the point of this approach would provide better
> protection against transient problems where other end cannot send SACK
> holes info for a couple times and resumes again. Again, I'd have to go
> look at the code closely.
>
> It'd be even better if you have a patch for this. If not, no worries.
> :-)
>
> Cheers,
> Hiren
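The proposal quoted above, deriving sacked_bytes from the scoreboard instead of from the last ACK, could be sketched in C roughly as follows. This is a simplified model under stated assumptions, not kernel code: struct hole and scoreboard_sacked_bytes() are hypothetical names, a plain array stands in for the snd_holes tail queue, snd_fack here means the highest SACKed sequence number plus one, holes are half-open [start, end) ranges lying within [snd_una, snd_fack), and sequence-number wraparound is ignored (real code would compare with SEQ_LT()/SEQ_GT()).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one scoreboard hole: [start, end). */
struct hole {
	uint32_t start;
	uint32_t end;
};

/*
 * SACKed amount between snd_una and snd_fack: the whole span minus the
 * holes. Assumes holes are non-overlapping, lie inside [snd_una, snd_fack),
 * and that no sequence-number wraparound occurs.
 */
static uint32_t
scoreboard_sacked_bytes(uint32_t snd_una, uint32_t snd_fack,
    const struct hole *holes, size_t nholes)
{
	uint32_t hole_bytes = 0;

	for (size_t i = 0; i < nholes; i++)
		hole_bytes += holes[i].end - holes[i].start;
	return ((snd_fack - snd_una) - hole_bytes);
}
```

Applied to the example in segment units (snd_una = 2, snd_fack = 46, holes at 2, 5, 8, 11, 14, 17, 20, 25-26, 29-31, 36 and 41, i.e. 14 segments), it yields 44 - 14 = 30 SACKed segments, the same total as summing every reported block. As Hiren points out, the result is only as good as the scoreboard: blocks the receiver never reports stay counted as holes, so the estimate errs toward a larger pipe.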