Date: Mon, 29 Feb 2016 17:40:47 +0900 From: Yongmin Cho <yongmincho82@gmail.com> To: hiren panchasara <hiren@strugglingcoder.info> Cc: transport@freebsd.org Subject: Re: In TCP recovery state, problem of computing the pipe(amount of data in flight). Message-ID: <20160229084045.GA21079@yongmincho-All-Series> In-Reply-To: <20160227031604.GP31665@strugglingcoder.info> References: <20160225112625.GA5680@yongmincho-All-Series> <20160227031604.GP31665@strugglingcoder.info>
next in thread | previous in thread | raw e-mail | index | archive | help
Thanks very much for the quick reply. I'm sorry, I didn't make patch file. First, I wanted to discuss my opinion is right. Let me give you example, please check this. <- ACK (cumack = 1, sack[3-4], sack[6-7], sack[9-10]) * here segment/byte 2, 5, and 8 are missing. <- ACK (cumack = 1, sack[12-13], sack[15-16], sack[18-19]) * here segment/bytes 11, 14 and 17 are also reported missing. <- ACK (cumack = 1, sack[18-19], sack[21-24], sack[27, 28]) * here segment/bytes 20 and 25, 26 are missing. (triggered fast retransmission) <- ACK.....(cumack = 1, sack[21-24], sack[27, 28], sack[32, 33]) <- ACK.....(cumack = 1, sack[34-35], sack[37, 40], sack[42, 44]) <- ACK.....(cumack = 1, sack[34-35], sack[37, 40], sack[42, 45]) <- ACKs.....(many duplication acks, and new sacked blocks) In the fast recovery phase, the pipe is caculated like below, If the net.inet.tcp.rfc6675_pipe is turned on. pipe = snd_max - snd_una + sack_bytes_rexmit(1 MSS size) - sacked_bytes(10 = 34-35, 37-40, 42-45 tcp_sack.c:390) One segment is sended(sack_bytes_rexmit), when triggered fast retransmission. Because the snd_cwnd was set 1 mss size. (tcp_input.c:2609) In the fast recovery phase, The sender can send data, If this condition is right(awnd < tp->snd_ssthresh tcp_input.c:2568). When in the network, It still has many in flight packets, snd_max and snd_una will not be changed, and sack_bytes_rexmit is one MSS size, and sacked_bytes is caculated by last ACK that has three SACK blocks(34-35, 37-40, 42-45). So, sometimes(In my test environment) the awnd(pipe) value can't go down less than the snd_ssthresh, while receiving each ACKs in fast recovery phase. You know, If the awnd value can't go down less than the snd_ssthresh, The sender can't send data that is included snd_holes. So, I think, the sacked_bytes should be caculated by all of sacked blocks that is greater than snd_una, like below. pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes(3-4, 6-7, 9-10, 12-13, 15-16, 18-19, 21-24, 27-28, ...). But on current implementation, the sacked_bytes is caculated by three(or four) sacked blocks that is in last ACK, like below. pipe = snd_max - snd_una + sack_bytes_rexmit - sacked_bytes(34-35, 37-40, 42-45 -> sacked blocks of last ACK). My opinion may not be right. Just I want to check implementation of computing pipe. Thank you. On Fri, Feb 26, 2016 at 07:16:04PM -0800, hiren panchasara wrote: > On 02/25/16 at 08:26P, Yongmin Cho wrote: > > Hi, all. > > > > I have a question about net.inet.tcp.rfc6675_pipe in sysctl. > > The bytes in flight was changed to be like below in r290122. > > pipe = snd_max - snd_una - sackhint.sacked_bytes + > > sackhint.sack_bytes_rexmit. > > I think, The implementation of sackhint.sack_bytes_rexmit is right. > > But, I don't think, sackhint.sacked_bytes is right way. > > The sackhint.sacked_bytes is computed by array of sack_blocks in > > tcp_sack_doack function. > > You know, tcp header can have four sacked blocks. > > (If tcp uses timestmap option, tcp header can have three sacked > > blocks.) > > Even if The receiver has sacked blocks greater than three or four, > > The receiver can send ack with three or four last sack blocks. > > So if the receiver has many sacked blocks, the sender only knows three > > sacked_bytes. > > the snd_holes tail queue in struct tcpcb has all of sack holes which > > is greater than snd_una. > > So, i think, sack_bytes_rexmit is correct. > > Because sack_bytes_rexmit is computed by snd_holes tail queue in > > struct tcpcb. > > but sackhint.sacked_bytes is too small. > > Because sackhint.sacked_bytes is just computed by ack with three or > > four last sacked blocks. > > So, the return value of tcp_compute_pipe() function is too big, while > > recovery phase. > > In recovery state, the sender can send data, > > if the return value of tcp_compute_pipe() should be less than > > snd_ssthresh. > > Sometimes it takes a long time to send data, if the sender knows many > > sack holes. > > Furthermore, Sometimes the sender can't send data, Because the return > > value of tcp_compute_pipe() function. > > And retransmission timeout is triggered. > > Your analysis is correct and we did think about this. Please look at > https://reviews.freebsd.org/D3971 's summary section. Main reason for > going with this approach was that it was at least on the conservative > side i.e. would send less data (and not more) and would not bloat the > network. > > BTW, have you run into this problem of this causing slower recovery? > > > > IMO, sackhint.sack_bytes should be computed using snd_holes tail > > queue. > > Because snd_holes has all of sack holes which is greater than snd_una, > > sackhint.sack_bytes can be computed using snd_holes. > > I thought snd_holes also gets populated by the info in SACKs and if for > some reason other end has more than 3 or 4 holes and can't send it, > snd_holes would also have incorrect info. I'd have to look at the code > again to see if its possible to do this more correctly with snd_holes. > Though, I do see the point of this approach would provide better > protection against transient problems where other end cannot send SACK > holes info for a couple times and resumes again. Again, I'd have to go > look at the code closely. > > It'd be even better if you have a patch for this. If not, no worries. > :-) > > Cheers, > Hiren
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160229084045.GA21079>