Date: Sun, 10 Aug 2014 11:56:07 +0800
Subject: Re: A problem on TCP in High RTT Environment.
From: Niu Zhixiong
To: Niu Zhixiong, Michael Tuexen, freebsd-net@freebsd.org, Bill Yuan, John-Mark Gurney

Actually, in section 12.11.2.2, "TCP Bandwidth Delay Product", of the
Handbook page
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-kernel-limits.html
I saw these options:

net.inet.tcp.inflight.enable
net.inet.tcp.inflight.debug
net.inet.tcp.inflight.min

But on FreeBSD 9.3-RELEASE and 10.0-RELEASE, I cannot find anything related
to inflight in sysctl net.inet.tcp.

Regards,
Niu Zhixiong
－－－－－－－－－－－－－－－
kaiaixi@gmail.com


On Sun, Aug 10, 2014 at 11:48 AM, Niu Zhixiong wrote:

> I am using an Intel I350-T4 NIC. LRO is disabled by default. By the
> way, when I run exactly the same test on a KVM-based virtual machine
> (virtio NIC), the results are the same.
>
> ifconfig igb0
> igb0: flags=8843 metric 0 mtu 1500
> 	options=403bb
> 	ether a0:36:9f:38:27:d0
> 	inet 10.0.10.3 netmask 0xffffff00 broadcast 10.0.10.255
> 	inet6 fe80::a236:9fff:fe38:27d0%igb0 prefixlen 64 scopeid 0x1
> 	nd6 options=29
> 	media: Ethernet autoselect (1000baseT )
> 	status: active
>
> Regards,
> Niu Zhixiong
> －－－－－－－－－－－－－－－
> kaiaixi@gmail.com
>
>
> On Sun, Aug 10, 2014 at 11:32 AM, John-Mark Gurney wrote:
>
>> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:50 +0800:
>> > I am sorry that I uploaded the WRONG SCTP capture. But the throughput is
>> > the same.
>> > SCTP gets double the throughput of TCP, about 18Mbps.
>> >
>> > sctp_2.pcapng.gz
>> > <https://docs.google.com/file/d/0By8sTL79ob4tMlh4WDlTSndHX0k/edit?usp=drive_web>
>>
>> Ok, the owin graph is very interesting... We do have a full 2MB window
>> on the receiver side, but for some reason, we only ever have just under
>> 6k outstanding on the connection...
>>
>> So, it looks like we send for a short period of time, and then stop
>> sending... Do you have LRO enabled? I think it might be related to:
>> https://svnweb.freebsd.org/changeset/base/r256920
>>
>> I'm seeing >100ms gaps where the sender doesn't send any data, and
>> as soon as more than one ack comes in, the next segment goes out... If
>> we only receive a single ack, then we wait for a timeout before sending
>> the next segment...
>>
>> Can you try to disable LRO on the receiving host?
>>
>> ifconfig -lro
>>
>> And see if that helps... If it does, apply the patch, or compile
>> a more recent kernel from stable/10 that is after r257367, as that was
>> when the change was merged...
>>
>> > On Sun, Aug 10, 2014 at 10:42 AM, Niu Zhixiong wrote:
>> >
>> > > I am sure that wnd is about 2MB all the time.
>> > > This is my latest capture, please see Google Drive.
>> > > In the latest test, TCP (0s-120s) is about 9Mbps and SCTP (0s-120s) is
>> > > about 18Mbps.
>> > > (The bandwidth (20Mbps) and delay (200ms) are set by dummynet.)
>> > > SCTP and TCP are tested in the same environment.
>> > >
>> > > sctp.pcapng.gz
>> > > <https://docs.google.com/file/d/0By8sTL79ob4tYl9sM2V5a19iNVU/edit?usp=drive_web>
>> > > tcp.pcapng.gz
>> > > <https://docs.google.com/file/d/0By8sTL79ob4tV0NMR1FYLUQ3MWs/edit?usp=drive_web>
>> > >
>> > >
>> > >
>> > > Regards,
>> > > Niu Zhixiong
>> > > －－－－－－－－－－－－－－－
>> > > kaiaixi@gmail.com
>> > >
>> > >
>> > > On Sun, Aug 10, 2014 at 10:23 AM, John-Mark Gurney wrote:
>> > >
>> > >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:12 +0800:
>> > >> > During the TCP4 transmission:
>> > >> > Proto Recv-Q  Send-Q Local Address      Foreign Address    (state)
>> > >> > tcp4       0 2097346 10.0.10.2.13504    10.0.10.3.9000     ESTABLISHED
>> > >>
>> > >> Ok, so you are getting a full 2MB in there, and w/ that, you should
>> > >> easily be saturating your pipe...
>> > >>
>> > >> The next thing would be to get a tcpdump, and take a look at the
>> > >> window size... Wireshark has lots of neat tools to make this analysis
>> > >> easy... Another good tool is tcptrace... It can output a
>> > >> variety of different graphs that will help you track down and see
>> > >> what part of the system is the problem...
>> > >>
>> > >> You probably only need a few tens of seconds of the tcpdump...
>> > >>
>> > >> > On Sun, Aug 10, 2014 at 4:58 AM, Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote:
>> > >> >
>> > >> > >
>> > >> > > On 09 Aug 2014, at 22:45, John-Mark Gurney wrote:
>> > >> > >
>> > >> > > > Michael Tuexen wrote this message on Sat, Aug 09, 2014 at 21:51 +0200:
>> > >> > > >>
>> > >> > > >> On 09 Aug 2014, at 20:42, John-Mark Gurney wrote:
>> > >> > > >>
>> > >> > > >>> Niu Zhixiong wrote this message on Fri, Aug 08, 2014 at 20:34 +0800:
>> > >> > > >>>> Dear all,
>> > >> > > >>>>
>> > >> > > >>>> Last month, I posted about problems related to FTP/TCP in a high RTT
>> > >> > > >>>> environment. After that, I set up a simulation environment (Dummynet)
>> > >> > > >>>> to test TCP and SCTP in a high delay environment. After finishing the
>> > >> > > >>>> test, I can see TCP is always slower than SCTP, but I don't think that
>> > >> > > >>>> should be possible. (Please see the figure in the attachment.)
>> > >> > > >>>> When the delay is 200ms (meaning RTT=400ms), TCP is extremely slow.
>> > >> > > >>>>
>> > >> > > >>>> In all tests: BW=20Mbps, DELAY=0~200ms, packet loss=0 (set by dummynet)
>> > >> > > >>>>
>> > >> > > >>>> These are my parameters:
>> > >> > > >>>> FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu Aug 7
>> > >> > > >>>> 11:04:15 HKT 2014
>> > >> > > >>>>
>> > >> > > >>>> sysctl net.inet.tcp
>> > >> > > >>>
>> > >> > > >>> [...]
>> > >> > > >>>
>> > >> > > >>>> net.inet.tcp.recvbuf_auto: 0
>> > >> > > >>>
>> > >> > > >>> [...]
>> > >> > > >>>
>> > >> > > >>>> net.inet.tcp.sendbuf_auto: 0
>> > >> > > >>>
>> > >> > > >>> Try enabling this... This should allow the buffer to grow large enough
>> > >> > > >>> to deal w/ the higher latency...
>> > >> > > >>>
>> > >> > > >>> Also, make sure your program isn't setting the recv buffer size, as that
>> > >> > > >>> will disable the auto growing...
>> > >> > > >> I think the program sets the buffer to 2MB, which it also does for SCTP.
>> > >> > > >> So having both statically at the same size makes sense for the comparison.
>> > >> > > >> I remember that there was a bug in the combination of LRO and delayed ACK,
>> > >> > > >> which was fixed, but I don't remember it being fixed before 10.0...
>> > >> > > >
>> > >> > > > Sounds like disabling LRO and TSO would be a useful test to see if that
>> > >> > > > improves things... But hiren said that the fix made it, so...
>> > >> > > >
>> > >> > > >>> If you use netstat -a, you should be able to see the send-q on the
>> > >> > > >>> sender grow as necessary...
>> > >> > > >
>> > >> > > > Also, getting the send-q output while it's running would let us know
>> > >> > > > if the buffer is getting to 2MB or not...
>> > >> > > That is correct. Niu: Can you provide this?
>>
>> --
>> John-Mark Gurney    Voice: +1 415 225 5579
>>
>> "All that I will do, has been done, All that I have, has not."
>>
>
>
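For reference, the window arithmetic discussed in this thread can be checked with a few lines of Python. This is a minimal sketch, assuming the figures quoted in the messages (a 20 Mbit/s dummynet link, RTT = 400 ms, and a "2MB" socket buffer taken to mean 2 MiB); the function and constant names are hypothetical, chosen only for illustration. It computes the bandwidth-delay product, the throughput ceiling a fixed window imposes (window / RTT), and the average amount of data in flight implied by the observed 9 and 18 Mbit/s rates (throughput x RTT).

```python
# Sketch only: figures come from the thread; all names here are hypothetical.

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be unacknowledged to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most window_bytes may be outstanding per round trip."""
    return window_bytes * 8 / rtt_s


def mean_in_flight_bytes(throughput_bps: float, rtt_s: float) -> float:
    """Average outstanding data implied by a measured throughput (throughput x RTT)."""
    return throughput_bps * rtt_s / 8


LINK_BPS = 20e6                 # dummynet bandwidth: 20 Mbit/s
RTT_S = 0.400                   # 200 ms delay, RTT = 400 ms as stated in the thread
BUFFER_BYTES = 2 * 1024 * 1024  # assumption: "2MB" buffer read as 2 MiB

if __name__ == "__main__":
    print(f"BDP: {bdp_bytes(LINK_BPS, RTT_S):,.0f} bytes")
    cap = window_limited_bps(BUFFER_BYTES, RTT_S)
    print(f"2 MiB window ceiling: {cap / 1e6:.1f} Mbit/s (link: {LINK_BPS / 1e6:.0f} Mbit/s)")
    for label, observed_bps in (("TCP", 9e6), ("SCTP", 18e6)):
        in_flight = mean_in_flight_bytes(observed_bps, RTT_S)
        print(f"{label} at {observed_bps / 1e6:.0f} Mbit/s: ~{in_flight:,.0f} bytes in flight")
```

Under these assumptions the BDP is about 1,000,000 bytes and a 2 MiB window would allow roughly 42 Mbit/s, so the buffer size by itself should not hold the transfer below the 20 Mbit/s link rate; the observed 9 Mbit/s corresponds to only about 450,000 bytes in flight on average.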