Date: Sat, 9 Aug 2014 22:06:46 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Niu Zhixiong <kaiaixi@gmail.com>
Cc: Michael Tuexen <Michael.Tuexen@lurchi.franken.de>, Bill Yuan <bycn82@gmail.com>, freebsd-net@freebsd.org
Subject: Re: A problem on TCP in High RTT Environment.
Message-ID: <20140810050646.GN83475@funkthat.com>
In-Reply-To: <CAOENNMDgcQoT3T4ayu3fK7YJs178ACH_2y5b4SvPhkwR3o_4Hw@mail.gmail.com>
References: <8AE1AC56-D52F-4F13-AAA3-BB96042B37DD@lurchi.franken.de> <20140809204500.GG83475@funkthat.com> <3F6BC212-4223-4AAC-8668-A27075DC55C2@lurchi.franken.de> <CAOENNMCPuiYS7LHwMfOczhZ4yisjGkpOmWzv2pcAoi9Hhzb7dw@mail.gmail.com> <20140810022350.GI83475@funkthat.com> <CAOENNMB3=FZx5kSHVPDPBTtMKbmYJ=c_XNMcuYuoLPe=6U%2Bkxg@mail.gmail.com> <CAOENNMARg36KH1Y%2B0wG8pd7sSf8XKnMf6g790_KiKaj3Mdwyjw@mail.gmail.com> <20140810033212.GL83475@funkthat.com> <CAOENNMA-dwPQr53bM4rzC=1eitoi-JAB4mCGx4zybFwUC=GMNg@mail.gmail.com> <CAOENNMDgcQoT3T4ayu3fK7YJs178ACH_2y5b4SvPhkwR3o_4Hw@mail.gmail.com>

Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 11:56 +0800:
> Actually. In the
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-kernel-limits.html
> 12.11.2.2. TCP Bandwidth Delay Product
> I saw an option called
> net.inet.tcp.inflight.enable
> net.inet.tcp.inflight.debug
> net.inet.tcp.inflight.min
>
> But, in FreeBSD 9.3R and 10R. I cannot find anything related to inflight in
> sysctl net.inet.tcp.

Looks like it was removed for the pluggable congestion control:
https://svnweb.freebsd.org/changeset/base/r211315

man mod_cc for more info...

> On Sun, Aug 10, 2014 at 11:48 AM, Niu Zhixiong <kaiaixi@gmail.com> wrote:
>
> > I am using Intel I350-T4 NIC. The LRO is closed by default. And by the
> > way, when I am using KVM-based virtual machine(virtio NIC) do the exactly
> > same test. The results are same.
> >
> > ifconfig igb0
> > igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
> > ether a0:36:9f:38:27:d0
> > inet 10.0.10.3 netmask 0xffffff00 broadcast 10.0.10.255
> > inet6 fe80::a236:9fff:fe38:27d0%igb0 prefixlen 64 scopeid 0x1
> > nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> > media: Ethernet autoselect (1000baseT <full-duplex>)
> > status: active
> >
> > Regards,
> > Niu Zhixiong
> > kaiaixi@gmail.com
> >
> >
> > On Sun, Aug 10, 2014 at 11:32 AM, John-Mark Gurney <jmg@funkthat.com>
> > wrote:
> >
> >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:50 +0800:
> >> > I am sorry that I upload a WRONG SCTP capture. But, the throughput is
> >> > same.
> >> > SCTP is double than TCP, about 18Mbps.
> >> > sctp_2.pcapng.gz
> >> > <https://docs.google.com/file/d/0By8sTL79ob4tMlh4WDlTSndHX0k/edit?usp=drive_web>
> >>
> >> Ok, the owin graph is very interesting...
> >> We do have a full 2MB window
> >> on the receiver side, but for some reason, we only ever have just under
> >> 6k outstanding on the connection...
> >>
> >> So, it looks like we send for a short period of time, and then stop
> >> sending... Do you have LRO enabled? I think it might be related to:
> >> https://svnweb.freebsd.org/changeset/base/r256920
> >>
> >> As I'm seeing >100ms gaps where the sender doesn't send any data, and
> >> as soon as more than one ack comes in, the next segment goes out... If
> >> we only receive a single ack, then we wait for a timeout before sending
> >> the next segment..
> >>
> >> Can you try to disable LRO on the receiving host?
> >>
> >> ifconfig <iface> -lro
> >>
> >> And see if that helps... If it does... Applying the patch, or compiling
> >> a more recent kernel from stable/10 that is after r257367 as that was
> >> when the change was merged...
> >>
> >> > On Sun, Aug 10, 2014 at 10:42 AM, Niu Zhixiong <kaiaixi@gmail.com> wrote:
> >> >
> >> > > I am sure that wnd is about 2MB all the time.
> >> > > This is my latest capture, plz see Google Drive.
> >> > > In the latest test, TCP(0s-120s) is about 9Mbps and SCTP(0s-120s) is about
> >> > > 18Mbps.
> >> > > (The bandwidth(20Mbps) and delay(200ms) is set by dummynet)
> >> > > The SCTP and TCP are tested in same environment.
> >> > >
> >> > > sctp.pcapng.gz
> >> > > <https://docs.google.com/file/d/0By8sTL79ob4tYl9sM2V5a19iNVU/edit?usp=drive_web>
> >> > > tcp.pcapng.gz
> >> > > <https://docs.google.com/file/d/0By8sTL79ob4tV0NMR1FYLUQ3MWs/edit?usp=drive_web>
> >> > >
> >> > >
> >> > >
> >> > > Regards,
> >> > > Niu Zhixiong
> >> > > kaiaixi@gmail.com
> >> > >
> >> > >
> >> > > On Sun, Aug 10, 2014 at 10:23 AM, John-Mark Gurney <jmg@funkthat.com>
> >> > > wrote:
> >> > >
> >> > >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:12 +0800:
> >> > >> > During the TCP4 transmission.
> >> > >> > Proto Recv-Q Send-Q  Local Address    Foreign Address  (state)
> >> > >> > tcp4       0 2097346 10.0.10.2.13504  10.0.10.3.9000   ESTABLISHED
> >> > >>
> >> > >> Ok, so you are getting a full 2MB in there, and w/ that, you should
> >> > >> easily be saturating your pipe...
> >> > >>
> >> > >> The next thing would be to get a tcpdump, and take a look at the
> >> > >> window size.. Wireshark has lots of neat tools to make this analysis
> >> > >> easy... Another tool that is good is tcptrace.. It can output a
> >> > >> variety of different graphs that will help you track down, and see
> >> > >> what part of the system is the problem...
> >> > >>
> >> > >> You probably only need a few tens of seconds of the tcpdump...
> >> > >>
> >> > >> > On Sun, Aug 10, 2014 at 4:58 AM, Michael Tuexen <
> >> > >> > Michael.Tuexen@lurchi.franken.de> wrote:
> >> > >> >
> >> > >> > >
> >> > >> > > On 09 Aug 2014, at 22:45, John-Mark Gurney <jmg@funkthat.com> wrote:
> >> > >> > >
> >> > >> > > > Michael Tuexen wrote this message on Sat, Aug 09, 2014 at 21:51 +0200:
> >> > >> > > >>
> >> > >> > > >> On 09 Aug 2014, at 20:42, John-Mark Gurney <jmg@funkthat.com> wrote:
> >> > >> > > >>
> >> > >> > > >>> Niu Zhixiong wrote this message on Fri, Aug 08, 2014 at 20:34 +0800:
> >> > >> > > >>>> Dear all,
> >> > >> > > >>>>
> >> > >> > > >>>> Last month, I send problems related to FTP/TCP in a high RTT environment.
> >> > >> > > >>>> After that, I setup a simulation environment(Dummynet) to test TCP and SCTP
> >> > >> > > >>>> in high delay environment.
> >> > >> > > >>>> After finishing the test, I can see TCP is
> >> > >> > > >>>> always slower than SCTP. But, I think it is not possible. (Plz see the
> >> > >> > > >>>> figure in the attachment). When the delay is 200ms(means RTT=400ms).
> >> > >> > > >>>> Besides, the TCP is extremely slow.
> >> > >> > > >>>>
> >> > >> > > >>>> ALL BW=20Mbps, DELAY= 0 ~ 200MS, Packet LOSS = 0 (by dummynet)
> >> > >> > > >>>>
> >> > >> > > >>>> This is my parameters:
> >> > >> > > >>>> FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu Aug 7
> >> > >> > > >>>> 11:04:15 HKT 2014
> >> > >> > > >>>>
> >> > >> > > >>>> sysctl net.inet.tcp
> >> > >> > > >>>
> >> > >> > > >>> [...]
> >> > >> > > >>>
> >> > >> > > >>>> net.inet.tcp.recvbuf_auto: 0
> >> > >> > > >>>
> >> > >> > > >>> [...]
> >> > >> > > >>>
> >> > >> > > >>>> net.inet.tcp.sendbuf_auto: 0
> >> > >> > > >>>
> >> > >> > > >>> Try enabling this... This should allow the buffer to grow large enough
> >> > >> > > >>> to deal w/ the higher latency...
> >> > >> > > >>>
> >> > >> > > >>> Also, make sure your program isn't setting the recv buffer size as that
> >> > >> > > >>> will disable the auto growing...
> >> > >> > > >> I think the program sets the buffer to 2MB, which it also does for SCTP.
> >> > >> > > >> So having both statically at the same size makes sense for the comparison.
> >> > >> > > >> I remember that there was a bug in the combination of LRO and delayed ACK,
> >> > >> > > >> which was fixed, but I don't remember it was fixed before 10.0...
> >> > >> > > >
> >> > >> > > > Sounds like disabling LRO and TSO would be a useful test to see if that
> >> > >> > > > improves things... But hiren said that the fix made it, so...
> >> > >> > > >
> >> > >> > > >>> If you use netstat -a, you should be able to see the send-q on the
> >> > >> > > >>> sender grow as necessary...
> >> > >> > > >
> >> > >> > > > Also, getting the send-q output while it's running would let us know
> >> > >> > > > if the buffer is getting to 2MB or not...
> >> > >> > > That is correct. Niu: Can you provide this?
> >>
> >> --
> >> John-Mark Gurney                              Voice: +1 415 225 5579
> >>
> >> "All that I will do, has been done, All that I have, has not."
> >>
> >
> >

-- 
John-Mark Gurney                              Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
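
As a quick sanity check on the numbers quoted in this thread, the bandwidth-delay product can be worked out by hand. The sketch below is Python, assuming the 20Mbps dummynet pipe, the 400ms RTT from the original report, and the 2MB socket buffer; the function and constant names are made up for illustration and are not part of any FreeBSD tool.

```python
# Hypothetical helper names; inputs are the figures quoted in the thread.

def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a pipe of this size full."""
    return bandwidth_bps * rtt_ms // 8000  # bits/s * ms -> bytes


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most window_bytes are unacked per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


BANDWIDTH_BPS = 20_000_000       # dummynet pipe bandwidth (20Mbps)
RTT_MS = 400                     # 200ms delay, reported as RTT=400ms
SOCKBUF_BYTES = 2 * 1024 * 1024  # 2MB buffer set by the test program

if __name__ == "__main__":
    bdp = bdp_bytes(BANDWIDTH_BPS, RTT_MS)
    ceiling = max_throughput_bps(SOCKBUF_BYTES, RTT_MS)
    print(f"BDP: {bdp} bytes")
    print(f"2MB buffer covers the BDP: {SOCKBUF_BYTES >= bdp}")
    print(f"Throughput ceiling with a 2MB window: {ceiling} bps")
```

With these inputs the BDP works out to 1,000,000 bytes, so a 2MB buffer on its own should not be what limits the transfer, which fits with looking at LRO and delayed ACK behaviour instead.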