Date:      Tue, 12 Aug 2014 16:22:54 +0800
From:      Niu Zhixiong <kaiaixi@gmail.com>
To:        Niu Zhixiong <kaiaixi@gmail.com>, Michael Tuexen <Michael.Tuexen@lurchi.franken.de>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Bill Yuan <bycn82@gmail.com>
Subject:   Re: A problem on TCP in High RTT Environment.
Message-ID:  <CAOENNMDX4KXvQD6sBQM1Sbp13=zkJTy9dwHQ0i1bU_Ae85dvzw@mail.gmail.com>
In-Reply-To: <20140811171517.GW83475@funkthat.com>
References:  <3F6BC212-4223-4AAC-8668-A27075DC55C2@lurchi.franken.de> <CAOENNMCPuiYS7LHwMfOczhZ4yisjGkpOmWzv2pcAoi9Hhzb7dw@mail.gmail.com> <20140810022350.GI83475@funkthat.com> <CAOENNMB3=FZx5kSHVPDPBTtMKbmYJ=c_XNMcuYuoLPe=6U%2Bkxg@mail.gmail.com> <CAOENNMARg36KH1Y%2B0wG8pd7sSf8XKnMf6g790_KiKaj3Mdwyjw@mail.gmail.com> <20140810033212.GL83475@funkthat.com> <CAOENNMA-dwPQr53bM4rzC=1eitoi-JAB4mCGx4zybFwUC=GMNg@mail.gmail.com> <20140810045355.GM83475@funkthat.com> <CAOENNMDcmKSXca0fnuvC82o5Q%2B6mm7TBdDQHXz-ThH1pr2YthA@mail.gmail.com> <CAOENNMBo82MydA9Ewtxj4QijF_XA3j7DqB2%2B10jSp1=GYmSDBw@mail.gmail.com> <20140811171517.GW83475@funkthat.com>

I captured the traffic on a switch mirror port on the sender side, and I also
noticed that some ACKs appear before the segments they acknowledge. I am not
sure how to solve this problem. However, in my KVM-based virtual machine test
environment, there is no such issue.
 testtest.tar.gz
<https://docs.google.com/file/d/0By8sTL79ob4tR0hGWHZFZzFWRUk/edit?usp=drive_web>
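
The "ACK before segment" pattern mentioned above can be searched for in
tcpdump text output. What follows is only a rough sketch under some
assumptions: each packet is on one unwrapped line in the same format as the
trace quoted later in this thread, sequence numbers are relative and do not
wrap, and the function name early_acks and the regular expression are made up
for illustration. They are not part of tcpdump or of any tool used in this
thread.

```python
import re

# Matches one unwrapped tcpdump text line in the format quoted in this thread.
# The "seq start:end" part is optional because pure ACKs carry no data.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) IP (?P<src>\S+) > (?P<dst>\S+): Flags \[[^\]]*\], "
    r"(?:seq (?P<start>\d+):(?P<end>\d+), )?ack (?P<ack>\d+),"
)


def early_acks(lines, sender):
    """Return (timestamp, ack, highest_seq_seen) for each ACK addressed to
    `sender` that covers data the capture has not yet shown leaving it.

    Assumes relative sequence numbers without wraparound (illustrative only).
    """
    highest_end = 0  # highest sequence number seen leaving the sender so far
    found = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a TCP line in the expected format
        if m.group("src") == sender and m.group("end") is not None:
            highest_end = max(highest_end, int(m.group("end")))
        elif m.group("dst") == sender:
            ack = int(m.group("ack"))
            if ack > highest_end:
                found.append((m.group("ts"), ack, highest_end))
    return found


# Five packets from the quoted trace, joined onto single lines.
SAMPLE = [
    "22:19:25.634095 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 165099:166547, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr 2405797022], length 1448",
    "22:19:25.635084 IP 10.0.10.3.9000 > 10.0.10.2.62995: Flags [.], ack 167995, win 32745, options [nop,nop,TS val 2405797438 ecr 61731431], length 0",
    "22:19:25.635097 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 166547:167995, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr 2405797022], length 1448",
    "22:19:25.636073 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 167995:170891, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr 2405797022], length 2896",
    "22:19:25.636266 IP 10.0.10.3.9000 > 10.0.10.2.62995: Flags [.], ack 170891, win 32745, options [nop,nop,TS val 2405797439 ecr 61731431], length 0",
]

if __name__ == "__main__":
    for ts, ack, seen in early_acks(SAMPLE, "10.0.10.2.62995"):
        print(f"{ts}: ack {ack} arrived, but capture only shows data up to {seen}")
```

On these five packets the sketch flags only the ACK for 167995 at
22:19:25.635084, which is captured 13 microseconds before the segment
166547:167995 that it acknowledges.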



Regards,
Niu Zhixiong
－－－－－－－－－－－－－－－
 kaiaixi@gmail.com


On Tue, Aug 12, 2014 at 1:15 AM, John-Mark Gurney <jmg@funkthat.com> wrote:

> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 20:27 +0800:
> > Hi, I am not sure whether my last email was filtered by the mailing list.
> > After disabling TSO, the speed became even worse.
> > These are the packet captures. Please see Google Drive.
> >  tcp_with_tso_off.pcapng.gz
> > <https://docs.google.com/file/d/0By8sTL79ob4tYXQ0N0lZN0FUNVE/edit?usp=drive_web>
> >
>
> So, the reason that this is also slow is that it only ever really has one
> segment on the wire at a time...  This is similar to the previous
> packet capture...
>
> Which side was this captured on? Was this the receiving
> side?  Because it looks like packets are getting merged still...
>
> 22:19:25.628087 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq
> 149171:152067, ack 1, win 32783, options [nop,nop,TS val 61731427 ecr
> 2405797018], length 2896
>
> and as before:
> 22:19:25.634095 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq
> 165099:166547, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr
> 2405797022], length 1448
> 22:19:25.635084 IP 10.0.10.3.9000 > 10.0.10.2.62995: Flags [.], ack
> 167995, win 32745, options [nop,nop,TS val 2405797438 ecr 61731431],
> length 0
> 22:19:25.635097 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq
> 166547:167995, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr
> 2405797022], length 1448
> 22:19:25.636073 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq
> 167995:170891, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr
> 2405797022], length 2896
> 22:19:25.636266 IP 10.0.10.3.9000 > 10.0.10.2.62995: Flags [.], ack
> 170891, win 32745, options [nop,nop,TS val 2405797439 ecr 61731431],
> length 0
>
> Though the other thing I noticed is that we appear to be ack'ing before
> the segment was received, which is a bit odd...  And it happens quite
> consistently...
>
> We really need someone who knows our TCP stack to comment on this...
>
> > On Sun, Aug 10, 2014 at 1:24 PM, Niu Zhixiong <kaiaixi@gmail.com> wrote:
> >
> > > Hi,
> > > After disabling TSO, the speed became even worse.
> > > These are the packet captures. Please see Google Drive.
> > >  tcp_with_tso_off.pcapng.gz
> > > <https://docs.google.com/file/d/0By8sTL79ob4tYXQ0N0lZN0FUNVE/edit?usp=drive_web>
> > >
> > >
> > > John-Mark Gurney <jmg@funkthat.com> wrote on Sunday, August 10, 2014:
> > >
> > > Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 11:48 +0800:
> > >> > I am using an Intel I350-T4 NIC. LRO is disabled by default. By the
> > >> > way, when I run exactly the same test on KVM-based virtual machines
> > >> > (virtio NIC), the results are the same.
> > >>
> > >> Have you tried disabling tso?  I asked that in an earlier email, but
> > >> never heard from you if that changed anything...
> > >>
> > >> a lot of the trace looks like:
> > >> 19:29:57.223574 IP 10.0.10.2.61010 > 10.0.10.3.9000: .
> > >> 251521:257313(5792) ack 1 win 32783 <nop,nop,timestamp 51563557 1047294279>
> > >> 19:29:57.223798 IP 10.0.10.3.9000 > 10.0.10.2.61010: . ack 257313 win
> > >> 32745 <nop,nop,timestamp 1047294690 51563557>
> > >> 19:29:57.225570 IP 10.0.10.2.61010 > 10.0.10.3.9000: .
> > >> 257313:263105(5792) ack 1 win 32783 <nop,nop,timestamp 51563557 1047294279>
> > >>
> > >> Notice how the ack comes back immediately, but for some reason, we
> > >> decide to wait almost 2ms before sending out the next frame...
> > >>
> > >> For some reason, we just aren't filling our window out...  tcptrace's
> > >> graphs show the window at 2MB, but we only ever have 4 segments
> > >> outstanding at once...
> > >>
> > >> > ifconfig igb0
> > >> > igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > >> >  options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
> > >> >  ether a0:36:9f:38:27:d0
> > >> > inet 10.0.10.3 netmask 0xffffff00 broadcast 10.0.10.255
> > >> > inet6 fe80::a236:9fff:fe38:27d0%igb0 prefixlen 64 scopeid 0x1
> > >> >  nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> > >> > media: Ethernet autoselect (1000baseT <full-duplex>)
> > >> >  status: active
> > >> >
> > >> > Regards,
> > >> > Niu Zhixiong
> > >> > －－－－－－－－－－－－－－－
> > >> >  kaiaixi@gmail.com
> > >> >
> > >> >
> > >> > On Sun, Aug 10, 2014 at 11:32 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> >
> > >> > > Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:50 +0800:
> > >> > > > I am sorry, I uploaded the WRONG SCTP capture. But the throughput
> > >> > > > is the same: SCTP is double that of TCP, about 18Mbps.
> > >> > > >  sctp_2.pcapng.gz
> > >> > > > <https://docs.google.com/file/d/0By8sTL79ob4tMlh4WDlTSndHX0k/edit?usp=drive_web>
> > >> > >
> > >> > > Ok, the owin graph is very interesting...  We do have a full 2MB
> > >> > > window on the receiver side, but for some reason, we only ever have
> > >> > > just under 6k outstanding on the connection...
> > >> > >
> > >> > > So, it looks like we send for a short period of time, and then stop
> > >> > > sending...  Do you have LRO enabled?  I think it might be related to:
> > >> > > https://svnweb.freebsd.org/changeset/base/r256920
> > >> > >
> > >> > > As I'm seeing >100ms gaps where the sender doesn't send any data, and
> > >> > > as soon as more than one ack comes in, the next segment goes out...  If
> > >> > > we only receive a single ack, then we wait for a timeout before sending
> > >> > > the next segment..
> > >> > >
> > >> > > Can you try to disable LRO on the receiving host?
> > >> > >
> > >> > > ifconfig <iface> -lro
> > >> > >
> > >> > > And see if that helps... If it does, try applying the patch, or
> > >> > > compiling a more recent kernel from stable/10 that is after r257367,
> > >> > > as that was the revision in which the change was merged...
> > >> > >
> > >> > > > On Sun, Aug 10, 2014 at 10:42 AM, Niu Zhixiong <kaiaixi@gmail.com> wrote:
> > >> > > >
> > >> > > > > I am sure that wnd is about 2MB all the time.
> > >> > > > > This is my latest capture, please see Google Drive.
> > >> > > > > In the latest test, TCP (0s-120s) is about 9Mbps and SCTP (0s-120s)
> > >> > > > > is about 18Mbps.
> > >> > > > > (The bandwidth (20Mbps) and delay (200ms) are set by dummynet.)
> > >> > > > > SCTP and TCP are tested in the same environment.
> > >> > > > >
> > >> > > > >  sctp.pcapng.gz
> > >> > > > > <https://docs.google.com/file/d/0By8sTL79ob4tYl9sM2V5a19iNVU/edit?usp=drive_web>
> > >> > > > >  tcp.pcapng.gz
> > >> > > > > <https://docs.google.com/file/d/0By8sTL79ob4tV0NMR1FYLUQ3MWs/edit?usp=drive_web>
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Niu Zhixiong
> > >> > > > > －－－－－－－－－－－－－－－
> > >> > > > >  kaiaixi@gmail.com
> > >> > > > >
> > >> > > > >
> > >> > > > > On Sun, Aug 10, 2014 at 10:23 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> > > > >
> > >> > > > >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:12 +0800:
> > >> > > > >> > During the TCP4 transmission.
> > >> > > > >> > Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
> > >> > > > >> > tcp4       0 2097346 10.0.10.2.13504        10.0.10.3.9000         ESTABLISHED
> > >> > > > >>
> > >> > > > >> Ok, so you are getting a full 2MB in there, and w/ that, you should
> > >> > > > >> easily be saturating your pipe...
> > >> > > > >>
> > >> > > > >> The next thing would be to get a tcpdump, and take a look at the
> > >> > > > >> window size.. Wireshark has lots of neat tools to make this analysis
> > >> > > > >> easy...  Another tool that is good is tcptrace..  It can output a
> > >> > > > >> variety of different graphs that will help you track down, and see
> > >> > > > >> what part of the system is the problem...
> > >> > > > >>
> > >> > > > >> You probably only need a few tens of seconds of the tcpdump...
> > >> > > > >>
> > >> > > > >> > On Sun, Aug 10, 2014 at 4:58 AM, Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote:
> > >> > > > >> >
> > >> > > > >> > >
> > >> > > > >> > > On 09 Aug 2014, at 22:45, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> > > > >> > >
> > >> > > > >> > > > Michael Tuexen wrote this message on Sat, Aug 09, 2014 at 21:51 +0200:
> > >> > > > >> > > >>
> > >> > > > >> > > >> On 09 Aug 2014, at 20:42, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> > > > >> > > >>
> > >> > > > >> > > >>> Niu Zhixiong wrote this message on Fri, Aug 08, 2014 at 20:34 +0800:
> > >> > > > >> > > >>>> Dear all,
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> Last month, I reported problems related to FTP/TCP in a high RTT
> > >> > > > >> > > >>>> environment. After that, I set up a simulation environment (dummynet)
> > >> > > > >> > > >>>> to test TCP and SCTP in a high-delay environment. After finishing the
> > >> > > > >> > > >>>> test, I can see that TCP is always slower than SCTP, which I think
> > >> > > > >> > > >>>> should not be possible. (Please see the figure in the attachment.)
> > >> > > > >> > > >>>> When the delay is 200ms (meaning RTT=400ms), TCP is extremely slow.
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> ALL BW=20Mbps, DELAY= 0 ~ 200MS, Packet LOSS = 0 (by dummynet)
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> These are my parameters:
> > >> > > > >> > > >>>> FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu Aug  7
> > >> > > > >> > > >>>> 11:04:15 HKT 2014
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> sysctl net.inet.tcp
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> [...]
> > >> > > > >> > > >>>
> > >> > > > >> > > >>>> net.inet.tcp.recvbuf_auto: 0
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> [...]
> > >> > > > >> > > >>>
> > >> > > > >> > > >>>> net.inet.tcp.sendbuf_auto: 0
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> Try enabling this...  This should allow the buffer to grow large
> > >> > > > >> > > >>> enough to deal w/ the higher latency...
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> Also, make sure your program isn't setting the recv buffer size as
> > >> > > > >> > > >>> that will disable the auto growing...
> > >> > > > >> > > >> I think the program sets the buffer to 2MB, which it also does for
> > >> > > > >> > > >> SCTP. So having both statically at the same size makes sense for the
> > >> > > > >> > > >> comparison. I remember that there was a bug in the combination of LRO
> > >> > > > >> > > >> and delayed ACK, which was fixed, but I don't remember whether it was
> > >> > > > >> > > >> fixed before 10.0...
> > >> > > > >> > > >
> > >> > > > >> > > > Sounds like disabling LRO and TSO would be a useful test to see if
> > >> > > > >> > > > that improves things...  But hiren said that the fix made it, so...
> > >> > > > >> > > >
> > >> > > > >> > > >>> If you use netstat -a, you should be able to see the send-q on the
> > >> > > > >> > > >>> sender grow as necessary...
> > >> > > > >> > > >
> > >> > > > >> > > > Also, getting the send-q output while it's running would let us know
> > >> > > > >> > > > if the buffer is getting to 2MB or not...
> > >> > > > >> > > That is correct. Niu: Can you provide this?
>
> --
>   John-Mark Gurney                              Voice: +1 415 225 5579
>
>      "All that I will do, has been done, All that I have, has not."
>


