From: Michael Tuexen
Date: Sat, 4 Jan 2014 22:40:59 +0100
Subject: Re: Long-haul problems - connections stuck in slow start
To: Peter Wemm
Cc: freebsd-net@freebsd.org, Gavin Atkinson, andre@freebsd.org, Peter Wemm
Message-Id: <90E0038B-7ED8-49B8-A947-86F8F33438D9@lurchi.franken.de>
In-Reply-To: <52C85537.7080307@wemm.org>

On Jan 4, 2014, at 7:38 PM, Peter Wemm wrote:

> We're seeing some unfortunate misbehavior with tcp over an intercontinental
> link.
>
> eg: fetching a 30GB http file from various package mirrors by a remote:
> us-west(ISC) -> london(BME)
> bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar   0% of 29 GB   961 kBps  09h03m^C
> us-east(NYI) -> london(BME)
> bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar   0% of 29 GB  1070 kBps  08h08m^C
> us-west(YSV) -> london(BME)
> bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar   0% of 29 GB    14 kBps 590h22m^C
>
> Spot the one we're concerned about...
>
> Ping times for the three (in order):
> round-trip min/avg/max/std-dev = 144.330/144.532/144.797/0.157 ms
> round-trip min/avg/max/std-dev = 79.650/79.965/80.488/0.287 ms
> round-trip min/avg/max/std-dev = 148.588/153.292/155.688/2.903 ms
>
> The problem pair is worth showing some detail on:
> 16 bytes from ..:206a::1001:10, icmp_seq=4 hlim=55 time=148.588 ms
> 16 bytes from ..:206a::1001:10, icmp_seq=5 hlim=55 time=155.140 ms
> 16 bytes from ..:206a::1001:10, icmp_seq=6 hlim=55 time=149.443 ms
> 16 bytes from ..:206a::1001:10, icmp_seq=7 hlim=55 time=155.688 ms
> 16 bytes from ..:206a::1001:10, icmp_seq=8 hlim=55 time=148.630 ms
> 16 bytes from ..:206a::1001:10, icmp_seq=9 hlim=55 time=155.486 ms
> It appears that there are two packet paths between the endpoints that have
> either ~148ms or ~155ms. I've done some longer samples and they're fairly
> consistent clusters.
>
> All four machines talk to each other.
>
> Here's where it gets interesting. On the sender at us-west(YSV), I see this:
> net.inet.tcp.hostcache.list:
> IP address    SSTRESH    RTT  RTTVAR   CWND      HITS
> us-west(ISC)    59521    5ms     1ms  16845  15055031
> eu-west(BME)     7343  150ms     2ms  13501   3433775
> us-east(NYI)   530489  100ms    37ms  16681  43043786
>
> The ssthresh is very low for the problematic ysv<->bme pair.
>
> When I do a tcpdump, I see the sender fire off 7343 bytes of data, then stop
> and wait for acks. It's completely ignoring the receiver's window state.
> It appears stuck in slowstart mode.
>
> Some other data:
> Proto Recv-Q  Send-Q Local Address  Foreign Address   (state)
> tcp6       0 1047852 2001:19:2.443  2001:41c8:.24490  ESTABLISHED
>
> (netstat -x, sorry about the wrap)
> Proto Recv-Q Send-Q Local Address Foreign Address R-MBUF
> S-MBUF R-CLUS S-CLUS R-HIWA S-HIWA R-LOWA S-LOWA R-BCNT S-BCNT R-BMAX S-BMAX
> rexmt persist keep 2msl delack rcvtime
> tcp6 0 1048152 2001:1900:2254:2.443 2001:41c8:112:83.24490 0
> 374 0 373 65688 1049580 1 2048 0 1420800 525504
> 8396640 0.43 0.00 7199.93 0.00 0.00 0.06
>
> The "interesting" parts of -x:
> rexmt persist    keep 2msl delack rcvtime
>  0.43    0.00 7199.93 0.00   0.00    0.06
>
> -T
> Proto Rexmit OOORcv 0-win Local Address  Foreign Address
> tcp6   54161      0     0 2001:192.443   2001:41:83.24490
> note retransmits(!)
>
> Some tcpcb fields that caught my eye for the connection:
> snd_wnd = 1048576,
> snd_cwnd = 5712,
> t_srtt = 6391,
> t_rttvar = 903,
> t_rxtshift = 0,
> t_rttmin = 30,
> t_rttbest = 4903,
> t_rttupdated = 220095,
> max_sndwnd = 1048576,
> snd_cwnd_prev = 4284,
> snd_ssthresh_prev = 2856,
> snd_recover_prev = 1397053524,
> t_sndzerowin = 0,
> t_badrxtwin = 584273259,
> snd_limited = 0 '\0',
> t_rttlow = 150,
> I've stored some dumps of the tcpcb at
> http://people.freebsd.org/~peter/tcpcb.txt
> Note that some in the tcpcb.txt file also have
> snd_limited = 2 '\002',
>
> Over the last few days I've tried things like turning off sack, tso, the
> various rfc knobs etc. I believe they're all back to normal now.
>
> There's small ~15 second tcpdump sample of the sender side and the receiver
> side at: http://people.freebsd.org/~peter/send.cap.gz and
> http://people.freebsd.org/~peter/recv.cap.gz
> Both ends were ntp synced. The dumps have no sensitive data.
>
> For amusement, I just tried this, with roughly 1 second in between:
> peter@bme:~ % scp pkg-ysv:k.gz /tmp
> k.gz    100%   25MB   5.0MB/s   00:05
> peter@bme:~ % scp pkg-ysv:k.gz /tmp
> k.gz      0%  960KB  20.3KB/s   41:29 ETA^C
>
> There was no pre-existing hostcache state between those two endpoints for
> the first run. At the end, this was created in the hostcache:
> IP address  SSTRESH    RTT  RTTVAR  BANDWIDTH  CWND
> 213.138..      5952  165ms    21ms          0  8688
> All connections went slow after that. Note that the ssh test was over ipv4
> - the rest above is on ipv6. However, we're seeing the same weird stuff
> with http over ipv4 as well between the same two endpoints.
>
> It was pointed out to me that this has come up before, eg: misc/173859
> I know we've seen this at work as well.
>
> A few days earlier we were pushing ~45MB/sec (bytes, not bits) between these
> endpoints. Out of the blue it crashed to ~10KB/sec. Why can't it get out of
> slow-start? Is it even stuck in slow-start like I think? Is the 148-155ms
> bimodal rtt the problem?
>
> Any insight would be greatly appreciated. (please don't drop me from cc:)

Looking at the receiver tracefile shows that there is some message loss.
This limits the throughput... Do you also observe a message loss rate when
using ping?

Best regards
Michael

> -- 
> Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
> UTF-8: for when a ' just won\342\200\231t do.
>
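
To put rough numbers on how a small congestion window and packet loss limit
throughput, here is a back-of-the-envelope sketch in Python. The window sizes
and the 150 ms RTT are taken from the quoted report. The 1428-byte MSS is an
assumption inferred from snd_cwnd = 5712 being 4 x 1428, and the loss rates
are purely hypothetical placeholders, not values measured from the traces.
The second function uses the well-known Mathis et al. steady-state
approximation, which is only a rough model and is not derived from this
connection.

```python
import math


def window_limited_rate(window_bytes, rtt_s):
    """Upper bound in bytes/s when at most one window is sent per RTT."""
    return window_bytes / rtt_s


def mathis_rate(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation: (MSS / RTT) * sqrt(3/2) / sqrt(p), in bytes/s."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


RTT_S = 0.150  # from the hostcache entry for the problem pair
MSS = 1428     # assumption: inferred from snd_cwnd = 4 * 1428

# Window values quoted in the report.
windows = (
    ("snd_cwnd", 5712),
    ("hostcache ssthresh", 7343),
    ("snd_wnd (receiver window)", 1048576),
)
for label, window in windows:
    rate = window_limited_rate(window, RTT_S)
    print(f"{label:>26}: {rate / 1000:9.1f} kB/s")

# Hypothetical loss rates, for illustration only.
for p in (0.0001, 0.001, 0.01):
    rate = mathis_rate(MSS, RTT_S, p)
    print(f"{'loss rate ' + str(p):>26}: {rate / 1000:9.1f} kB/s")
```

With snd_cwnd = 5712 and a 150 ms RTT, the window-limited bound is about
38 kB/s, which is the same order of magnitude as the 14 to 20 kB/s seen in
the slow transfers, while the 1 MB receiver window alone would allow several
MB/s. That is consistent with the congestion window, not the receiver window,
being the limit here.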