From owner-freebsd-net@FreeBSD.ORG Thu Dec 5 13:39:05 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 956EED4A for ; Thu, 5 Dec 2013 13:39:05 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A74FD1D47 for ; Thu, 5 Dec 2013 13:39:04 +0000 (UTC) Received: from [10.225.9.5] (unknown [194.95.73.101]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 7B7691C0C0693; Thu, 5 Dec 2013 14:39:01 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: SCTP huge connect delays (at amd64) and yet another question From: Michael Tuexen In-Reply-To: <20131205123005.GE71737@netch.kiev.ua> Date: Thu, 5 Dec 2013 14:39:01 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <1564E942-DC9E-4142-89F3-B82EEF1A103C@lurchi.franken.de> References: <20131205084142.GA31113@netch.kiev.ua> <11932BA9-A734-4D4F-BCBB-6A0D926A22A9@lurchi.franken.de> <20131205123005.GE71737@netch.kiev.ua> To: Valentin Nechayev X-Mailer: Apple Mail (2.1510) Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Dec 2013 13:39:05 -0000 On Dec 5, 2013, at 1:30 PM, Valentin Nechayev = wrote: > Hi, >=20 > Thu, Dec 05, 2013 at 11:32:03, Michael.Tuexen wrote about "Re: SCTP = huge connect delays (at amd64) and yet another question":=20 >=20 >>> The first discrepancy found is specific for FreeBSD on amd64 and not >>> for i386 version; it's that connection setup lasts 2-4 seconds (!!) >>> Tcpdump shows indication that could be parsed as message miss: >> Hi Valentin, >>=20 >> could you send me the .pcap file instead of the tcpdump output. >> I would like to see the addresses listed in the INIT and INIT-ACK. >=20 > I've sent them, thanks. I answered... >=20 >>> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture = size 65535 byt >>> es >>> 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags = [none], proto SCT >>> P (132), length 188, bad cksum 0 (->f274)!) >>> 10.0.0.2.50025 > 127.0.0.1.2500: sctp >> I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1 >=20 > I've showed the code, it doesn't make any explicit binding or address > suggestion. For this host (9.1/i386), 10.0.0.2 resides on xl0. There > is no routing specifics which forces it to select 10.0.0.2: >=20 > $ route -n get 127.0.0.1 > route to: 127.0.0.1 > destination: 127.0.0.1 > interface: lo0 > flags: > recvpipe sendpipe ssthresh rtt,msec mtu weight expire > 0 0 0 0 16384 1 0 > $ telnet 127.0.0.1 25 > Trying 127.0.0.1... > Connected to localhost. > Escape character is '^]'. > 220 iv.local ESMTP Sendmail 8.14.5/8.14.5; Thu, 5 Dec 2013 13:48:31 = +0200 (EET) > ehlo zzz > 250-iv.local Hello netch@localhost [127.0.0.1], pleased to meet you > [...] >=20 > At least for TCP and UDP, it's quite straightforward. There might be an issue in the SCTP stack. It does handle addresses = differently than UDP. However, I wasn't able to reproduce your problem. I need to = test a setup similar to your, which I haven't done yet. >=20 >>> At 08:18:34.639467, cookie echo was sent but likely ignored. One >>> second later it was resent. Then, yet another strange timeout was >>> invented before HB REQ. >>>=20 >>> Test series show this can spend more than 4 seconds, average value >>> is about 3 seconds. Two 20-times run summary times are 58 to 63 >>> seconds, so, I've got 2.9...3.15 average connect time. >>>=20 >>> Neither Linux nor 32-bit FreeBSD shows this. >> FreeBSD should neither... Do you see this on FreeBSD 9.2 amd64? >=20 > Yes. A fresh dump has reproduced this. OK. Fine. This might an issue in the address handling... I'll try to reproduce this, >=20 >>> It's definitely better than delay each run, as on other platforms >>> (but the initial delay annoys roughly). >> Without SCTP_NODELAY bundling can happen or not, it depends on = timing. >> It would be great, if you can provide a .pcap file for a transfer you >> think shows some buggy behaviour. Then we can figure out what is = going on. >=20 >> MSG_EOR is nothing you provide at a send() call. The flag is only >> returned by the recvmsg() call. >=20 > Yes, I know. This has remained from the code which exposes > SOCK_SEQPACKET specifics over different transport families (e.g. > FreeBSD keeps this flag over AF_UNIX but Linux doesn't). I didn't take > it into account, but, if is needed for sight clarity, I'll remove it:) >=20 >>> } >> OK. Here is what I would expect on the wire: >>=20 >> Without SCTP_NODELAY: >>=20 >>> INIT >> < INIT_ACK >>> COOKIE_ECHO >> < COOKIE_ACK >> < DATA(abc) >>> SACK >> < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr) >>> SACK >>> SHUTDOWN >> < SHUTDOWN_ACK >>> SHUTDOWN_COMPLETE >>=20 >> There should be no substantial delay between any messages above. >>=20 >> With SCTP_NODELAY >>> INIT >> < INIT_ACK >>> COOKIE_ECHO >> < COOKIE_ACK >> < DATA(abc) >> < DATA(def) >> < DATA(ghi) >> < DATA(mno) >> < DATA(pqr) >>> SHUTDOWN >> < SHUTDOWN_ACK >>> SHUTDOWN_COMPLETE >>=20 >> There will be three SACK somewhere between the DATA chunks depending = on >> the timing. >>=20 >> There should be no substantial delay between any messages above. >>=20 >> I think if you see anything else, there is a bug. So do you see a = different >> behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap = file? >=20 > Sorry, I don't have 9.2/i386 yet. The dump from 9.1 is attached. It I actually don't expect a difference between 32-bit or 64-bit. I guess it might be more related to different address setup or timing. > has no address mess but the event sequence is following: >=20 >> INIT > < INIT_ACK >> COOKIE_ECHO > < COOKIE_ACK > < DATA(abc) >> SACK > < DATA(def) > ... delay 200ms... >> SACK > < DATA(ghi); DATA(jkl); DATA(mno); DATA(pqr) >=20 > Comparing to your description, it has unexplained waiting after > DATA(def) from the server side, and SACK delay from the client side. It is timing related as described in my other mail. Is the SACK received before the send() calls finish or vice versa... >=20 > If you think it's fixed in 9.2, we can postpone this part of > discussion until my upgrade to 9.2. >=20 >> Do you have any special routing setup? >=20 > Just this box (9.1/i386) is trivial, no any routing specifics. > For amd64 boxes, I've sent routing details privately. But it seems > there are also none principally "special" these except multiple > addresses at loopback. >=20 >> Please note, that the first SACK is returned without the 200ms delay. = This is >> required by the RFC and the above trace seems to show that. >>> But, if server shuts its writing side down ("s" in argv[]), this >>> laziness disappears. Again, the logic is too opaque and confusing. >> What do you mean by this? >=20 > At least, removing this delay by shutdown(,SHUT_WR) is unexpected. When you shutdown(,SHUT_WR) we send out pending data without waiting for a SACK, since there will be no more data from the user. This is shown by your attached traces and is intended. So it seems that * the timing is as expected for the data transmission phase * there is an issue with setting up associations when there are specific addresses on loopback. Do you agree? Best regards Michael >=20 >=20 > -netch- >