Date: Thu, 5 Dec 2013 10:41:42 +0200 From: Valentin Nechayev <netch@netch.kiev.ua> To: freebsd-net@freebsd.org Subject: SCTP huge connect delays (at amd64) and yet another question Message-ID: <20131205084142.GA31113@netch.kiev.ua>
next in thread | raw e-mail | index | archive | help
Hi, I've got some test results which are surprising and I would get a clarification. A simple connection is created between two one-to-one SCTP sockets (AF_INET, SOCK_STREAM, IPPROTO_SCTP) at loopback (127.0.0.1). The server side sends 6 3-byte messages to client side and optionally designates writing shutdown. Client receives all them and measures a time before each receiving. Code is showed at the end of this message. Tested systems are: * FreeBSD 9.2-release/amd64 * FreeBSD 9.1-release/amd64 * FreeBSD 9.1-release/i386 * Linux OpenSuSE 12.2, kernel 3.4.63-2.44-default, x86_64 * Linux RHEL 6.3, kernel 2.6.32-279.22.1.38.0.el6.x86_64 The first discrepancy found is specific for FreeBSD on amd64 and not for i386 version; it's that connection setup lasts 2-4 seconds (!!) Tcpdump shows indication that could be parsed as message miss: tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 byt es 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCT P (132), length 188, bad cksum 0 (->f274)!) 10.0.0.2.50025 > 127.0.0.1.2500: sctp 1) [INIT] [init tag: 3943463987] [rwnd: 1864135] [OS: 10] [MIS: 2048] [i nit TSN: 3475830004] 08:18:34.639450 IP (tos 0x0, ttl 64, id 42621, offset 0, flags [none], proto SCT P (132), length 524, bad cksum 0 (->48ee)!) 127.0.0.1.2500 > 10.0.0.2.50025: sctp 1) [INIT ACK] [init tag: 59811639] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 466863335] 08:18:34.639467 IP (tos 0x0, ttl 64, id 52783, offset 0, flags [none], proto SCT P (132), length 424, bad cksum 0 (->21a0)!) 10.0.0.2.50025 > 127.0.0.1.2500: sctp 1) [COOKIE ECHO] 08:18:35.639618 IP (tos 0x0, ttl 64, id 12109, offset 0, flags [DF], proto SCTP (132), length 424, bad cksum 0 (->8082)!) 10.0.0.2.50025 > 127.0.0.1.2500: sctp 1) [COOKIE ECHO] 08:18:36.692628 IP (tos 0x0, ttl 64, id 48682, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->7e01)!) 127.0.0.1.2500 > 127.0.0.1.50025: sctp 1) [HB REQ] 08:18:36.692668 IP (tos 0x0, ttl 64, id 10809, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->86f2)!) 10.0.0.2.50025 > 127.0.0.1.2500: sctp 1) [HB ACK] 08:18:36.692707 IP (tos 0x2,ECT(0), ttl 64, id 16588, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->fb75)!) 127.0.0.1.2500 > 127.0.0.1.50025: sctp 1) [DATA] (B)(E) [TSN: 466863335] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload: 0x0000: 6162 63 abc [...] At 08:18:34.639467, cookie echo was sent but likely ignored. One second later it was resent. Then, yet another strange timeout was invented before HB REQ. Test series show this can spend more than 4 seconds, average value is about 3 seconds. Two 20-times run summary times are 58 to 63 seconds, so, I've got 2.9...3.15 average connect time. Neither Linux nor 32-bit FreeBSD shows this. The second discrepancy is well known case of so-called "Nagle" algorithm adapted for SCTP but details are confusing. If SCTP_NODELAY isn't turned on on server side, tcpdump shows that the second packet is sent from sender side without delay, but receiver's SACK is delayed for 200 ms by default. These results are identical for FreeBSD (32 bit) and Linux, but not amd64 FreeBSD (see below). But why? A common sense suggests that, if client receives all immediately, and server has already prepared its data, no additional delay shall be invented. In analogue to TCP, I would expect that, until acknoledge for "abc" is got, "def" isn't sent, but then the latter is sent immediately. 09:28:11.374335 IP (tos 0x2,ECT(0), ttl 64, id 24204, offset 0, flags [DF], prot o SCTP (132), length 52, bad cksum 0 (->ddb5)!) 127.0.0.1.2500 > 127.0.0.1.41007: sctp 1) [DATA] (B)(E) [TSN: 183313025] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload: 0x0000: 6162 63 abc 09:28:11.374349 IP (tos 0x0, ttl 64, id 522, offset 0, flags [none], proto SCTP (132), length 48, bad cksum 0 (->7a3e)!) 127.0.0.1.41007 > 127.0.0.1.2500: sctp 1) [SACK] [cum ack 183313025] [a_rwnd 1863876] [#gap acks 0] [#dup tsns 0] 09:28:11.374368 IP (tos 0x2,ECT(0), ttl 64, id 64629, offset 0, flags [DF], prot o SCTP (132), length 52, bad cksum 0 (->3fcc)!) 127.0.0.1.2500 > 127.0.0.1.41007: sctp 1) [DATA] (B)(E) [TSN: 183313026] [SID: 0] [SSEQ 1] [PPID 0x0] [Payload: 0x0000: 6465 66 def 09:28:11.573780 IP (tos 0x0, ttl 64, id 12179, offset 0, flags [none], proto SCT P (132), length 48, bad cksum 0 (->4cb5)!) 127.0.0.1.41007 > 127.0.0.1.2500: sctp 1) [SACK] [cum ack 183313026] [a_rwnd 1864135] [#gap acks 0] [#dup tsns 0] But, if server shuts its writing side down ("s" in argv[]), this laziness disappears. Again, the logic is too opaque and confusing. 64-bit (amd64) FreeBSD shows another behavior (both 9.1 and 9.2): in addition to setup delay (see above), the delay between 2nd and 3rd received packet (case SCTP_NODELAY isn't activated) could be longer than minimally needed one and spreads between a few hundreds of microseconds up to full 0.2 second delay shown on other platforms. In average, 1/8 of runs show this delay: $ fgrep ghi ll | sort -rn -k2,2 -t= | uniq -c 1 got: ghi (with MSG_EOR) tdiff=200835 1 got: ghi (with MSG_EOR) tdiff=200829 1 got: ghi (with MSG_EOR) tdiff=200826 1 got: ghi (with MSG_EOR) tdiff=200822 1 got: ghi (with MSG_EOR) tdiff=200819 1 got: ghi (with MSG_EOR) tdiff=200800 1 got: ghi (with MSG_EOR) tdiff=200792 1 got: ghi (with MSG_EOR) tdiff=199885 1 got: ghi (with MSG_EOR) tdiff=163816 1 got: ghi (with MSG_EOR) tdiff=55849 1 got: ghi (with MSG_EOR) tdiff=1825 21 got: ghi (with MSG_EOR) tdiff=2 38 got: ghi (with MSG_EOR) tdiff=1 It's definitely better than delay each run, as on other platforms (but the initial delay annoys roughly). The testing code: === #include <sys/types.h> #include <sys/socket.h> #include <sys/poll.h> #include <sys/time.h> #include <netinet/in.h> #include <netinet/sctp.h> #include <arpa/inet.h> #include <fcntl.h> #include <unistd.h> #include <stdio.h> #include <string.h> #include <err.h> #define PORT 2500 int main(int argc, char *argv[]) { int s_li, s_ac, s_cl; struct sockaddr_in sia; struct iovec iov[1]; struct msghdr msg; socklen_t slen; struct timeval tv0, tv1; int tdiff; int i; s_li = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP); if (s_li < 0) err(1, "socket"); memset(&sia, 0, sizeof(sia)); sia.sin_family = AF_INET; sia.sin_addr.s_addr = htonl(0x7F000001); sia.sin_port = htons(PORT); if (bind(s_li, (struct sockaddr*)&sia, sizeof(sia)) < 0) err(1, "bind"); if (listen(s_li, 1) < 0) err(1, "listen"); s_cl = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP); if (s_cl < 0) err(1, "socket"); if (connect(s_cl, (struct sockaddr*)&sia, sizeof(sia)) < 0) err(1, "connect"); slen = sizeof(sia); s_ac = accept(s_li, (struct sockaddr*) &sia, &slen); if (s_ac < 0) err(1, "accept"); for (i = 1; i < argc; ++i) { if (!strcmp(argv[i], "nn")) { const int one = 1; if (setsockopt(s_ac, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one)) < 0) warn("setsockopt(SCTP_NODELAY)"); } } if (send(s_ac, "abc", 3, 0) != 3) err(1, "send"); if (send(s_ac, "def", 3, MSG_EOR) != 3) err(1, "send"); if (send(s_ac, "ghi", 3, 0) != 3) err(1, "send"); if (send(s_ac, "jkl", 3, MSG_EOR) != 3) err(1, "send"); if (send(s_ac, "mno", 3, 0) != 3) err(1, "send"); if (send(s_ac, "pqr", 3, MSG_EOR) != 3) err(1, "send"); for (i = 1; i < argc; ++i) { if (!strcmp(argv[i], "s")) shutdown(s_ac, SHUT_WR); } for(;;) { char buf[1024]; memset(&msg, 0, sizeof(msg)); iov[0].iov_base = buf; iov[0].iov_len = sizeof(buf) - 1; msg.msg_iov = iov; msg.msg_iovlen = 1; gettimeofday(&tv0, NULL); ssize_t got = recvmsg(s_cl, &msg, 0); gettimeofday(&tv1, NULL); tdiff = (int)tv1.tv_usec - (int)tv0.tv_usec; if (tdiff < 0) tdiff += 1000000; if (got == 0) break; if (got == -1) { perror("recvmsg"); break; } buf[got] = 0; printf("got: %s (%s MSG_EOR) tdiff=%d\n", buf, (msg.msg_flags & MSG_EOR) ? "with" : "without", tdiff); if (!strncmp(buf, "pqr", 3)) break; } return 0; } // vim:ts=2:sts=2:sw=2:et:si: === -netch-
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131205084142.GA31113>