Date: Wed, 28 Apr 2010 18:09:18 +0200
From: Renaud Chaput <renchap@gmail.com>
To: freebsd-net@freebsd.org
Subject: Missing SYN/ACK answers
Message-ID: <n2w8c0d35a01004280909l5ab34080pdb10c81fff9b2aa5@mail.gmail.com>
Hi,

I am using a DL360 G6 server with an additional Intel network card, running FreeBSD 8.0-REL-p2, as a load balancer. I use nginx as an SSL endpoint and haproxy as an HTTP load balancer. One port of the Intel card (em0) is on the internal LAN, and another (em1) is on the public LAN.

An external monitoring tool reported that HTTP requests sometimes take 3, 6 or even 9 seconds instead of the 10-20 ms we usually see. I ran some tests and saw that sometimes the load balancer sends no SYN/ACK; the client sends another SYN after 3 seconds, and then a SYN/ACK is sent. Sometimes the client needs to send the SYN 2 or 3 times to get an answer.

Here is a tcpdump example:

13:57:52.978784 IP mas91-4-88-189-56-133.fbx.proxad.net.58484 > www-1.reverse.fotolia.net.http: Flags [S], seq 842845757, win 5840, options [mss 1460,sackOK,TS val 24878682 ecr 0,nop,wscale 7], length 0
13:57:55.978314 IP mas91-4-88-189-56-133.fbx.proxad.net.58484 > www-1.reverse.fotolia.net.http: Flags [S], seq 842845757, win 5840, options [mss 1460,sackOK,TS val 24879432 ecr 0,nop,wscale 7], length 0
13:57:55.978335 IP www-1.reverse.fotolia.net.http > mas91-4-88-189-56-133.fbx.proxad.net.58484: Flags [S.], seq 3988398305, ack 842845758, win 65535, options [mss 1460,nop,wscale 3,sackOK,TS val 2223023194 ecr 24879432], length 0
...

This is an HTTP request made with curl. It seems that I can reproduce it more easily when there is more traffic on the server.
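To pick such retransmitted SYNs out of a longer capture, the tcpdump text output can be scanned for repeated SYNs that share the same source, destination and sequence number. The following is only a minimal sketch, not something used in the tests above: the function name `find_syn_retransmits` is hypothetical, and it assumes the default tcpdump timestamp format shown in the trace and a capture that does not cross midnight.

```python
import re
from datetime import datetime

# Matches pure SYN lines in tcpdump's default text output, e.g.
# "13:57:52.978784 IP a.58484 > b.http: Flags [S], seq 842845757, ..."
# SYN/ACK lines ("Flags [S.]") are deliberately not matched.
SYN_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[S\], seq (?P<seq>\d+)"
)


def find_syn_retransmits(lines):
    """Return (src, dst, seq, gap_seconds) for every repeated SYN.

    Assumption: timestamps carry no date, so the capture must not
    span midnight.
    """
    last_seen = {}
    retransmits = []
    for line in lines:
        m = SYN_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        key = (m["src"], m["dst"], m["seq"])
        if key in last_seen:
            # Same endpoints and same sequence number: a retransmitted SYN.
            gap = (ts - last_seen[key]).total_seconds()
            retransmits.append((*key, gap))
        last_seen[key] = ts
    return retransmits
```

Passing the lines of a saved tcpdump text dump to `find_syn_retransmits` lists each repeated SYN together with the delay since the previous copy, which makes the 3-second pattern easy to spot.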
When I run ab (Apache Bench) against this server, I see things like this:

# ab2 -n 1000 -c 20 http://server/images/flags/zoneFlagSprite.png
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking server (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        nginx/0.6.32
Server Hostname:        server
Server Port:            80

Document Path:          /images/flags/zoneFlagSprite.png
Document Length:        1979 bytes

Concurrency Level:      20
Time taken for tests:   8.252 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      2261000 bytes
HTML transferred:       1979000 bytes
Requests per second:    121.18 [#/sec] (mean)
Time per request:       165.047 [ms] (mean)
Time per request:       8.252 [ms] (mean, across all concurrent requests)
Transfer rate:          267.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        8   90  418.2     26    3027
Processing:    16   74   39.0     62     580
Waiting:        9   43   36.0     31     579
Total:         27  164  416.5     91    3115

Percentage of the requests served within a certain time (ms)
  50%     91
  66%    105
  75%    120
  80%    135
  90%    171
  95%    210
  98%   3021
  99%   3057
 100%   3115 (longest request)

Almost all requests are fast, but 2% take more than 3 s. The result is the same whether I request nginx or a URL handled directly by haproxy.
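The ab figures put the extra delay in the Connect phase (max 3027 ms), which fits a SYN being retransmitted after 3 seconds. A probe that times only the TCP connect can isolate this from HTTP processing. This is a rough sketch under stated assumptions: the helper names `connect_times` and `count_slow` are hypothetical, and the 2.5 s threshold is an assumption chosen to sit just below the 3 s retransmit delay observed here.

```python
import socket
import time


def connect_times(host, port, attempts=100, timeout=10.0):
    """Return the TCP connect duration in seconds for each attempt."""
    durations = []
    for _ in range(attempts):
        start = time.monotonic()
        # create_connection returns once the three-way handshake is done;
        # the socket is closed again when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            durations.append(time.monotonic() - start)
    return durations


def count_slow(durations, threshold=2.5):
    """Count connects slow enough to suggest a retransmitted SYN.

    Assumption: the client retransmits its first SYN after about 3 s,
    as in the trace from this report.
    """
    return sum(1 for d in durations if d >= threshold)
```

Calling `count_slow(connect_times("server", 80, attempts=1000))` would give a count comparable to the 2% seen with ab, without any HTTP request being sent.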
I tried some sysctl tuning, with no visible results:

security.bsd.unprivileged_read_msgbuf=0
security.bsd.see_other_uids=0
net.inet.ip.portrange.hilast=59999
net.inet.ip.portrange.hifirst=40000
net.inet.ip.portrange.last=59999
net.inet.ip.portrange.first=40000
net.inet.icmp.icmplim=3000
net.inet.icmp.drop_redirect=1
net.inet.tcp.slowstart_flightsize=4
net.inet.tcp.inflight.enable=1
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.maxdgram=65536
net.inet.udp.recvspace=65536
net.inet.tcp.rfc1323=1
net.inet.tcp.blackhole=2
net.inet.tcp.msl=10000
net.inet.udp.blackhole=1

I also have some packet loss on this server, on the internal LAN. The losses are on the last hop, so they are not due to the network. I don't know if this can be related.

I have the same server in another datacenter, on an independent network, and see the same problem there. I don't understand how this can be related.

I tried with pf disabled, and this does not solve the issue.

Any ideas on how to debug and solve this?

Thanks,
Renaud Chaput