From: Renaud Chaput
To: freebsd-net@freebsd.org
Date: Wed, 28 Apr 2010 18:09:18 +0200
Subject: Missing SYN/ACK answers

Hi,

I am using a DL360 G6 server with an additional Intel
network card on FreeBSD 8.0-REL-p2 as a load balancer. I use nginx as an
SSL endpoint and haproxy as an HTTP load balancer. One port of the Intel
card (em0) is on the internal LAN, and another (em1) is on the public LAN.

An external monitoring tool reported that some HTTP requests were taking
3, 6 or even 9 seconds instead of the 10-20 ms we usually see. I ran some
tests and saw that sometimes the load balancer sends no SYN/ACK: the
client sends another SYN after 3 seconds, and only then is a SYN/ACK
sent. Sometimes the client needs to send the SYN 2 or 3 times to get an
answer.

Here is a tcpdump example:

  13:57:52.978784 IP mas91-4-88-189-56-133.fbx.proxad.net.58484 > www-1.reverse.fotolia.net.http: Flags [S], seq 842845757, win 5840, options [mss 1460,sackOK,TS val 24878682 ecr 0,nop,wscale 7], length 0
  13:57:55.978314 IP mas91-4-88-189-56-133.fbx.proxad.net.58484 > www-1.reverse.fotolia.net.http: Flags [S], seq 842845757, win 5840, options [mss 1460,sackOK,TS val 24879432 ecr 0,nop,wscale 7], length 0
  13:57:55.978335 IP www-1.reverse.fotolia.net.http > mas91-4-88-189-56-133.fbx.proxad.net.58484: Flags [S.], seq 3988398305, ack 842845758, win 65535, options [mss 1460,nop,wscale 3,sackOK,TS val 2223023194 ecr 24879432], length 0
  ...

This is an HTTP request made with curl. It seems that I can reproduce the
problem more easily when there is more traffic on the server.
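To see how often this happens in a longer capture, one option is to pair
each client SYN with the server's SYN/ACK and report the delay and the
number of SYNs the client had to send. Below is a minimal Python sketch,
assuming tcpdump text output in exactly the format shown above (timestamp
first, `Flags [S]` for a SYN, `Flags [S.]` for a SYN/ACK); the function
name `handshake_delays` is hypothetical, and captures that cross midnight
are not handled.

```python
import re
from datetime import datetime

# Matches the start of a tcpdump line such as:
# 13:57:52.978784 IP host.58484 > host.http: Flags [S], seq 842845757, ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
)


def _seconds(ts):
    """Convert an HH:MM:SS.ffffff timestamp to seconds since midnight
    (captures that cross midnight are not handled)."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def handshake_delays(lines):
    """Map (client, server) to (seconds from first SYN to SYN/ACK,
    number of SYNs sent) for every handshake that was answered."""
    first_syn = {}  # (client, server) -> time of the first SYN
    syn_count = {}  # (client, server) -> SYNs seen; >1 means retransmits
    result = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a SYN/SYN-ACK line in the expected format
        src, dst, flags = m.group("src"), m.group("dst"), m.group("flags")
        t = _seconds(m.group("ts"))
        if flags == "S":  # client -> server SYN
            key = (src, dst)
            first_syn.setdefault(key, t)
            syn_count[key] = syn_count.get(key, 0) + 1
        elif flags == "S.":  # server -> client SYN/ACK
            key = (dst, src)
            if key in first_syn and key not in result:
                result[key] = (t - first_syn[key], syn_count[key])
    return result
```

Fed the three lines above, it should report a delay of about 3 s with a
SYN count of 2 for that client port; handshakes that never got a SYN/ACK
simply do not appear in the result.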
When I run ab (Apache Bench) against this server, I see results like this:

  # ab2 -n 1000 -c 20 http://server/images/flags/zoneFlagSprite.png
  This is ApacheBench, Version 2.3 <$Revision: 655654 $>
  Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
  Licensed to The Apache Software Foundation, http://www.apache.org/

  Benchmarking server (be patient)
  Completed 100 requests
  Completed 200 requests
  Completed 300 requests
  Completed 400 requests
  Completed 500 requests
  Completed 600 requests
  Completed 700 requests
  Completed 800 requests
  Completed 900 requests
  Completed 1000 requests
  Finished 1000 requests

  Server Software:        nginx/0.6.32
  Server Hostname:        server
  Server Port:            80

  Document Path:          /images/flags/zoneFlagSprite.png
  Document Length:        1979 bytes

  Concurrency Level:      20
  Time taken for tests:   8.252 seconds
  Complete requests:      1000
  Failed requests:        0
  Write errors:           0
  Total transferred:      2261000 bytes
  HTML transferred:       1979000 bytes
  Requests per second:    121.18 [#/sec] (mean)
  Time per request:       165.047 [ms] (mean)
  Time per request:       8.252 [ms] (mean, across all concurrent requests)
  Transfer rate:          267.56 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        8   90 418.2     26    3027
  Processing:    16   74  39.0     62     580
  Waiting:        9   43  36.0     31     579
  Total:         27  164 416.5     91    3115

  Percentage of the requests served within a certain time (ms)
    50%     91
    66%    105
    75%    120
    80%    135
    90%    171
    95%    210
    98%   3021
    99%   3057
   100%   3115 (longest request)

Most requests are fast, but 2% take more than 3 s. The result is the same
whether I request nginx or a URL handled directly by haproxy.
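Since the slow requests sit just above 3000 ms in the Connect column, it
may help to time the TCP handshake alone, without HTTP. Below is a minimal
Python sketch, assuming the server accepts connections on port 80 and
treating any connect slower than 3 s as a likely retransmitted SYN; the
helper names `measure_connects` and `summarize` are hypothetical, and the
connections are made one at a time rather than concurrently as ab does.

```python
import socket
import time


def measure_connects(host, port, count, timeout=30.0):
    """Open and immediately close `count` TCP connections to host:port,
    returning each connect() duration in milliseconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        # create_connection() returns once the three-way handshake completes
        with socket.create_connection((host, port), timeout=timeout):
            pass
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms, threshold_ms=3000.0):
    """Return (median, max, fraction of connects slower than threshold_ms).
    For an even count the upper of the two middle values is used."""
    ordered = sorted(durations_ms)
    median = ordered[len(ordered) // 2]
    slow = sum(1 for d in ordered if d > threshold_ms)
    return median, ordered[-1], slow / len(ordered)
```

For example, `summarize(measure_connects("server", 80, 1000))` would give a
figure comparable to the ab Connect row; a non-zero slow fraction with no
HTTP involved would point at the handshake itself.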
I tried some sysctl tuning, with no visible results:

  security.bsd.unprivileged_read_msgbuf=0
  security.bsd.see_other_uids=0
  net.inet.ip.portrange.hilast=59999
  net.inet.ip.portrange.hifirst=40000
  net.inet.ip.portrange.last=59999
  net.inet.ip.portrange.first=40000
  net.inet.icmp.icmplim=3000
  net.inet.icmp.drop_redirect=1
  net.inet.tcp.slowstart_flightsize=4
  net.inet.tcp.inflight.enable=1
  net.inet.tcp.sendspace=65536
  net.inet.tcp.recvspace=65536
  net.inet.udp.maxdgram=65536
  net.inet.udp.recvspace=65536
  net.inet.tcp.rfc1323=1
  net.inet.tcp.blackhole=2
  net.inet.tcp.msl=10000
  net.inet.udp.blackhole=1

I also see some packet loss on this server on the internal LAN. The losses
occur at the last hop, so they are not due to the network. I don't know
whether this is related.

I have the same server in another datacenter, with an independent network,
and see the same problem there. I don't understand how this can be related.

I also tried with pf disabled, and this does not solve the issue.

Any ideas on how to debug and solve this?

Thanks,
Renaud Chaput