From owner-freebsd-net@FreeBSD.ORG Sun Oct 4 15:29:03 2009
Delivered-To: freebsd-net@freebsd.org
Message-ID: <4AC8BF3B.10601@mail.ru>
Date: Sun, 04 Oct 2009 20:28:59 +0500
From: rihad
User-Agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090706)
MIME-Version: 1.0
To: Luigi Rizzo
Cc: freebsd-net@freebsd.org
Subject: Re: dummynet dropping too many packets
References: <4AC8A76B.3050502@mail.ru> <20091004144909.GA42503@onelab2.iet.unipi.it> <4AC8B6E3.2070101@mail.ru> <20091004151518.GB42877@onelab2.iet.unipi.it>
In-Reply-To: <20091004151518.GB42877@onelab2.iet.unipi.it>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: Networking and TCP/IP with FreeBSD

Luigi Rizzo wrote:
> On Sun, Oct 04, 2009 at 07:53:23PM +0500, rihad wrote:
>> Luigi Rizzo wrote:
>>> On Sun, Oct 04, 2009 at 06:47:23PM +0500, rihad wrote:
>>>> Hi, we have around 500-600 Mbit/s of traffic flowing through a 7.1R
>>>> Dell PowerEdge with 2 GigE bce cards. There are currently around
>>>> 4 thousand ISP users online, limited by dummynet pipes of various
>>>> speeds. According to netstat -s output, around 500-1000 packets are
>>>> being dropped every second (this amounts to wasting around 7-12
>>>> Mbit/s worth of traffic according to systat -ifstat):
>>> what kind of packets are you seeing as dropped?
>>> please give the output of 'netstat -s | grep drop'
>>>
>> The output packets, like I said:
>> # netstat -s | grep drop
>> 2 connections closed (including 2 drops)
>> 0 embryonic connections dropped
>> 2 connections dropped by rexmit timeout
>> 0 connections dropped by persist timeout
>> 0 Connections (fin_wait_2) dropped because of timeout
>> 0 connections dropped by keepalive
>> 0 dropped
>> 2 dropped due to no socket
>> 0 dropped due to full socket buffers
>> 0 fragments dropped (dup or out of space)
>> 2 fragments dropped after timeout
>> 7538 output packets dropped due to no bufs, etc.
>>
>> The statistics are zeroed every 15 seconds in another window while I'm
>> investigating the issue, but the rate is around 500-1000 lost packets
>> per second at the current ~530 Mbit/s load.
>>
>>> At those speeds you might be hitting various limits with your
>>> config (e.g. 50k nmbclusters is probably way too small for
>> I bet it isn't:
>> 1967/5009/6976/50000 mbuf clusters in use (current/cache/total/max)
>>
>>> 4k users -- means you have an average of 10-15 buffers per user;
>>> the queue size of 350 kbytes = 2.6 Mbits means 2.6 seconds of
>>> buffering, which is quite high, besides the fact that in order to
>>> scale to 4k users you would need over 1 GB of kernel memory just
>>> for the buffers).
>> Aha. Can you be more specific about the kernel memory stuff? Which
>> setting needs tweaking?
>>
>> I have another similar box with 2 em GigE interfaces working at
>> 220-230 Mbit/s, and it sees virtually no out-of-bufs dropped packets,
>> unlike the bce box at 500-600 Mbit/s. It too has 350-KByte dummynet
>> queue sizes. And it too has an adequate mbuf load:
>> 3071/10427/13498/25600 mbuf clusters in use (current/cache/total/max)
>
> I think a quick way to tell if the problem is in dummynet/ipfw or
> elsewhere would be to reconfigure the pipes (for short times, e.g.
> 1-2 minutes while you test things) as
>
> # first, try to remove the shaping to see if the drops
> # are still present or not
> ipfw pipe XX delete; ipfw pipe XX config    // no buffering
>
> # second, do more traffic aggregation to see if the number of
> # pipes influences the drops. These are several different
> # configs to be tried.
> ipfw pipe XX delete; ipfw pipe XX config bw 500Mbit/s
> ipfw pipe XX delete; ipfw pipe XX config bw 50Mbit/s mask src-ip 0xffffff00
> ipfw pipe XX delete; ipfw pipe XX config bw 5Mbit/s mask src-ip 0xfffffff0
>
> and see if things change. If losses persist even after removing
> dummynet, then of course it is a device problem.
> Also note that dummynet introduces some burstiness in the output,
> which might saturate the output queue in the card (no idea what is
> used by bce). This particular phenomenon could be reduced by raising
> HZ to 2000 or 4000.
>
Thanks for the tip, although I took an easier route by simply adding
"ipfw add allow ip from any to any" before the pipe rules, and the buf
drop rate instantly became 0. So the problem is in dummynet/ipfw. Should
I go ahead and set HZ to 2000? Mine is 1000. I somehow doubt the change
would improve things. Maybe there is another way that doesn't involve
rebooting the machine? This is a production machine ;-(
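P.S. To put rough numbers on the kernel-memory remark above: 350 KBytes
per pipe queue times roughly 4000 active users is on the order of
4000 x 350 KB ~= 1.4 GB of cluster memory in the worst case where every
queue fills up, which seems to be where the "over 1GB" figure comes
from. The knobs I know to look at here (the value below is only an
illustration, not a tested recommendation):

   # current mbuf/cluster usage and limits
   netstat -m
   sysctl kern.ipc.nmbclusters

   # /boot/loader.conf -- raise the cluster limit for the next boot;
   # 262144 is just an example value, size it to real queue demand
   kern.ipc.nmbclusters=262144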
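P.P.S. On the HZ question: as far as I understand, kern.hz is a
boot-time loader tunable, so raising it does mean editing
/boot/loader.conf and rebooting, e.g.:

   # /boot/loader.conf -- takes effect at the next boot only;
   # 2000 is simply the value suggested above, not something I've tested
   kern.hz=2000

A reboot-free experiment, purely as a sketch and not something I have
tried, would be shrinking the per-user pipe queues at runtime, since
queue size is an ordinary pipe parameter:

   # hypothetical pipe number and values, shown only for the syntax
   ipfw pipe 10 config bw 1Mbit/s queue 50Kbytes

That trades the couple of seconds of per-user buffering for earlier
drops inside dummynet, which may or may not help with the "no bufs"
counter at the driver.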