From: Luigi Rizzo
To: rihad
Cc: freebsd-net@freebsd.org, Julian Elischer
Date: Tue, 6 Oct 2009 12:17:47 +0200
Subject: Re: dummynet dropping too many packets

On Tue, Oct 06, 2009 at 02:34:32PM +0500, rihad wrote:
> Luigi Rizzo wrote:
> >On Tue, Oct 06, 2009 at 02:21:38PM +0500, rihad wrote:
> >>rihad wrote:
> >>>Julian Elischer wrote:
> >>>>rihad wrote:
> >>>>>Luigi Rizzo wrote:
> >>>>>>2. your test with 'ipfw allow ip from any to any' does not
> >>>>>> prove that the interface queue is not saturating, because
> >>>>>> you also remove the burstiness that dummynet introduces,
> >>>>>> and so the queue is driven differently.
> >>>>>>
> >>>>>How do I investigate and fix this burstiness issue?
> >>>>higher Hz rate?
> >>>>
> >>>Rebooted with HZ=2000 10 minutes ago. Due to application design the ipfw
> >>>table (pipe tablearg) was flushed, so there are now 350 (and increasing
> >>>at a rate 1 per 1-2 seconds as I type this) or so users in the table,
> >>>and not 4k as normally would be. The box is servicing 450+ mbit/s
> >>>without a single drop. I want to monitor how things change once the
> >>>number of users in ipfw tables gradually increases up to several
> >>>thousands.
> >>>
> >>It starts dropping packets at around 2000 online users (ipfw table
> >>load). I've set up a shell script to monitor this:
> >
> >once again:
> >you should check which pipes are dropping packets and whether
> >the number of drops indicated in the pipes matches the counts
> >indicated by netstat.
> >
> It's impossible to do so accurately, since users come and go any moment,
> and their pipes expire, and it's plain useless. As to the accordance of
> packet drop rate with net.inet.ip.dummynet.io_pkt_drop, they vary wildly:
>
> 8664 output packets dropped due to no bufs, etc.
> net.inet.ip.dummynet.io_pkt_drop: 111

io_pkt_drop only reports packets dropped due to errors (missing pipes,
randomly forced packet drops which you don't use, no buffers and so on).
The packets intentionally dropped in dummynet because queues are
full are listed by 'ipfw pipe show'.

Even if pipes expire, there is a difference between having partial
information and completely ignoring what is available
and claiming "it's plain useless".
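
One way to see which pipes are dropping is to sum the Drp column of
'ipfw pipe show' per pipe and rank the result. The sketch below is only
an illustration, not part of dummynet: it assumes the usual layout of
that output (a header line starting with the pipe number and a colon,
then one row per flow whose last field is the drop count), the function
names are made up for this example, and the numbers in SAMPLE are
invented. Adjust the patterns if your output looks different.

```python
import re
from collections import defaultdict

# Assumed pipe header, e.g. "00001: 500.000 Kbit/s    0 ms   50 sl. ..."
PIPE_HEADER = re.compile(r"^(\d+):\s")


def drops_per_pipe(pipe_show_text):
    """Sum the last column (assumed to be Drp) of every flow row, per pipe."""
    drops = defaultdict(int)
    current = None
    for line in pipe_show_text.splitlines():
        header = PIPE_HEADER.match(line)
        if header:
            current = int(header.group(1))
            drops[current] += 0  # register the pipe even if it has no flows
            continue
        fields = line.split()
        # Flow rows start with a bucket number and end with the drop count;
        # the "mask:" and "BKT Prot ..." lines fail this check and are skipped.
        if (current is not None and len(fields) >= 2
                and fields[0].isdigit() and fields[-1].isdigit()):
            drops[current] += int(fields[-1])
    return dict(drops)


def worst_pipes(drops, n=10):
    """Return the n pipes with the most drops, largest first."""
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented sample in the assumed layout, for illustration only.
SAMPLE = """\
00001: 500.000 Kbit/s    0 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0     1200   900000  0    0   3
00002:   1.000 Mbit/s    0 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0     98000 61000000 12 9000 4100
"""
```

Feeding it the saved text of 'ipfw pipe show' and comparing the total
with the netstat counter shows whether the drops come from a handful of
pipes or are spread over all users.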
BTW at least while you try to debug the problem you can temporarily
disable the pipe expire with 'sysctl net.inet.ip.dummynet.expire=0'
and also you could poll the stats more frequently (say every 1-2-5 sec)
to get a better idea of what happens.

The one time you sent the 'pipe show' info there were clearly
a few pipes with thousands of packet drops -- as I said those are
unavoidable and correspond to clients that systematically
exceed their share (500k/1m as you set) e.g. because they
are flooding the net with TCP SYN or UDP requests.
This may be due to viruses, aggressive p2p, and so on.
A single client can easily generate the extra 2000 packets per second
that you are seeing.

It's up to you to open your eyes looking for evidence, or close
them and randomly blame one or another piece of the system.

cheers
luigi

> since boottime!
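
For the frequent polling suggested above, the useful number is the drop
rate per pipe between two polls rather than the total since boot. A
minimal sketch, assuming you already have two snapshots as
{pipe number: drop count} dictionaries taken a known number of seconds
apart; the function name and the snapshot values are hypothetical:

```python
def drop_rates(before, after, interval_s):
    """Per-pipe drops per second between two snapshots of drop counters."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for pipe, now in after.items():
        prev = before.get(pipe, 0)
        # A counter that went backwards is taken to mean the pipe expired
        # and was recreated, so its new value is counted from zero.
        delta = now - prev if now >= prev else now
        if delta:
            rates[pipe] = delta / interval_s
    return rates


# Hypothetical snapshots taken 5 seconds apart.
earlier = {1: 3, 2: 4100}
later = {1: 3, 2: 14100, 3: 50}
rates = drop_rates(earlier, later, 5)
```

With expire set to 0 the pipes stay around, so a pipe that shows a rate
in the thousands per second on every poll is a good candidate for a
client exceeding its share.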