From owner-freebsd-net@FreeBSD.ORG  Wed Oct  7 09:22:57 2009
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CEE9B1065701;
	Wed,  7 Oct 2009 09:22:57 +0000 (UTC) (envelope-from rihad@mail.ru)
Received: from mx76.mail.ru (mx76.mail.ru [94.100.176.91])
	by mx1.freebsd.org (Postfix) with ESMTP id 822228FC14;
	Wed,  7 Oct 2009 09:22:57 +0000 (UTC)
Received: from [217.25.27.27] (port=57345 helo=[217.25.27.27])
	by mx76.mail.ru with asmtp 
	id 1MvSjX-000FWG-00; Wed, 07 Oct 2009 13:22:55 +0400
Message-ID: <4ACC5DEC.1010006@mail.ru>
Date: Wed, 07 Oct 2009 14:22:52 +0500
From: rihad <rihad@mail.ru>
User-Agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090706)
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.org>
References: <4AC9E29B.6080908@mail.ru>
	<20091005123230.GA64167@onelab2.iet.unipi.it>
	<4AC9EFDF.4080302@mail.ru> <4ACA2CC6.70201@elischer.org>
	<4ACAFF2A.1000206@mail.ru> <4ACB0C22.4000008@mail.ru>
	<20091006100726.GA26426@svzserv.kemerovo.su>
	<4ACB42D2.2070909@mail.ru>
	<20091006142152.GA42350@svzserv.kemerovo.su>
	<4ACB6223.1000709@mail.ru>
	<20091006161240.GA49940@svzserv.kemerovo.su>
	<alpine.BSF.2.00.0910061804340.50283@fledge.watson.org>
	<4ACC5563.602@mail.ru> <4ACC56A6.1030808@mail.ru>
	<alpine.BSF.2.00.0910070957430.58146@fledge.watson.org>
In-Reply-To: <alpine.BSF.2.00.0910070957430.58146@fledge.watson.org>
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam: Not detected
X-Mras: Ok
Cc: freebsd-net@freebsd.org, Eugene Grosbein <eugen@kuzbass.ru>,
	Luigi Rizzo <rizzo@iet.unipi.it>, Julian Elischer <julian@elischer.org>
Subject: Re: dummynet dropping too many packets
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
X-List-Received-Date: Wed, 07 Oct 2009 09:22:57 -0000

Robert Watson wrote:
> On Wed, 7 Oct 2009, rihad wrote:
> 
>> rihad wrote:
>>> I've yet to test how much this direct=0 setting reduces the
>>> excessive dummynet drops.
>>
>> Ooops... After a couple of minutes, suddenly:
>>
>> net.inet.ip.intr_queue_drops: 1284
>>
>> Bumped it up a bit.
> 
> Yes, I was going to suggest that moving to deferred dispatch has 
> probably simply moved the drops to a new spot, the queue between the 
> ithreads and the netisr thread.  In your setup, how many network 
> interfaces are in use, and what drivers?
> 
Two interfaces, both using bce, the Broadcom NetXtreme II (BCM5706/BCM5708)
PCI/PCIe Gigabit Ethernet adapter driver, compiled into a 7.1-RELEASE-p8
kernel. bce0 takes ~400-500 Mbit/s of input and bce1 carries the output,
i.e. the box acts as a smart router. It has two quad-core CPUs.
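Robert's point that deferred dispatch merely moves the drops into the
queue between the ithreads and the netisr thread can be illustrated with
a toy model. This is a hypothetical sketch, not FreeBSD code: it assumes
one producer (the interrupt thread) and one consumer (a single netisr
thread) working in discrete ticks, with `maxlen` standing in for the IP
input queue limit (net.inet.ip.intr_queue_maxlen) whose overflows are
counted in net.inet.ip.intr_queue_drops.

```python
def simulate_netisr_queue(arrivals, service_per_tick, maxlen):
    """Toy model of the ithread -> netisr queue (hypothetical, not FreeBSD code).

    arrivals:         packets the interrupt thread enqueues in each tick
    service_per_tick: packets the single netisr thread can process per tick
    maxlen:           queue limit, standing in for net.inet.ip.intr_queue_maxlen
    Returns (processed, dropped, still_queued).
    """
    depth = processed = dropped = 0
    for n in arrivals:
        # Producer: enqueue whatever fits, count the overflow as drops.
        accepted = min(n, maxlen - depth)
        dropped += n - accepted
        depth += accepted
        # Consumer: the netisr thread drains up to its per-tick capacity.
        served = min(service_per_tick, depth)
        depth -= served
        processed += served
    return processed, dropped, depth


if __name__ == "__main__":
    bursty = [300, 0, 0, 0] * 5   # short bursts with idle ticks in between
    steady = [150] * 10           # sustained 1.5x overload
    print(simulate_netisr_queue(bursty, 100, 50))
    print(simulate_netisr_queue(bursty, 100, 512))
    print(simulate_netisr_queue(steady, 100, 512))
```

With the bursty trace, raising the limit from 50 to 512 removes the drops
entirely. With a steady 150 packets per tick against a capacity of 100,
the 512-entry queue fills after about eight ticks and drops resume at the
50-packet-per-tick excess: a larger queue absorbs bursts, but it cannot
help a thread that is simply out of CPU.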

The probability of drops (as reported by the "output packets dropped due
to no bufs, etc." counter in netstat -s) is clearly a function of both
traffic load and the number of entries in an ipfw table. I've just
reduced each of the two tables from ~2600 to ~1800 entries and the drops
stopped immediately, even though the traffic through the box didn't
decrease; it actually rose a bit, because fewer clients are now being
shaped (luckily, "ipfw pipe tablearg" passes packets that fail the table
lookup through untouched).

> If what's happening is that you're maxing out a CPU then moving to 
> multiple netisrs might help if your card supports generating flow IDs, 
> but most lower-end cards don't.  I have patches to generate those flow 
> IDs in software rather than hardware, but there are some downsides to 
> doing so, not least that it takes cache line misses on the packet that 
> generally make up a lot of the cost of processing the packet.
> 
> My experience with most reasonable cards is that letting them do the 
> work distribution with RSS and use multiple ithreads is a more 
> performant strategy than using software work distribution on current 
> systems, though.
> 
So should we prefer a bunch of expensive, high-quality 10-gigabit cards?
Are there any you would recommend?
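Robert's alternative to RSS is generating flow IDs in software and using
them to spread packets over several netisr threads. A minimal sketch of
that idea, under stated assumptions: the hash is a plain CRC32 over the
IPv4 5-tuple, chosen only for illustration (RSS-capable NICs use a
Toeplitz hash, and this is not the code from Robert's patches), and
`flow_id` and `pick_worker` are hypothetical names. Hashing the whole
tuple keeps every packet of a connection on the same thread, so per-flow
ordering is preserved.

```python
import ipaddress
import struct
import zlib


def flow_id(src_ip, dst_ip, src_port, dst_port, proto):
    """Hypothetical software flow ID: CRC32 over the IPv4 5-tuple.
    Illustrative only; not the Toeplitz hash used by RSS hardware."""
    key = (ipaddress.IPv4Address(src_ip).packed
           + ipaddress.IPv4Address(dst_ip).packed
           + struct.pack("!HHB", src_port, dst_port, proto))
    return zlib.crc32(key)


def pick_worker(fid, nworkers):
    """Map a flow ID onto one of nworkers netisr-like threads."""
    return fid % nworkers


if __name__ == "__main__":
    NWORKERS = 8  # e.g. one per core on a box with two quad-core CPUs
    flows = [("10.0.0.1", "192.0.2.7", 1024 + i, 80, 6) for i in range(16)]
    for f in flows:
        print(f, "-> worker", pick_worker(flow_id(*f), NWORKERS))
```

The trade-off is the one Robert describes: with RSS the card computes the
hash and spreads packets across receive queues serviced by multiple
ithreads, whereas a software hash has to read the packet headers on the
CPU doing the distribution, paying the cache misses he mentions.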

> Someone has probably asked for this already, but -- could you send a 
> snapshot of the top -SH output in the steady state?  Let top run for a 
> few minutes and then copy/paste the first 10-20 lines into an e-mail.
> 
Sure. Mind you, there are now only 1800 entries in each of the two ipfw
tables, so the drops have stopped. It only takes another 200-300 entries
for them to start again.

155 processes: 10 running, 129 sleeping, 16 waiting
CPU:  2.4% user,  0.0% nice,  2.0% system,  9.3% interrupt, 86.2% idle
Mem: 1691M Active, 1491M Inact, 454M Wired, 130M Cache, 214M Buf, 170M Free
Swap: 2048M Total, 12K Used, 2048M Free

   PID USERNAME   PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
    15 root       171 ki31     0K    16K CPU3   3  22.4H 97.85% idle: cpu3
    14 root       171 ki31     0K    16K CPU4   4  23.0H 96.29% idle: cpu4
    12 root       171 ki31     0K    16K CPU6   6  23.8H 94.58% idle: cpu6
    16 root       171 ki31     0K    16K CPU2   2  22.5H 90.72% idle: cpu2
    13 root       171 ki31     0K    16K CPU5   5  23.4H 90.58% idle: cpu5
    18 root       171 ki31     0K    16K RUN    0  20.3H 85.60% idle: cpu0
    17 root       171 ki31     0K    16K CPU1   1 910:03 78.37% idle: cpu1
    11 root       171 ki31     0K    16K CPU7   7  23.8H 65.62% idle: cpu7
    21 root       -44    -     0K    16K CPU7   7  19:03 48.34% swi1: net
    29 root       -68    -     0K    16K WAIT   1 515:49 19.63% irq256: bce0
    31 root       -68    -     0K    16K WAIT   2  56:05  5.52% irq257: bce1
    19 root       -32    -     0K    16K WAIT   5  50:05  3.86% swi4: clock sio
   983 flowtools   44    0 12112K  6440K select 0  13:20  0.15% flow-capture
   465 root       -68    -     0K    16K -      3  51:19  0.00% dummynet
     3 root        -8    -     0K    16K -      1   7:41  0.00% g_up
     4 root        -8    -     0K    16K -      2   7:14  0.00% g_down
    30 root       -64    -     0K    16K WAIT   6   5:30  0.00% irq16: mfi0