From owner-freebsd-net@FreeBSD.ORG Wed Oct 7 12:21:44 2009
Date: Wed, 7 Oct 2009 13:21:43 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: rihad
Cc: freebsd-net@freebsd.org, Eugene Grosbein, Luigi Rizzo, Julian Elischer
Subject: Re: dummynet dropping too many packets

On Wed, 7 Oct 2009, rihad wrote:

> Robert Watson wrote:
>>
>> On Wed, 7 Oct 2009, rihad wrote:
>>
>>>> snapshot of the top -SH output in the steady state?  Let top run for a
>>>> few minutes and then copy/paste the first 10-20 lines into an e-mail.
>>>>
>>> Sure. Mind you: now there's only 1800 entries in each of the two ipfw
>>> tables, so any drops have stopped. But it only takes another 200-300
>>> entries to start dropping.
>>
>> Could you do the same in the net.isr.direct=1 configuration so we can
>> compare?
>
> net.isr.direct=1:

So it seems that CPU exhaustion is likely not the source of drops -- what I
was looking for in both configurations was signs that any individual thread
was approaching 80% utilization, which in a peak load situation might mean
it hits 100% and therefore leads to packet loss for that reason.

The statistic you're monitoring has a couple of possible interpretations,
but the most likely one is that the output queue on the network interface
you're transmitting on is overfilling.  In turn there are various possible
reasons for this happening, but the two most common would be:

(1) Average load is exceeding the transmit capacity of the driver/hardware
    pipeline -- the pipe is just too small.

(2) Peak load (burstiness) is exceeding the transmit capacity of the
    driver/hardware pipeline.

The questions that Luigi and others have been asking about your dummynet
configuration are to some extent oriented around determining whether the
burstiness introduced by dummynet could be responsible for that.
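Before turning any knobs, it's worth confirming exactly where that counter
is incrementing.  Something along these lines should show it (counter and
tunable names vary a bit between releases and drivers, so treat these as
illustrative rather than exact):

    # Per-interface statistics; the Drop column reflects the software
    # transmit (if_snd) queue overflowing on output.
    netstat -id

    # With net.isr.direct=0, packets handed to the netisr thread can also
    # be dropped if its input queue fills up.
    sysctl net.inet.ip.intr_queue_drops
    sysctl net.inet.ip.intr_queue_maxlen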
Suggestions like increasing the timer resolution are intended to spread out
the injection of packets by dummynet, reducing the peaks of burstiness that
occur when multiple queues inject packets in a burst that exceeds the
combined depth of the hardware descriptor ring and the software transmit
queue.  The two solutions, then, are (a) to increase the timer resolution
significantly so that packets are injected in smaller bursts, and (b) to
increase the queue capacities.  The hardware queue limits likely can't be
raised without new hardware, but the ifnet transmit queue sizes can be
increased (rough examples of both tunables are sketched at the end of this
mail).  Raising the timer resolution is almost certainly not a bad idea in
your configuration, although it does require a reboot, as you have observed.

On a side note: one other possible interpretation of that statistic is that
you're seeing fragmentation problems.  That's usually unlikely in forwarding
scenarios, but it wouldn't hurt to make sure you have LRO turned off on the
network interfaces you're using, assuming it's supported by the driver.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> last pid: 92152;  load averages:  0.99,  1.18,  1.15  up 1+01:42:28  14:53:09
> 162 processes: 9 running, 136 sleeping, 17 waiting
> CPU:  2.1% user,  0.0% nice,  5.4% system,  7.0% interrupt, 85.5% idle
> Mem: 1693M Active, 1429M Inact, 447M Wired, 197M Cache, 214M Buf, 170M Free
> Swap: 2048M Total, 12K Used, 2048M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    12 root      171 ki31     0K    16K CPU6   6  24.3H 100.00% idle: cpu6
>    13 root      171 ki31     0K    16K CPU5   5  23.8H  95.95% idle: cpu5
>    14 root      171 ki31     0K    16K CPU4   4  23.4H  93.12% idle: cpu4
>    16 root      171 ki31     0K    16K CPU2   2  23.0H  90.19% idle: cpu2
>    11 root      171 ki31     0K    16K CPU7   7  24.2H  87.26% idle: cpu7
>    15 root      171 ki31     0K    16K CPU3   3  22.8H  86.18% idle: cpu3
>    18 root      171 ki31     0K    16K RUN    0  20.6H  84.96% idle: cpu0
>    17 root      171 ki31     0K    16K CPU1   1 933:23  47.85% idle: cpu1
>    29 root      -68    -     0K    16K WAIT   1 522:02  46.88% irq256: bce0
>   465 root      -68    -     0K    16K -      7  55:15  12.65% dummynet
>    31 root      -68    -     0K    16K WAIT   2  57:29   4.74% irq257: bce1
>    21 root      -44    -     0K    16K WAIT   0  34:55   4.64% swi1: net
>    19 root      -32    -     0K    16K WAIT   4  51:41   3.96% swi4: clock sio
>    30 root      -64    -     0K    16K WAIT   6   5:43   0.73% irq16: mfi0
>
> Almost 2000 entries in the table, traffic load 420-430 mbps, drops haven't
> yet started.
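Once drops do resume, it would also be worth checking whether dummynet
itself is discarding packets rather than the interface queue.  The pipe and
queue listings include a dropped-packet counter (the exact column layout
differs slightly between releases, so this is just a sketch of what to look
at):

    # Per-pipe statistics, including packets dropped by dummynet.
    ipfw pipe show

    # Per-queue statistics for queues attached to the pipes.
    ipfw queue show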
>
> Previous net.isr.direct=0:
>
>>
>>> 155 processes: 10 running, 129 sleeping, 16 waiting
>>> CPU:  2.4% user,  0.0% nice,  2.0% system,  9.3% interrupt, 86.2% idle
>>> Mem: 1691M Active, 1491M Inact, 454M Wired, 130M Cache, 214M Buf, 170M Free
>>> Swap: 2048M Total, 12K Used, 2048M Free
>>>
>>>   PID USERNAME   PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>>    15 root       171 ki31     0K    16K CPU3   3  22.4H  97.85% idle: cpu3
>>>    14 root       171 ki31     0K    16K CPU4   4  23.0H  96.29% idle: cpu4
>>>    12 root       171 ki31     0K    16K CPU6   6  23.8H  94.58% idle: cpu6
>>>    16 root       171 ki31     0K    16K CPU2   2  22.5H  90.72% idle: cpu2
>>>    13 root       171 ki31     0K    16K CPU5   5  23.4H  90.58% idle: cpu5
>>>    18 root       171 ki31     0K    16K RUN    0  20.3H  85.60% idle: cpu0
>>>    17 root       171 ki31     0K    16K CPU1   1 910:03  78.37% idle: cpu1
>>>    11 root       171 ki31     0K    16K CPU7   7  23.8H  65.62% idle: cpu7
>>>    21 root       -44    -     0K    16K CPU7   7  19:03  48.34% swi1: net
>>>    29 root       -68    -     0K    16K WAIT   1 515:49  19.63% irq256: bce0
>>>    31 root       -68    -     0K    16K WAIT   2  56:05   5.52% irq257: bce1
>>>    19 root       -32    -     0K    16K WAIT   5  50:05   3.86% swi4: clock sio
>>>   983 flowtools   44    0 12112K  6440K select 0  13:20   0.15% flow-capture
>>>   465 root       -68    -     0K    16K -      3  51:19   0.00% dummynet
>>>     3 root        -8    -     0K    16K -      1   7:41   0.00% g_up
>>>     4 root        -8    -     0K    16K -      2   7:14   0.00% g_down
>>>    30 root       -64    -     0K    16K WAIT   6   5:30   0.00% irq16: mfi0
>>>
>>
>
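For reference, the tunables mentioned above would look roughly like this.
kern.hz and net.link.ifqmaxlen are boot-time tunables, so they go in
/boot/loader.conf and only take effect after a reboot; the values shown are
examples to illustrate the mechanism, not recommendations, and whether a
larger ifqmaxlen actually helps depends on the driver honouring it:

    # /boot/loader.conf -- boot-time tunables, reboot required
    kern.hz="2000"               # finer-grained clock/dummynet ticks (example value)
    net.link.ifqmaxlen="1024"    # larger default software transmit queue (example value)

    # At runtime, per interface, if the driver supports LRO at all:
    ifconfig bce0 -lro
    ifconfig bce1 -lro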