From owner-freebsd-net@FreeBSD.ORG  Thu Nov 25 09:05:45 2010
Date: Thu, 25 Nov 2010 15:54:27 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Claudio Jeker <cjeker@diehard.n-r-g.com>
In-Reply-To: <20101123151153.GB27694@diehard.n-r-g.com>
Message-ID: <20101125150444.D1713@besplex.bde.org>
References: <icgd44$89l$1@dough.gmane.org> <4CEBBB8F.70400@sentex.net>
	<icgerb$gnj$1@dough.gmane.org>
	<20101123151153.GB27694@diehard.n-r-g.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@freebsd.org
Subject: Re: em driver, 82574L chip, and possibly ASPM

On Tue, 23 Nov 2010, Claudio Jeker wrote:

> On Tue, Nov 23, 2010 at 02:16:35PM +0100, Ivan Voras wrote:
>>
>> One other thing, I don't know if this is normal as I've only just
>> noticed it: flood-pinging a machine (also a FreeBSD machine, on the
>> same switch) and monitoring the packet rates with netstat I see that
>> the rates begin at something like 8,000 PPS (in either direction)
>> and then slowly over a timespan of 5-10 minutes climb to 100,000 PPS
>> (again, in either direction).
>>
>> Since this is gigabit LAN with a Cisco switch, I'd say the 100,000
>> PPS should be correct. The other machine I'm pinging also has an em
>> card but a "desktop class" one. Is this slow-start expected /
>> normal?
>
> Yes, this is how ping -f works. ping -f sends a packet whenever it received
> a response or when a timer fired (IIRC that one is set to 1ms). So ping -f
> will not ramp up if the delay is smaller then the internal timer and hover
> around 1/delay pps until packet loss or bigger delays happen.

Yes, this is normal.  It is how ping -f doesn't work -- it doesn't do anything
resembling flooding, except possibly accidentally when 1 Mbps ethernet was
fast.  The ramping up is also accidental.

The ping -f timeout is 10 msec (actually more, due to timer granularity).
That was a lot even when 1 Mbps ethernet was fast, since the theoretical
max packet rate for 1 Mbps ethernet is about 1.5 Kpps, and the timeout
only gives 100 pps (but when 1 Mbps was fast CPUs were slow, so even
100 pps may have been fast for them).
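
As a rough check of these figures, here is a small Python sketch of the
arithmetic.  It assumes back-to-back minimum-size frames of 64 bytes plus
8 bytes of preamble and 12 bytes of inter-frame gap (84 bytes on the wire);
that framing assumption and the helper name max_pps are illustrative only.

```python
# Rough arithmetic only; max_pps is an illustrative helper, not part of ping.
# Assumed wire cost of one minimum-size frame:
# 64-byte frame + 8-byte preamble + 12-byte inter-frame gap.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12

def max_pps(link_bps, wire_bytes=WIRE_BYTES_PER_MIN_FRAME):
    """Theoretical maximum packets per second for minimum-size frames."""
    return link_bps / (wire_bytes * 8)

flood_timeout_s = 0.010              # ping -f timeout quoted in the text
timeout_pps = 1 / flood_timeout_s    # rate if only the timer ever fires

print("1 Mbps max pps:", round(max_pps(1e6)))
print("1 Gbps max pps:", round(max_pps(1e9)))
print("timer-only pps:", round(timeout_pps))
```

Under these assumptions the 1 Mbps figure comes out near 1.5 Kpps and the
1 Gbps figure near 1.5 Mpps, while the timer alone gives 100 pps.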

Now, 10 msec is a long time, and can even be beaten using the non-flood
option "-i 0.001".  This gives a timeout of 1 msec (actually more, due
to timeout granularity, and considerably more if the timeout granularity
is 10 msec as it should be).  1 msec is also a long time, but the -i
arg is bogusly limited for non-root to that (such limits don't
even defend against denial of service attacks, since without other
limits anyone that can execute ping can easily execute ping -c1
in a shell loop much faster than once per millisecond, except of course
when 1 Mbps ethernet was fast, and the execs for this give a better
local denial of service than ping -i).  So, even if the timeout granularity
is preposterously smaller than 1 msec, you have to be root to get a
flood ping using ping -i.  For root, you just need to configure HZ to
be about twice as large as the desired flood rate
and have enough CPU to handle the timeouts from this (not easy to do
for 1 Gbps ethernet since this requires HZ to be about 3 million to test
the limits).
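
The effect of timer granularity can be sketched in Python as below.  This
is a simplified model, not the kernel's actual timeout code: it assumes a
requested interval is simply rounded up to a whole number of ticks (real
kernels typically add up to one more tick), and it reads the rule of thumb
as HZ of roughly twice the desired packet rate, which is what the 3 million
figure implies.  Both function names are made up for illustration.

```python
import math

def effective_interval(requested_s, hz):
    """Requested interval rounded up to whole timer ticks (simplified model)."""
    # The small epsilon guards against float noise pushing an exact
    # multiple of the tick length up to the next tick.
    ticks = max(1, math.ceil(requested_s * hz - 1e-9))
    return ticks / hz

def hz_for_flood_rate(pps):
    """Rule of thumb assumed here: HZ of about twice the desired packet rate."""
    return 2 * pps

# "-i 0.001" with a 10 msec tick (HZ=100) versus a 1 msec tick (HZ=1000)
print(effective_interval(0.001, 100))
print(effective_interval(0.001, 1000))

# HZ needed to approach ~1.488 Mpps (minimum-size frames on 1 Gbps, assumed)
print(hz_for_flood_rate(1.488e6))
```

With HZ=100 the requested 1 msec interval stretches to a full 10 msec tick
in this model, which is why "-i 0.001" only beats ping -f when the timeout
granularity is fine enough.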

The ramping up occurs due to interaction of various layers of buffering
and delays.  em is especially predictable since its interrupt moderation
is normally configured to give interrupts at 8 KHz.  This tends to
give an initial "flood" ping packet rate of 8 Kpps.  There is initially
no flooding at all, but just packets coming back at a rate of 8 Kpps,
after each is delayed by precisely 125 usec by the interrupt moderation,
as needed to give this rate.  But occasionally, due to unrelated network
activity affecting the timing, or just due to ping sending an extra
packet every 10 msec and this packet happening to be delivered with
another packet (which would have a low probability without the interrupt
moderation, but is likely after just a few seconds or minutes with the
interrupt moderation), packets will come back in "bursts".  It takes
a burst length of just 2 to double the packet rate from 8 Kpps to 16
Kpps.  The interrupt moderation delivers 2 packets in these small
bursts and ping responds by sending out the next 2 packets in a burst
of the same length.  These tend to come back (after interrupt moderation)
in another burst of the same length.  The burst length increases with
time until something saturates, unless unrelated network or CPU activity
prevents processing of the established bursts fast enough to satisfy the
established burst timing.
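
The ramp can be illustrated with a toy Python model.  It is only a sketch
under loud assumptions: the steady-state rate is taken to be burst length
times the 8 KHz interrupt rate, ping is assumed to inject about 100
timer-driven extra packets per second, and each extra packet joins an
existing burst with a small made-up probability (merge_prob).  The
saturation limit and all names are hypothetical, and nothing here is
measured from a real em NIC.

```python
import random

IRQ_HZ = 8000          # em interrupt moderation rate quoted in the text
EXTRA_PER_SEC = 100    # assumed: one timer-driven packet per 10 msec timeout

def pps_for_burst(burst_len, irq_hz=IRQ_HZ):
    """Steady-state packet rate when each interrupt delivers burst_len replies."""
    return burst_len * irq_hz

def simulate_ramp(seconds, merge_prob, saturation_pps, seed=0):
    """Per-second packet rates for a toy burst-growth model.

    merge_prob is a hypothetical chance that a timer-driven extra packet is
    delivered together with an existing burst, lengthening it by one.
    """
    rng = random.Random(seed)
    burst = 1
    rates = []
    for _ in range(seconds):
        for _ in range(EXTRA_PER_SEC):
            grows = rng.random() < merge_prob
            # A burst only grows while the result stays under saturation.
            if grows and pps_for_burst(burst + 1) <= saturation_pps:
                burst += 1
        rates.append(pps_for_burst(burst))
    return rates

rates = simulate_ramp(seconds=600, merge_prob=0.0003, saturation_pps=100_000)
print(rates[0], rates[len(rates) // 2], rates[-1])
```

In this model the rate only ever moves in steps of 8 Kpps and stops growing
at the saturation limit; the real behaviour also depends on unrelated
network and CPU activity breaking up established bursts, which the sketch
ignores.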

All this has very little to do with the max packet rate.  With one of
my bge NICs, saturation occurs at about 100Kpps although the network
can do 600Kpps.  ping -f works better for determining network latency,
but could be improved for that too (e.g., by not sending the extra
packet every 10 msec if something came back, and doing other things
to prevent bursts).

Bruce