From: Bruce Evans
Date: Thu, 25 Nov 2010 15:54:27 +1100 (EST)
To: Claudio Jeker
Cc: freebsd-net@freebsd.org
Subject: Re: em driver, 82574L chip, and possibly ASPM
Message-ID: <20101125150444.D1713@besplex.bde.org>
In-Reply-To: <20101123151153.GB27694@diehard.n-r-g.com>

On Tue, 23 Nov 2010, Claudio Jeker wrote:

> On Tue, Nov 23, 2010 at 02:16:35PM +0100, Ivan Voras wrote:
>>
>> One other thing, I don't know if this is normal as I've only just
>> noticed it: flood-pinging a machine (also a FreeBSD machine, on the
>> same
>> switch) and monitoring the packet rates with netstat I see that
>> the rates begin at something like 8,000 PPS (in either direction)
>> and then slowly over a timespan of 5-10 minutes climb to 100,000 PPS
>> (again, in either direction).
>>
>> Since this is gigabit LAN with a Cisco switch, I'd say the 100,000
>> PPS should be correct. The other machine I'm pinging also has an em
>> card but a "desktop class" one. Is this slow-start expected /
>> normal?
>
> Yes, this is how ping -f works. ping -f sends a packet whenever it received
> a response or when a timer fired (IIRC that one is set to 1ms). So ping -f
> will not ramp up if the delay is smaller then the internal timer and hover
> around 1/delay pps until packet loss or bigger delays happen.

Yes, this is normal.  It is how ping -f doesn't work -- it doesn't do
anything resembling flooding, except possibly accidentally when 1 Mbps
ethernet was fast.  The ramping up is also accidental.

The ping -f timeout is 10 msec (actually more, due to timer
granularity).  That was a lot even when 1 Mbps ethernet was fast, since
the theoretical max packet rate for 1 Mbps ethernet is about 1.5 Kpps
and the timeout only gives 100 pps (but when 1 Mbps was fast, CPUs were
slow, so even 100 pps may have been fast for them).

Now, 10 msec is a long time, and can even be beaten using the non-flood
option "-i 0.001".  This gives a timeout of 1 msec (actually more, due
to timeout granularity, and considerably more if the timeout
granularity is 10 msec as it should be).  1 msec is also a long time,
but the -i arg is bogusly limited for non-root to that (such limits
don't even defend against denial of service attacks, since without
other limits anyone that can execute ping can easily execute ping -c1
in a shell loop much faster than once per millisecond, except of course
when 1 Mbps ethernet was fast, and the execs for this give a better
local denial of service than ping -i).
So, even if the timeout granularity is preposterously smaller than
1 msec, you have to be root to get a flood ping using ping -i.  For
root, you just need to configure HZ to be about twice as large as the
desired flood rate and have enough CPU to handle the timeouts from this
(not easy to do for 1 Gbps ethernet, since this requires HZ to be about
3 million to test the limits).

The ramping up occurs due to interaction of various layers of buffering
and delays.  em is especially predictable since its interrupt
moderation is normally configured to give interrupts at 8 kHz.  This
tends to give an initial "flood" ping packet rate of 8 Kpps.  There is
initially no flooding at all, but just packets coming back at a rate of
8 Kpps, after each is delayed by precisely 125 usec by the interrupt
moderation, as needed to give this rate.  But occasionally, due to
unrelated network activity affecting the timing, or just due to ping
sending an extra packet every 10 msec and this packet happening to be
delivered with another packet (which would have a low probability
without the interrupt moderation, but is likely after just a few
seconds or minutes with the interrupt moderation), packets will come
back in "bursts".  It takes a burst length of just 2 to double the
packet rate from 8 Kpps to 16 Kpps.  The interrupt moderation delivers
2 packets in these small bursts and ping responds by sending out the
next 2 packets in a burst of the same length.  These tend to come back
(after interrupt moderation) in another burst of the same length.  The
burst length increases with time until something saturates, unless
unrelated network or CPU activity prevents processing of the
established bursts fast enough to satisfy the established burst timing.

All this has very little to do with the max packet rate.  With one of
my bge NICs, saturation occurs at about 100 Kpps although the network
can do 600 Kpps.
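
The arithmetic behind the rates quoted above can be checked with a
rough sketch like the following.  It is only an illustration, not
anything taken from ping or the em driver: the function names are
made up, and the 84-byte per-frame wire cost (64-byte minimum frame
plus 8 bytes of preamble and 12 bytes of inter-frame gap) is an
assumption about minimum-size Ethernet frames that the message itself
does not spell out.

```python
# Back-of-the-envelope packet rates.  All names here are hypothetical.

# Assumed wire cost of one minimum-size Ethernet frame, in bytes:
# 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap.
MIN_FRAME_WIRE_BYTES = 64 + 8 + 12


def max_pps(link_bps):
    """Theoretical max packets/sec on a link, for minimum-size frames."""
    return link_bps / (MIN_FRAME_WIRE_BYTES * 8)


def timer_pps(timeout_sec):
    """Packet rate when sends are driven only by a periodic timeout."""
    return 1.0 / timeout_sec


def hz_for_flood_rate(pps):
    """Rule of thumb from the text: HZ about twice the desired rate."""
    return 2 * pps


def moderated_pps(burst_len, intr_hz=8000):
    """Echo-driven rate when each moderated interrupt (default 8 kHz)
    delivers burst_len replies and ping answers each with one send."""
    return burst_len * intr_hz


if __name__ == "__main__":
    print(f"1 Mbps line rate:       {max_pps(1_000_000):,.0f} pps")
    print(f"10 msec timeout:        {timer_pps(0.010):,.0f} pps")
    print(f"1 Gbps line rate:       {max_pps(1_000_000_000):,.0f} pps")
    print(f"HZ to flood 1 Gbps:     "
          f"{hz_for_flood_rate(max_pps(1_000_000_000)):,.0f}")
    for burst in (1, 2, 4):
        print(f"burst length {burst} at 8 kHz: "
              f"{moderated_pps(burst):,} pps")
```

Under these assumptions, a burst length of about 12 or 13 at 8 kHz is
already enough to reach the 100 Kpps region, which fits the point that
the ramp-up is governed by burst length and not by the link's max
packet rate.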

ping -f works better for determining network latency (but could be
improved for that too, e.g., by not sending the extra packet every
10 msec if something came back, and by doing other things to prevent
bursts).

Bruce