From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 04:36:47 2007
Date: Fri, 28 Dec 2007 15:36:39 +1100 (EST)
From: Bruce Evans
To: Mark Fullmer
Cc: Kostik Belousov, freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <20071228143411.C3587@besplex.bde.org>
In-Reply-To: <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net>
References: <20071221234347.GS25053@tnn.dglawrence.com>
 <20071222050743.GP57756@deviant.kiev.zoral.com.ua>
 <20071223032944.G48303@delplex.bde.org>
 <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net>

On Sat, 22 Dec 2007, Mark Fullmer wrote:

> On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote:
>>
>> I still don't understand the original problem, that the kernel is not
>> even preemptible enough for network interrupts to work (except in 5.2
>> where Giant breaks things).  Perhaps I misread the problem, and it is
>> actually that networking works but userland is unable to run in time
>> to avoid packet loss.
>
> The test is done with UDP packets between two servers.  The em
> driver is incrementing the received packet count correctly but
> the packet is not making it up the network stack.  If the
> application was not servicing the socket fast enough I would
> expect to see the "dropped due to full socket buffers"
> (udps_fullsock) counter incrementing, as shown by netstat -s.

I couldn't see any sign of PREEMPTION not working in 6.3-PRERELEASE.
em seemed to keep up with the maximum rate that I can easily generate
(640 kpps with tiny UDP packets), though it cannot transmit at more
than 400 kpps on the same hardware.  This is without any syncer
activity to cause glitches.  The rest of the system couldn't keep up,
and with my normal configuration of net.isr.direct=1, systat -ip
(udps_fullsock) showed too many packets being dropped, but all the
numbers seemed to add up right.  (I didn't do end-to-end packet
counts.  I'm using ttcp to send and receive packets; the receiver
loses so many packets that it rarely terminates properly, and when it
does terminate it always shows many dropped.)
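For reference, the "dropped due to full socket buffers" counter that
netstat -s and systat -ip report is just the udps_fullsock member of
struct udpstat, so a test program can poll it directly through the
net.inet.udp.stats sysctl around each run.  A rough, untested sketch,
assuming the usual <netinet/udp_var.h> layout:

/*
 * Read the same UDP statistics that netstat -s reports, via the
 * net.inet.udp.stats sysctl, and print udps_fullsock (packets
 * dropped because the receiving socket buffer was full).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <netinet/udp.h>
#include <netinet/udp_var.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
	struct udpstat udpstat;
	size_t len = sizeof(udpstat);

	if (sysctlbyname("net.inet.udp.stats", &udpstat, &len,
	    NULL, 0) == -1)
		err(1, "sysctlbyname(net.inet.udp.stats)");
	printf("udps_fullsock: %lu\n",
	    (unsigned long)udpstat.udps_fullsock);
	return (0);
}

Diffing two reads taken before and after a ttcp run gives a per-run
drop count without having to watch systat.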
However, with net.isr.direct=0, packets are dropped with no sign of
the problem except a reduced count of good packets in systat -ip:

Packet rate counter      net.isr.direct=1     net.isr.direct=0
-------------------      ----------------     ----------------
netstat -I               639042               643522 (faster later)
systat -ip (total rx)    639042               382567 (dropped many b4 here)
(UDP total)              639042               382567
(udps_fullsock)          298911               70340
(diff of prev 2)         340031               312227 (300+k always dropped)
net.isr.count            small                large (seems to be correct 643k)
net.isr.directed         large (correct?)     no change
net.isr.queued           0                    0
net.isr.drop             0                    0

net.isr.direct=0 is apparently causing dropped packets without even
counting them.  However, the drop seems to be below the netisr level.

More worryingly, with full 1500-byte packets (1472 data + 28 bytes of
UDP/IP header), packets can be sent at a rate of 76 kpps (nearly 950
Mbps) with a load of only 80% on the receiver, yet the ttcp receiver
still drops about 1000 pps due to "socket buffer full".  With
net.isr.direct=0 it drops an additional 700 pps due to this.  Glitches
from sync(2) taking 25 ms increase the loss by about 1000 packets, and
using rtprio for the ttcp receiver doesn't seem to help at all.

In previous mail, you (Mark) wrote:

# With FreeBSD 4 I was able to run a UDP data collector with rtprio set,
# kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF
# in the application.  If packets were dropped they would show up
# with netstat -s as "dropped due to full socket buffers".
#
# Since the packet never makes it to ip_input() I no longer have
# any way to count drops.  There will always be corner cases where
# interrupts are lost and drops not accounted for if the adapter
# hardware can't report them, but right now I've got no way to
# estimate any loss.

I tried using SO_RCVBUF in ttcp (it's an old version of ttcp that
doesn't have an option for this).  With the default kern.ipc.maxsockbuf
of 256K, this didn't seem to help.  20MB should work better :-) but I
didn't try that.  I don't understand how fast the socket buffer fills
up; I would have thought that 256K was enough for tiny packets but not
for 1500-byte packets.

There seems to be a general problem: 1Gbps NICs have, or should have,
rings of size >= 256 or 512 so that they aren't forced to drop packets
when their interrupt handler has a reasonable but larger latency, yet
if we actually use this feature then we flood the upper layers with
hundreds of packets and fill up socket buffers etc. there.

Bruce
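PS: Setting SO_RCVBUF as described above is just an ordinary
setsockopt() on the receiving socket before traffic starts, with
kern.ipc.maxsockbuf raised first so that the request isn't rejected.
A rough, untested sketch; the udp_bigbuf_socket() helper name, the
port argument and the buffer size are only for illustration:

/*
 * Open a UDP socket with an enlarged receive buffer.  The SO_RCVBUF
 * request fails with ENOBUFS unless kern.ipc.maxsockbuf has been
 * raised well above the requested size (Mark used
 * kern.ipc.maxsockbuf=20480000).
 */
#include <sys/types.h>
#include <sys/socket.h>

#include <netinet/in.h>

#include <err.h>
#include <string.h>

int
udp_bigbuf_socket(u_short port, int rcvbuf)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		err(1, "socket");
	/* Must be done before packets start arriving. */
	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
	    sizeof(rcvbuf)) == -1)
		err(1, "setsockopt(SO_RCVBUF)");

	memset(&sin, 0, sizeof(sin));
	sin.sin_len = sizeof(sin);
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");
	return (s);
}

Of course this only helps with the udps_fullsock drops; it does
nothing for packets that never reach ip_input() in the first place,
which is the case being discussed here.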