From owner-freebsd-net@FreeBSD.ORG Wed May 19 17:51:42 2010
Date: Wed, 19 May 2010 22:51:43 +0500
From: rihad <rihad@mail.ru>
To: freebsd-net@freebsd.org
Subject: increasing em(4) buffer sizes
Message-ID: <4BF4252F.8000208@mail.ru>

Hi there,

We have a FreeBSD 7.2 box (Intel Server System, 4 GB RAM) doing traffic
shaping and accounting. It has two em(4) gigabit interfaces, one used for
input and the other for output, carrying around 500-600 Mbit/s. Traffic
limiting is done by dynamically setting up IPFW pipes, whose byte counters
also cover our per-user traffic accounting needs, so the firewall is
basically a longish string of pipe rules.

This worked fine while the number of online users was low, but now that
we've slowly grown to 2-3K online users, netstat -i's Ierrs column for em0
(the input interface) is growing at a rate of 5-15K per hour. Apparently
searching through the firewall linearly for _each_ arriving packet stalls
the interface for the duration of the search (even though net.isr.direct=0),
so some packets are periodically dropped on input.

To mitigate the problem I set up a two-level hash by means of skipto rules,
cutting the number of rules searched per packet from up to several thousand
down to at most 85, yet the rate of Ierrs has only increased, to 40-50K per
hour; I don't know why.

I've also tried setting these sysctls:

hw.intr_storm_threshold=10000
dev.em.0.rx_processing_limit=3000

but they didn't help at all. BTW, the other current settings are:

kern.hz=4000
net.inet.ip.fw.verbose=0
kern.ipc.nmbclusters=111111
net.inet.ip.fastforwarding=1
net.inet.ip.dummynet.io_fast=1
net.isr.direct=0
net.inet.ip.intr_queue_maxlen=5000

net.inet.ip.intr_queue_drops is always zero.

I think the problem is that the em receive buffers aren't large enough to
absorb the packets as they arrive. I looked in /sys/dev/e1000/if_em.c and
found this in em_attach():

	adapter->rx_buffer_len = 2048;

and later in em_initialize_receive_unit():

	switch (adapter->rx_buffer_len) {
	default:
	case 2048:
		rctl |= E1000_RCTL_SZ_2048;
		break;
	case 4096:
		rctl |= E1000_RCTL_SZ_4096 |
		    E1000_RCTL_BSEX | E1000_RCTL_LPE;
		break;
	case 8192:
		rctl |= E1000_RCTL_SZ_8192 |
		    E1000_RCTL_BSEX | E1000_RCTL_LPE;
		break;
	case 16384:
		rctl |= E1000_RCTL_SZ_16384 |
		    E1000_RCTL_BSEX | E1000_RCTL_LPE;
		break;
	}

So apparently the default buffer size is 2048 bytes, and as much as 16384 is
supported. But at what price?
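As an aside, in case the ruleset layout itself matters, here is roughly the
shape of the pipe rules and the two-level skipto hash described above. The
rule numbers, pipe numbers, addresses and bandwidths below are made up; only
the structure is real:

	# first level: one skipto per /24, so a packet only has to walk the
	# bucket for its own subnet instead of the whole rule list
	ipfw add 100 skipto 64000 ip from any to 10.1.64.0/24 in recv em0
	ipfw add 101 skipto 65000 ip from any to 10.1.65.0/24 in recv em0
	# ... one such rule per subnet ...

	# second level: per-user pipe rules, added and removed dynamically;
	# the byte counters give us the per-user accounting for free
	ipfw pipe 1017 config bw 2Mbit/s
	ipfw add 64017 pipe 1017 ip from any to 10.1.64.17 in recv em0
	ipfw pipe 1018 config bw 4Mbit/s
	ipfw add 64018 pipe 1018 ip from any to 10.1.64.18 in recv em0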
Getting back to if_em.c: those constants do look suspicious. Can I blindly
change rx_buffer_len in em_attach()? Sorry, I'm not a kernel hacker :(
Thanks for reading, and thanks for any tips; any help is much appreciated.
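P.S. To make the question concrete, the change I have in mind is just this
one line in em_attach(). This is purely my guess and untested; presumably
the driver would also have to start handing the card receive mbufs at least
that big, rather than plain 2 KB clusters, or the hardware could DMA past
the end of a buffer:

	/* hypothetical, untested tweak in em_attach() */
	adapter->rx_buffer_len = 4096;	/* was 2048; em_initialize_receive_unit()
					 * would then set E1000_RCTL_SZ_4096 |
					 * E1000_RCTL_BSEX | E1000_RCTL_LPE */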