From owner-freebsd-net@FreeBSD.ORG  Mon Sep 13 15:04:39 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 947AC1065672;
	Mon, 13 Sep 2010 15:04:39 +0000 (UTC)
	(envelope-from tom@tomjudge.com)
Received: from eu1sys200aog110.obsmtp.com (eu1sys200aog110.obsmtp.com
	[207.126.144.129])
	by mx1.freebsd.org (Postfix) with SMTP id 126548FC25;
	Mon, 13 Sep 2010 15:04:37 +0000 (UTC)
Received: from source ([63.174.175.251]) by eu1sys200aob110.postini.com
	([207.126.147.11]) with SMTP
	ID DSNKTI49g9JBjfe21ffOiFCziqNGiKg4k0id@postini.com;
	Mon, 13 Sep 2010 15:04:38 UTC
Received: from [172.17.10.53] (unknown [172.17.10.53])
	by bbbx3.usdmm.com (Postfix) with ESMTP id 133E3FD01A;
	Mon, 13 Sep 2010 15:04:34 +0000 (UTC)
Message-ID: <4C8E3D79.6090102@tomjudge.com>
Date: Mon, 13 Sep 2010 10:04:25 -0500
From: Tom Judge <tom@tomjudge.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.1.12) Gecko/20100826 Lightning/1.0b1 Thunderbird/3.0.7
MIME-Version: 1.0
To: pyunyh@gmail.com
References: <4C894A76.5040200@tomjudge.com>
	<20100910002439.GO7203@michelle.cdnetworks.com>
In-Reply-To: <20100910002439.GO7203@michelle.cdnetworks.com>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org, davidch@broadcom.com, yongari@freebsd.org
Subject: Re: bce(4) - com_no_buffers (Again)
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Sep 2010 15:04:39 -0000

On 09/09/2010 07:24 PM, Pyun YongHyeon wrote:
> On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote:
>   
>> Hi,
>> I am just following up on the thread from March (I think) about this issue.
>>
>> We are seeing this issue on a number of systems running 7.1. 
>>
>> The systems in question are all Dell:
>>
>> * R710 R610 R410
>> * PE2950
>>
>> The latter do not show the issue as much as the R series systems.
>>
>> The cards in one of the R610's that I am testing with are:
>>
>> bce0@pci0:1:0:0:        class=0x020000 card=0x02361028 chip=0x163914e4
>> rev=0x20 hdr=0x00
>>     vendor     = 'Broadcom Corporation'
>>     device     = 'NetXtreme II BCM5709 Gigabit Ethernet'
>>     class      = network
>>     subclass   = ethernet
>>
>> They are connected to Dell PowerConnect 5424 switches.
>>
>> uname -a:
>> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4
>> #3: Wed Sep  8 08:19:03 UTC 2010    
>> tj@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10  amd64
>>
>> We are also using 8192 byte jumbo frames, if_lagg and if_vlan in the
>> configuration (the nics are in promisc as we are currently capturing
>> netflow data on another vlan for diagnostic purposes. ):
>>
>>
>>     
<SNIP IFCONFIG/>
>> I have updated the bce driver and the Broadcom MII driver to the
>> version from stable/7 and am still seeing the issue.
>>
>> This morning I did a test with increasing the RX_PAGES to 8 but the
>> system just hung starting the network.  The route command got stuck in a
>> zone state (Sorry can't remember exactly which).
>>
>> The real question is, how do we go about increasing the number of RX
>> BDs? I guess we have to bump more than just RX_PAGES...
>>
>>
>> The cause for us, from what we can see, is the openldap server sending
>> large group search results back to nss_ldap or pam_ldap.  When it does
>> this it seems to send each of the 600 results in its own TCP segment
>> creating a small packet storm (600*~100byte PDU's) at the destination
>> host.  The kernel then retransmits 2 blocks of 100 results each after
>> SACK kicks in for the data that was dropped by the NIC.
>>
>>
>> Thanks in advance
>>
>> Tom
>>
>>
>>     
<SNIP SYSCTL OUTPUT/>
> FW may drop incoming frames when it does not see available RX
> buffers. Increasing number of RX buffers slightly reduce the
> possibility of dropping frames but it wouldn't completely fix it.
> Alternatively driver may tell available RX buffers in the middle
> of RX ring processing instead of giving updated buffers at the end
> of RX processing. This way FW may see available RX buffers while
> driver/upper stack is busy to process received frames. But this may
> introduce coherency issues because the RX ring is shared between
> host and FW. If FreeBSD has way to sync partial region of a DMA
> map, this could be implemented without fear of coherency issue.
> Another way to improve RX performance would be switching to
> multi-RX queue with RSS but that would require a lot of work and I
> had no time to implement it.
>   
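
To make the "tell FW about buffers in the middle of RX ring processing"
idea above concrete, here is a toy model of it. This is only a sketch
and not bce(4) code: the function name, the tick-based timing and all
of the numbers are made up for illustration. It assumes the driver
works through a snapshot of the pending frames per pass and posts
recycled buffers either only at the end of the pass or after every
post_every frames.

```python
def simulate(burst, ring_size, arrive_per_tick, process_per_tick,
             post_every=None):
    """Toy RX ring model; returns the number of dropped frames.

    post_every=None: recycled buffers are posted back to the firmware
    only when the driver finishes its current pass.
    post_every=k: they are also posted after every k processed frames.
    All names and timings here are hypothetical, not taken from bce(4).
    """
    available = ring_size  # buffers the firmware may still fill
    pending = 0            # filled buffers waiting for the driver
    recycled = 0           # processed buffers not yet posted back
    pass_left = 0          # frames left in the driver's current pass
    to_arrive = burst
    dropped = 0

    while to_arrive > 0 or pending > 0:
        # Firmware side: accept frames while buffers are visible,
        # drop the rest (the com_no_buffers case).
        n = min(arrive_per_tick, to_arrive)
        to_arrive -= n
        accepted = min(n, available)
        available -= accepted
        pending += accepted
        dropped += n - accepted

        # Driver side: start a new pass over whatever is pending now.
        if pass_left == 0:
            pass_left = pending
        for _ in range(min(process_per_tick, pass_left)):
            pending -= 1
            pass_left -= 1
            recycled += 1
            # Mid-pass update of the buffers visible to the firmware.
            if post_every is not None and recycled >= post_every:
                available += recycled
                recycled = 0
        # End-of-pass update (always done).
        if pass_left == 0:
            available += recycled
            recycled = 0

    return dropped


end_only = simulate(12, 4, 2, 1)                # post at end of pass
mid_pass = simulate(12, 4, 2, 1, post_every=1)  # post after each frame
print(end_only, mid_pass)
```

With these made-up numbers the end-of-pass variant drops 6 of the 12
frames and the mid-pass variant drops 3. Frames arrive faster than the
toy driver can process them, so drops cannot go to zero here, which
fits the point that this reduces the drops rather than removing them.
The model also ignores the DMA coherency problem mentioned above.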

Does this mean that these cards are going to perform badly? This is
what I gathered from the previous thread.

> BTW, given that you've updated to bce(4)/mii(4) of stable/7, I
> wonder why TX/RX flow controls were not kicked in.
>   

The working copy I used for grabbing the upstream source is at r212371.

Last changes for the directories in my working copy:

sys/dev/bce @  211388
sys/dev/mii @ 212020


I discovered that flow control was disabled on the switches, so I set it
to auto. I also added a pair of BCE_PRINTFs in the code where the driver
enables and disables flow control, and it now gets enabled.


Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
of errors; however, the rate seems to be reduced compared to the
previous version of the driver.

Tom




-- 
TJU13-ARIN