Date: Mon, 13 Sep 2010 10:04:25 -0500
From: Tom Judge <tom@tomjudge.com>
To: pyunyh@gmail.com
Cc: freebsd-net@freebsd.org, davidch@broadcom.com, yongari@freebsd.org
Subject: Re: bce(4) - com_no_buffers (Again)
Message-ID: <4C8E3D79.6090102@tomjudge.com>
In-Reply-To: <20100910002439.GO7203@michelle.cdnetworks.com>
References: <4C894A76.5040200@tomjudge.com> <20100910002439.GO7203@michelle.cdnetworks.com>
On 09/09/2010 07:24 PM, Pyun YongHyeon wrote:
> On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote:
>
>> Hi,
>>
>> I am just following up on the thread from March (I think) about this
>> issue.
>>
>> We are seeing this issue on a number of systems running 7.1.
>>
>> The systems in question are all Dell:
>>
>> * R710, R610, R410
>> * PE2950
>>
>> The latter do not show the issue as much as the R-series systems.
>>
>> The cards in one of the R610's that I am testing with are:
>>
>> bce0@pci0:1:0:0: class=0x020000 card=0x02361028 chip=0x163914e4
>> rev=0x20 hdr=0x00
>>     vendor   = 'Broadcom Corporation'
>>     device   = 'NetXtreme II BCM5709 Gigabit Ethernet'
>>     class    = network
>>     subclass = ethernet
>>
>> They are connected to Dell PowerConnect 5424 switches.
>>
>> uname -a:
>> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4
>> #3: Wed Sep 8 08:19:03 UTC 2010
>> tj@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10 amd64
>>
>> We are also using 8192-byte jumbo frames, if_lagg and if_vlan in the
>> configuration (the NICs are in promisc as we are currently capturing
>> netflow data on another vlan for diagnostic purposes):
>>
>> <SNIP IFCONFIG/>
>>
>> I have updated the bce driver and the Broadcom MII driver to the
>> versions from stable/7 and am still seeing the issue.
>>
>> This morning I did a test, increasing RX_PAGES to 8, but the system
>> just hung while starting the network. The route command got stuck in
>> a zone state (sorry, I can't remember exactly which one).
>>
>> The real question is: how do we go about increasing the number of RX
>> BDs? I guess we have to bump more than just RX_PAGES...
>>
>> The cause for us, from what we can see, is the OpenLDAP server sending
>> large group search results back to nss_ldap or pam_ldap. When it does
>> this it seems to send each of the 600 results in its own TCP segment,
>> creating a small packet storm (600 * ~100-byte PDUs) at the destination
>> host. The kernel then retransmits 2 blocks of 100 results each after
>> SACK kicks in for the data that was dropped by the NIC.
>>
>> Thanks in advance
>>
>> Tom
>>
>> <SNIP SYSCTL OUTPUT/>
>
> FW may drop incoming frames when it does not see available RX
> buffers. Increasing the number of RX buffers slightly reduces the
> possibility of dropping frames, but it wouldn't completely fix it.
> Alternatively, the driver could report available RX buffers in the
> middle of RX ring processing instead of handing back updated buffers
> only at the end of RX processing. This way the FW may see available
> RX buffers while the driver/upper stack is busy processing received
> frames. But this may introduce coherency issues because the RX ring
> is shared between host and FW. If FreeBSD had a way to sync a partial
> region of a DMA map, this could be implemented without fear of
> coherency issues.
> Another way to improve RX performance would be switching to multiple
> RX queues with RSS, but that would require a lot of work and I have
> had no time to implement it.
>

Does this mean that these cards are going to perform badly? This was
what I gathered from the previous thread.

> BTW, given that you've updated to bce(4)/mii(4) of stable/7, I
> wonder why TX/RX flow controls were not kicked in.
>

The working copy I used for grabbing the upstream source is at r212371.
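
As an aside on Pyun's suggestion about handing RX buffers back in the
middle of a pass: reduced to a stand-alone userland sketch (the names
below are made up for illustration and are not bce(4) symbols), the idea
is simply to publish the RX producer index every few descriptors rather
than once at the end of the interrupt pass:

/*
 * Rough, self-contained illustration of the idea above: instead of
 * telling the NIC about recycled RX buffers only once at the end of an
 * interrupt pass, publish the new producer index every REPLENISH_BATCH
 * descriptors so the firmware can keep receiving while the host is
 * still busy.  All names here are hypothetical, not bce(4) code.
 */
#include <stdint.h>
#include <stdio.h>

#define RX_RING_SIZE    512     /* number of RX buffer descriptors */
#define REPLENISH_BATCH 32      /* publish the producer index this often */

struct rx_ring {
	uint16_t cons;          /* next descriptor the host will process */
	uint16_t prod;          /* last descriptor handed back to the NIC */
};

/* Stand-in for a register/mailbox write telling the NIC how far it may produce. */
static void
hw_write_rx_producer(uint16_t prod)
{
	printf("NIC may now use descriptors up to index %u\n", (unsigned)prod);
}

/* Stand-in for refilling one descriptor with a fresh buffer/mbuf. */
static void
refill_rx_buffer(uint16_t idx)
{
	(void)idx;              /* allocate and map a new buffer here */
}

/*
 * Process 'count' received frames.  The point of the sketch: the producer
 * index is written back every REPLENISH_BATCH frames, not just once at
 * the end, which is what keeps the firmware from running out of buffers
 * during a long pass.
 */
static void
process_rx(struct rx_ring *ring, int count)
{
	int done = 0;

	while (done < count) {
		refill_rx_buffer(ring->cons);
		ring->cons = (ring->cons + 1) % RX_RING_SIZE;
		ring->prod = ring->cons;
		done++;

		if ((done % REPLENISH_BATCH) == 0)
			hw_write_rx_producer(ring->prod);
	}

	/* Final update for any partial batch. */
	hw_write_rx_producer(ring->prod);
}

int
main(void)
{
	struct rx_ring ring = { 0, 0 };

	process_rx(&ring, 100);
	return (0);
}

On the real hardware each of those early producer updates would also
need the shared RX ring synced to the device, which is exactly where the
lack of a partial bus_dmamap_sync() that Pyun mentions would hurt.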
Last changes for the directories in my working copy:

sys/dev/bce @ 211388
sys/dev/mii @ 212020

I discovered that flow control was disabled on the switches, so I set it
to auto and added a pair of BCE_PRINTF's in the code where the driver
enables and disables flow control, and now it does get enabled.

Without BCE_JUMBO_HDRSPLIT we see no errors. With it we still see a
number of errors, although the rate seems to be reduced compared to the
previous version of the driver.

Tom

--
TJU13-ARIN
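
P.S. For anyone else chasing this: the reason the switch-side setting
matters is that with autonegotiation the resolved pause state depends on
what both link partners advertise (the PAUSE/ASM_DIR resolution from
IEEE 802.3 Annex 28B). Here is a stand-alone sketch of just that
resolution, not the mii(4)/bce(4) code:

/*
 * Resolve full-duplex flow control from the PAUSE/ASM_DIR bits each end
 * advertises (IEEE 802.3 Annex 28B).  Illustration only.
 */
#include <stdbool.h>
#include <stdio.h>

struct pause_adv {
	bool pause;     /* PAUSE bit: symmetric pause supported */
	bool asm_dir;   /* ASM_DIR bit: asymmetric pause supported */
};

/*
 * Resolve flow control for the local end of the link:
 * *tx = we may send PAUSE frames, *rx = we honor received PAUSE frames.
 */
static void
resolve_pause(struct pause_adv local, struct pause_adv partner,
    bool *tx, bool *rx)
{
	*tx = partner.pause &&
	    (local.pause || (local.asm_dir && partner.asm_dir));
	*rx = local.pause &&
	    (partner.pause || (local.asm_dir && partner.asm_dir));
}

int
main(void)
{
	/* NIC advertises symmetric pause... */
	struct pause_adv nic = { true, true };
	/* ...but the switch port has flow control disabled. */
	struct pause_adv sw_off = { false, false };
	/* After setting the switch port to auto it advertises pause too. */
	struct pause_adv sw_auto = { true, false };
	bool tx, rx;

	resolve_pause(nic, sw_off, &tx, &rx);
	printf("switch fc off : tx=%d rx=%d\n", tx, rx);   /* 0 0 */

	resolve_pause(nic, sw_auto, &tx, &rx);
	printf("switch fc auto: tx=%d rx=%d\n", tx, rx);   /* 1 1 */

	return (0);
}

Presumably with the PowerConnect ports at the old setting the switch
advertised neither bit, so neither TX nor RX pause could resolve no
matter what the driver advertised.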