From: Vlad Galu
Date: Wed, 30 Mar 2011 19:17:21 +0200
To: pyunyh@gmail.com
Cc: freebsd-net@freebsd.org, Arnaud Lacombe
Subject: Re: bge(4) on RELENG_8 mbuf cluster starvation
In-Reply-To: <20110330171023.GA8601@michelle.cdnetworks.com>
References: <20110313011632.GA1621@michelle.cdnetworks.com> <20110330171023.GA8601@michelle.cdnetworks.com>

On Wed, Mar 30, 2011 at 7:10 PM, YongHyeon PYUN wrote:

> On Wed, Mar 30, 2011 at 05:55:47PM +0200, Vlad Galu wrote:
> > On Sun, Mar 13, 2011 at 2:16 AM, YongHyeon PYUN wrote:
> >
> > > On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
> > > > On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > On a fairly busy recent (r219010) RELENG_8 machine I keep getting
> > > > > > -- cut here --
> > > > > > 1096/1454/2550 mbufs in use (current/cache/total)
> > > > > > 1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
> > > > > > 1035/202 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > > > > 0/117/117/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > > > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > > > > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > > > > > 2344K/2293K/4637K bytes allocated to network (current/cache/total)
> > > > > > 0/70128196/37726935 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > > > > ^^^^^^^^^^^^^^^^^^^^^
> > > > > > -- and here --
> > > > > >
> > > > > > kern.ipc.nmbclusters is set to 131072. Other settings:
> > > > > no, netstat(8) says 262144.
> > > > >
> > > >
> > > > Heh, you're right, I forgot I'd doubled it a while ago. Wrote that from the
> > > > top of my head.
> > > >
> > > >
> > > > > Maybe can you include $(sysctl dev.bge) ? Might be useful.
> > > > >
> > > > > - Arnaud
> > > > >
> > > >
> > > > Sure:
> > >
> > > [...]
> > >
> > > > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
> > > > dev.bge.1.%driver: bge
> > > > dev.bge.1.%location: slot=0 function=0
> > > > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> > > > dev.bge.1.%parent: pci5
> > > > dev.bge.1.forced_collapse: 2
> > > > dev.bge.1.forced_udpcsum: 0
> > > > dev.bge.1.stats.FramesDroppedDueToFilters: 0
> > > > dev.bge.1.stats.DmaWriteQueueFull: 0
> > > > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> > > > dev.bge.1.stats.NoMoreRxBDs: 680050
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > This indicates bge(4) encountered RX buffer shortage. Perhaps
> > > bge(4) couldn't fill new RX buffers for incoming frames due to
> > > other system activities.
> > >
> > > > dev.bge.1.stats.InputDiscards: 228755931
> > >
> > > This counter indicates number of frames discarded due to RX buffer
> > > shortage. bge(4) discards received frame if it failed to allocate
> > > new RX buffer such that InputDiscards is normally higher than
> > > NoMoreRxBDs.
> > >
> > > > dev.bge.1.stats.InputErrors: 49080818
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > Something is wrong here. Too many frames were classified as error
> > > frames. You may see poor RX performance.
> > >
> > > > dev.bge.1.stats.RecvThresholdHit: 0
> > > > dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
> > > > dev.bge.1.stats.rx.Fragments: 47887706
> > > > dev.bge.1.stats.rx.UnicastPkts: 32672557601
> > > > dev.bge.1.stats.rx.MulticastPkts: 1218
> > > > dev.bge.1.stats.rx.BroadcastPkts: 2
> > > > dev.bge.1.stats.rx.FCSErrors: 2822217
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > FCS errors are too high. Please check cabling again(I'm assuming
> > > the controller is not broken here). I think you can use vendor's
> > > diagnostic tools to verify this.
> > >
> > > > dev.bge.1.stats.rx.AlignmentErrors: 0
> > > > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> > > > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> > > > dev.bge.1.stats.rx.ControlFramesReceived: 0
> > > > dev.bge.1.stats.rx.xoffStateEntered: 0
> > > > dev.bge.1.stats.rx.FramesTooLong: 0
> > > > dev.bge.1.stats.rx.Jabbers: 0
> > > > dev.bge.1.stats.rx.UndersizePkts: 0
> > > > dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
> > > > dev.bge.1.stats.tx.Collisions: 0
> > > > dev.bge.1.stats.tx.XonSent: 0
> > > > dev.bge.1.stats.tx.XoffSent: 0
> > > > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> > > > dev.bge.1.stats.tx.SingleCollisionFrames: 0
> > > > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> > > > dev.bge.1.stats.tx.DeferredTransmissions: 0
> > > > dev.bge.1.stats.tx.ExcessiveCollisions: 0
> > > > dev.bge.1.stats.tx.LateCollisions: 0
> > > > dev.bge.1.stats.tx.UnicastPkts: 281039183
> > > > dev.bge.1.stats.tx.MulticastPkts: 0
> > > > dev.bge.1.stats.tx.BroadcastPkts: 1153
> > > > -- and here --
> > > >
> > > > And now, that I remembered about this as well:
> > > > -- cut here --
> > > > Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
> > > > bge1 1500 00:11:25:22:0d:ed 32321767025 278517070 37726837 281068216 0 0
> > > > -- and here --
> > > > The colo provider changed my cable a couple of times so I'd not blame it on
> > > > that. Unfortunately, I don't have access to the port statistics on the
> > > > switch. Running netstat with -w1 yields between 0 and 4 errors/second.
> > > >
> > >
> > > Hardware MAC counters still show high number of FCS errors. The
> > > service provider should have to check possible cabling issues on
> > > the port of the switch.
> > >
> >
> > After swapping cables and moving the NIC into another switch, there are some
> > improvements. However:
> > -- cut here --
> > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
> > dev.bge.1.%driver: bge
> > dev.bge.1.%location: slot=0 function=0
> > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> > dev.bge.1.%parent: pci5
> > dev.bge.1.forced_collapse: 0
> > dev.bge.1.forced_udpcsum: 0
> > dev.bge.1.stats.FramesDroppedDueToFilters: 0
> > dev.bge.1.stats.DmaWriteQueueFull: 0
> > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> > dev.bge.1.stats.NoMoreRxBDs: 243248 <- this
> > dev.bge.1.stats.InputDiscards: 9945500
> > dev.bge.1.stats.InputErrors: 0
>
> There are still discarded frames but I believe it's not related
> with any cabling issues since you don't have FCS or alignment
> errors.
>
> > dev.bge.1.stats.RecvThresholdHit: 0
> > dev.bge.1.stats.rx.ifHCInOctets: 36697296701
> > dev.bge.1.stats.rx.Fragments: 0
> > dev.bge.1.stats.rx.UnicastPkts: 549334370
> > dev.bge.1.stats.rx.MulticastPkts: 113638
> > dev.bge.1.stats.rx.BroadcastPkts: 0
> > dev.bge.1.stats.rx.FCSErrors: 0
> > dev.bge.1.stats.rx.AlignmentErrors: 0
> > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> > dev.bge.1.stats.rx.ControlFramesReceived: 0
> > dev.bge.1.stats.rx.xoffStateEntered: 0
> > dev.bge.1.stats.rx.FramesTooLong: 0
> > dev.bge.1.stats.rx.Jabbers: 0
> > dev.bge.1.stats.rx.UndersizePkts: 0
> > dev.bge.1.stats.tx.ifHCOutOctets: 10578000636
> > dev.bge.1.stats.tx.Collisions: 0
> > dev.bge.1.stats.tx.XonSent: 0
> > dev.bge.1.stats.tx.XoffSent: 0
> > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> > dev.bge.1.stats.tx.SingleCollisionFrames: 0
> > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> > dev.bge.1.stats.tx.DeferredTransmissions: 0
> > dev.bge.1.stats.tx.ExcessiveCollisions: 0
> > dev.bge.1.stats.tx.LateCollisions: 0
> > dev.bge.1.stats.tx.UnicastPkts: 64545266
> > dev.bge.1.stats.tx.MulticastPkts: 0
> > dev.bge.1.stats.tx.BroadcastPkts: 313
> >
> > and
> > 0/1710531/2006005 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > -- and here --
> >
> > I'll start gathering some stats/charts on this host to see if I can
> > correlate the starvation with other system events.
> >
>
> Now MAC statistics counter show no abnormal things which in turn
> indicates the mbuf starvation came from other issues. The next
> thing is to identify which process or kernel subsystem consumes a
> lot of mbuf clusters.
>

Thanks for the feedback. Oh, there is a BPF consumer listening on bge1.
After noticing
http://www.mail-archive.com/freebsd-net@freebsd.org/msg25685.html, I
decided to shut it down for a while. It's pretty weird, my BPF buffer size
is set to 4MB and traffic on that interface is nowhere near that high. I'll
get back as soon as I have new data.

>
> > >
> > > However this does not explain why you have large number of mbuf
> > > cluster allocation failure. The only wild guess I have at this
> > > moment is some process or kernel subsystems are too slow to release
> > > allocated mbuf clusters. Did you check various system activities
> > > while seeing the issue?
> > >
> >

-- 
Good, fast & cheap. Pick any two.
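
The stats gathering mentioned in this message could be sketched roughly as below. This is only a minimal Python sketch and not something posted in the thread: the function names (parse_denied, parse_sysctl, delta) are hypothetical, and it assumes the text output of `netstat -m` and `sysctl dev.bge.1` has already been captured (from cron, for instance) in the same format as the dumps quoted here.

```python
import re

# Matches the "requests for mbufs denied" line of `netstat -m`,
# e.g. "0/1710531/2006005 requests for mbufs denied (...)".
DENIED_RE = re.compile(r"(\d+)/(\d+)/(\d+) requests for mbufs denied")


def parse_denied(netstat_m_text):
    """Return (mbufs, clusters, mbuf+clusters) denied counts as ints."""
    match = DENIED_RE.search(netstat_m_text)
    if match is None:
        raise ValueError("no 'requests for mbufs denied' line found")
    return tuple(int(group) for group in match.groups())


def parse_sysctl(sysctl_text, prefix="dev.bge.1.stats."):
    """Return {counter: int} for numeric sysctl lines under `prefix`.

    The prefix is an assumption matching the interface in this thread;
    non-numeric values (such as %desc) are skipped.
    """
    counters = {}
    for line in sysctl_text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        fields = value.split()
        if sep and name.startswith(prefix) and fields and fields[0].isdigit():
            counters[name[len(prefix):]] = int(fields[0])
    return counters


def delta(before, after):
    """Per-counter increase between two snapshots (dicts of ints)."""
    return {key: after[key] - before[key] for key in after if key in before}
```

Taking two snapshots some interval apart and logging `delta()` next to the change in the denied-cluster count would show whether the denials grow together with NoMoreRxBDs and InputDiscards, or independently of them.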