Date: Wed, 30 Mar 2011 19:17:21 +0200
From: Vlad Galu <dudu@dudu.ro>
To: pyunyh@gmail.com
Cc: freebsd-net@freebsd.org, Arnaud Lacombe <lacombar@gmail.com>
Subject: Re: bge(4) on RELENG_8 mbuf cluster starvation
Message-ID: <AANLkTi=mO65OoDTcz2gxpsB075-+WdjKTFe9Chm_MY=Y@mail.gmail.com>
In-Reply-To: <20110330171023.GA8601@michelle.cdnetworks.com>
References: <AANLkTimSs48ftRv8oh1wTwMEpgN1Ny3B1ahzfS=AbML_@mail.gmail.com>
 <AANLkTimfh3OdXOe1JFo5u6JypcLrcWKv2WpSu8Uv-tgv@mail.gmail.com>
 <AANLkTi=rWobA40UtCTSeOzEz65TMw8vfCcxtMWBBme+u@mail.gmail.com>
 <20110313011632.GA1621@michelle.cdnetworks.com>
 <AANLkTi=dci-cKVuvpXCs40u8u=5LGzey6s5-jYXEPM7s@mail.gmail.com>
 <20110330171023.GA8601@michelle.cdnetworks.com>

On Wed, Mar 30, 2011 at 7:10 PM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
> On Wed, Mar 30, 2011 at 05:55:47PM +0200, Vlad Galu wrote:
> > On Sun, Mar 13, 2011 at 2:16 AM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
> > > On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
> > > > On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> > > > > Hi,
> > > > >
> > > > > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu <dudu@dudu.ro> wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > On a fairly busy recent (r219010) RELENG_8 machine I keep getting
> > > > > > -- cut here --
> > > > > > 1096/1454/2550 mbufs in use (current/cache/total)
> > > > > > 1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
> > > > > > 1035/202 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > > > > 0/117/117/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > > > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > > > > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > > > > > 2344K/2293K/4637K bytes allocated to network (current/cache/total)
> > > > > > 0/70128196/37726935 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > > > > ^^^^^^^^^^^^^^^^^^^^^
> > > > > > -- and here --
> > > > > >
> > > > > > kern.ipc.nmbclusters is set to 131072. Other settings:
> > > > >
> > > > > no, netstat(8) says 262144.
> > > >
> > > > Heh, you're right, I forgot I'd doubled it a while ago. Wrote that
> > > > from the top of my head.
> > > >
> > > > > Maybe you can include $(sysctl dev.bge)? Might be useful.
> > > > >
> > > > >  - Arnaud
> > > >
> > > > Sure:
> > >
> > > [...]
> > >
> > > > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
> > > > dev.bge.1.%driver: bge
> > > > dev.bge.1.%location: slot=0 function=0
> > > > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> > > > dev.bge.1.%parent: pci5
> > > > dev.bge.1.forced_collapse: 2
> > > > dev.bge.1.forced_udpcsum: 0
> > > > dev.bge.1.stats.FramesDroppedDueToFilters: 0
> > > > dev.bge.1.stats.DmaWriteQueueFull: 0
> > > > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> > > > dev.bge.1.stats.NoMoreRxBDs: 680050
> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > This indicates bge(4) encountered an RX buffer shortage. Perhaps
> > > bge(4) couldn't fill new RX buffers for incoming frames due to
> > > other system activities.
> > >
> > > > dev.bge.1.stats.InputDiscards: 228755931
> > >
> > > This counter indicates the number of frames discarded due to RX
> > > buffer shortage. bge(4) discards a received frame if it fails to
> > > allocate a new RX buffer, so InputDiscards is normally higher than
> > > NoMoreRxBDs.
> > >
> > > > dev.bge.1.stats.InputErrors: 49080818
> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > Something is wrong here. Too many frames were classified as error
> > > frames. You may see poor RX performance.
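
(Note: these stats counters are cumulative, so a single snapshot only
shows that drops have happened at some point. A rough sketch for
sampling them to see whether they are still climbing is below; the
bge unit number matches the output above and the one-minute interval
is arbitrary.)

-- cut here --
#!/bin/sh
# Sample the cumulative bge(4) RX problem counters and the netstat -m
# denial counter once a minute, so that jumps can be lined up with
# other activity on the box.
while :; do
        date
        sysctl dev.bge.1.stats.NoMoreRxBDs \
            dev.bge.1.stats.InputDiscards \
            dev.bge.1.stats.InputErrors \
            dev.bge.1.stats.rx.FCSErrors
        netstat -m | grep 'requests for mbufs denied'
        sleep 60
done
-- and here --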
> > >
> > > > dev.bge.1.stats.RecvThresholdHit: 0
> > > > dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
> > > > dev.bge.1.stats.rx.Fragments: 47887706
> > > > dev.bge.1.stats.rx.UnicastPkts: 32672557601
> > > > dev.bge.1.stats.rx.MulticastPkts: 1218
> > > > dev.bge.1.stats.rx.BroadcastPkts: 2
> > > > dev.bge.1.stats.rx.FCSErrors: 2822217
> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > FCS errors are too high. Please check cabling again (I'm assuming
> > > the controller is not broken here). I think you can use the
> > > vendor's diagnostic tools to verify this.
> > >
> > > > dev.bge.1.stats.rx.AlignmentErrors: 0
> > > > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> > > > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> > > > dev.bge.1.stats.rx.ControlFramesReceived: 0
> > > > dev.bge.1.stats.rx.xoffStateEntered: 0
> > > > dev.bge.1.stats.rx.FramesTooLong: 0
> > > > dev.bge.1.stats.rx.Jabbers: 0
> > > > dev.bge.1.stats.rx.UndersizePkts: 0
> > > > dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
> > > > dev.bge.1.stats.tx.Collisions: 0
> > > > dev.bge.1.stats.tx.XonSent: 0
> > > > dev.bge.1.stats.tx.XoffSent: 0
> > > > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> > > > dev.bge.1.stats.tx.SingleCollisionFrames: 0
> > > > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> > > > dev.bge.1.stats.tx.DeferredTransmissions: 0
> > > > dev.bge.1.stats.tx.ExcessiveCollisions: 0
> > > > dev.bge.1.stats.tx.LateCollisions: 0
> > > > dev.bge.1.stats.tx.UnicastPkts: 281039183
> > > > dev.bge.1.stats.tx.MulticastPkts: 0
> > > > dev.bge.1.stats.tx.BroadcastPkts: 1153
> > > > -- and here --
> > > >
> > > > And now that I remembered, this as well:
> > > > -- cut here --
> > > > Name   Mtu  Network   Address            Ipkts        Ierrs      Idrop     Opkts      Oerrs  Coll
> > > > bge1   1500 <Link#2>  00:11:25:22:0d:ed  32321767025  278517070  37726837  281068216  0      0
> > > > -- and here --
> > > > The colo provider changed my cable a couple of times, so I'd not
> > > > blame it on that. Unfortunately, I don't have access to the port
> > > > statistics on the switch. Running netstat with -w1 yields between
> > > > 0 and 4 errors/second.
> > >
> > > Hardware MAC counters still show a high number of FCS errors. The
> > > service provider should check for possible cabling issues on the
> > > port of the switch.
> >
> > After swapping cables and moving the NIC to another switch, there are
> > some improvements. However:
> > -- cut here --
> > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
> > dev.bge.1.%driver: bge
> > dev.bge.1.%location: slot=0 function=0
> > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> > dev.bge.1.%parent: pci5
> > dev.bge.1.forced_collapse: 0
> > dev.bge.1.forced_udpcsum: 0
> > dev.bge.1.stats.FramesDroppedDueToFilters: 0
> > dev.bge.1.stats.DmaWriteQueueFull: 0
> > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> > dev.bge.1.stats.NoMoreRxBDs: 243248 <- this
> > dev.bge.1.stats.InputDiscards: 9945500
> > dev.bge.1.stats.InputErrors: 0
>
> There are still discarded frames, but I believe it's not related to
> any cabling issue since you don't have FCS or alignment errors.
>
> > dev.bge.1.stats.RecvThresholdHit: 0
> > dev.bge.1.stats.rx.ifHCInOctets: 36697296701
> > dev.bge.1.stats.rx.Fragments: 0
> > dev.bge.1.stats.rx.UnicastPkts: 549334370
> > dev.bge.1.stats.rx.MulticastPkts: 113638
> > dev.bge.1.stats.rx.BroadcastPkts: 0
> > dev.bge.1.stats.rx.FCSErrors: 0
> > dev.bge.1.stats.rx.AlignmentErrors: 0
> > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> > dev.bge.1.stats.rx.ControlFramesReceived: 0
> > dev.bge.1.stats.rx.xoffStateEntered: 0
> > dev.bge.1.stats.rx.FramesTooLong: 0
> > dev.bge.1.stats.rx.Jabbers: 0
> > dev.bge.1.stats.rx.UndersizePkts: 0
> > dev.bge.1.stats.tx.ifHCOutOctets: 10578000636
> > dev.bge.1.stats.tx.Collisions: 0
> > dev.bge.1.stats.tx.XonSent: 0
> > dev.bge.1.stats.tx.XoffSent: 0
> > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> > dev.bge.1.stats.tx.SingleCollisionFrames: 0
> > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> > dev.bge.1.stats.tx.DeferredTransmissions: 0
> > dev.bge.1.stats.tx.ExcessiveCollisions: 0
> > dev.bge.1.stats.tx.LateCollisions: 0
> > dev.bge.1.stats.tx.UnicastPkts: 64545266
> > dev.bge.1.stats.tx.MulticastPkts: 0
> > dev.bge.1.stats.tx.BroadcastPkts: 313
> >
> > and
> >
> > 0/1710531/2006005 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > -- and here --
> >
> > I'll start gathering some stats/charts on this host to see if I can
> > correlate the starvation with other system events.
>
> Now the MAC statistics counters show nothing abnormal, which in turn
> indicates the mbuf starvation came from other issues. The next thing
> is to identify which process or kernel subsystem consumes a lot of
> mbuf clusters.
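
One rough way to watch for that from userland is to sample the UMA
zones that back mbufs and clusters alongside the netstat -m summary
(a sketch only; zone names are as printed by vmstat -z, and the
interval is arbitrary):

-- cut here --
#!/bin/sh
# Log mbuf/cluster zone usage and the netstat -m summary, so that
# spikes in cluster usage can be matched against whatever else the
# machine was doing at the time.
while :; do
        date
        vmstat -z | egrep 'ITEM|mbuf'
        netstat -m | egrep 'in use|requests for mbufs denied'
        sleep 60
done
-- and here --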

Thanks for the feedback. Oh, there is a BPF consumer listening on bge1.
After noticing
http://www.mail-archive.com/freebsd-net@freebsd.org/msg25685.html, I
decided to shut it down for a while. It's pretty weird: my BPF buffer
size is set to 4MB and traffic on that interface is nowhere near that
high. I'll get back as soon as I have new data.
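
For reference, the BPF consumers that remain attached, and how their
buffers are doing, can be listed with netstat -B, together with the
global BPF buffer limits (a quick sketch; the output columns vary a
little between releases):

-- cut here --
# Attached BPF peers with their recv/drop/match counters and current
# buffer lengths, plus the default and maximum BPF buffer sizes.
netstat -B
sysctl net.bpf.bufsize net.bpf.maxbufsize
-- and here --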

> > > However this does not explain why you have a large number of mbuf
> > > cluster allocation failures. The only wild guess I have at this
> > > moment is that some process or kernel subsystem is too slow to
> > > release allocated mbuf clusters. Did you check various system
> > > activities while seeing the issue?

-- 
Good, fast & cheap. Pick any two.