Date: Wed, 30 Mar 2011 17:55:47 +0200
From: Vlad Galu <dudu@dudu.ro>
To: pyunyh@gmail.com
Cc: freebsd-net@freebsd.org, Arnaud Lacombe <lacombar@gmail.com>
Subject: Re: bge(4) on RELENG_8 mbuf cluster starvation
Message-ID: <AANLkTi=dci-cKVuvpXCs40u8u=5LGzey6s5-jYXEPM7s@mail.gmail.com>
In-Reply-To: <20110313011632.GA1621@michelle.cdnetworks.com>
References: <AANLkTimSs48ftRv8oh1wTwMEpgN1Ny3B1ahzfS=AbML_@mail.gmail.com>
 <AANLkTimfh3OdXOe1JFo5u6JypcLrcWKv2WpSu8Uv-tgv@mail.gmail.com>
 <AANLkTi=rWobA40UtCTSeOzEz65TMw8vfCcxtMWBBme+u@mail.gmail.com>
 <20110313011632.GA1621@michelle.cdnetworks.com>
On Sun, Mar 13, 2011 at 2:16 AM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
> On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
> > On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu <dudu@dudu.ro> wrote:
> > > > Hi folks,
> > > >
> > > > On a fairly busy recent (r219010) RELENG_8 machine I keep getting
> > > > -- cut here --
> > > > 1096/1454/2550 mbufs in use (current/cache/total)
> > > > 1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
> > > > 1035/202 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > > 0/117/117/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > > > 2344K/2293K/4637K bytes allocated to network (current/cache/total)
> > > > 0/70128196/37726935 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > >   ^^^^^^^^^^^^^^^^^^^^^
> > > > -- and here --
> > > >
> > > > kern.ipc.nmbclusters is set to 131072. Other settings:
> > >
> > > no, netstat(8) says 262144.
> > >
> >
> > Heh, you're right, I forgot I'd doubled it a while ago. Wrote that from
> > the top of my head.
> >
> > > Maybe you can include $(sysctl dev.bge)? Might be useful.
> > >
> > >  - Arnaud
> > >
> >
> > Sure:
> > [...]
> > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
> > dev.bge.1.%driver: bge
> > dev.bge.1.%location: slot=0 function=0
> > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> > dev.bge.1.%parent: pci5
> > dev.bge.1.forced_collapse: 2
> > dev.bge.1.forced_udpcsum: 0
> > dev.bge.1.stats.FramesDroppedDueToFilters: 0
> > dev.bge.1.stats.DmaWriteQueueFull: 0
> > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> > dev.bge.1.stats.NoMoreRxBDs: 680050
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This indicates that bge(4) encountered an RX buffer shortage. Perhaps
> bge(4) couldn't fill new RX buffers for incoming frames due to other
> system activity.
>
> > dev.bge.1.stats.InputDiscards: 228755931
>
> This counter shows the number of frames discarded due to RX buffer
> shortage. bge(4) discards a received frame when it fails to allocate a
> new RX buffer, so InputDiscards is normally higher than NoMoreRxBDs.
>
> > dev.bge.1.stats.InputErrors: 49080818
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Something is wrong here. Too many frames were classified as error
> frames. You may see poor RX performance.
>
> > dev.bge.1.stats.RecvThresholdHit: 0
> > dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
> > dev.bge.1.stats.rx.Fragments: 47887706
> > dev.bge.1.stats.rx.UnicastPkts: 32672557601
> > dev.bge.1.stats.rx.MulticastPkts: 1218
> > dev.bge.1.stats.rx.BroadcastPkts: 2
> > dev.bge.1.stats.rx.FCSErrors: 2822217
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> FCS errors are too high. Please check cabling again (I'm assuming the
> controller is not broken here). I think you can use the vendor's
> diagnostic tools to verify this.
> > dev.bge.1.stats.rx.AlignmentErrors: 0
> > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> > dev.bge.1.stats.rx.ControlFramesReceived: 0
> > dev.bge.1.stats.rx.xoffStateEntered: 0
> > dev.bge.1.stats.rx.FramesTooLong: 0
> > dev.bge.1.stats.rx.Jabbers: 0
> > dev.bge.1.stats.rx.UndersizePkts: 0
> > dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
> > dev.bge.1.stats.tx.Collisions: 0
> > dev.bge.1.stats.tx.XonSent: 0
> > dev.bge.1.stats.tx.XoffSent: 0
> > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> > dev.bge.1.stats.tx.SingleCollisionFrames: 0
> > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> > dev.bge.1.stats.tx.DeferredTransmissions: 0
> > dev.bge.1.stats.tx.ExcessiveCollisions: 0
> > dev.bge.1.stats.tx.LateCollisions: 0
> > dev.bge.1.stats.tx.UnicastPkts: 281039183
> > dev.bge.1.stats.tx.MulticastPkts: 0
> > dev.bge.1.stats.tx.BroadcastPkts: 1153
> > -- and here --
> >
> > And now that I remembered, this as well:
> > -- cut here --
> > Name   Mtu  Network       Address                 Ipkts     Ierrs    Idrop     Opkts Oerrs Coll
> > bge1  1500  <Link#2>      00:11:25:22:0d:ed 32321767025 278517070 37726837 281068216     0    0
> > -- and here --
> > The colo provider changed my cable a couple of times, so I wouldn't
> > blame it on that. Unfortunately, I don't have access to the port
> > statistics on the switch. Running netstat with -w1 yields between 0
> > and 4 errors/second.
> >
>
> The hardware MAC counters still show a high number of FCS errors. The
> service provider should check for possible cabling issues on the
> switch port.
>

After swapping cables and moving the NIC to another switch, there are
some improvements. However:

-- cut here --
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
dev.bge.1.%driver: bge
dev.bge.1.%location: slot=0 function=0
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
dev.bge.1.%parent: pci5
dev.bge.1.forced_collapse: 0
dev.bge.1.forced_udpcsum: 0
dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.NoMoreRxBDs: 243248 <- this
dev.bge.1.stats.InputDiscards: 9945500
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.RecvThresholdHit: 0
dev.bge.1.stats.rx.ifHCInOctets: 36697296701
dev.bge.1.stats.rx.Fragments: 0
dev.bge.1.stats.rx.UnicastPkts: 549334370
dev.bge.1.stats.rx.MulticastPkts: 113638
dev.bge.1.stats.rx.BroadcastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.tx.ifHCOutOctets: 10578000636
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.UnicastPkts: 64545266
dev.bge.1.stats.tx.MulticastPkts: 0
dev.bge.1.stats.tx.BroadcastPkts: 313

and

0/1710531/2006005 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
-- and here --

I'll start gathering some stats/charts on this host to see if I can
correlate the starvation with other system events.

> However, this does not explain why you see such a large number of mbuf
> cluster allocation failures. The only wild guess I have at this moment
> is that some process or kernel subsystem is too slow to release its
> allocated mbuf clusters. Did you check other system activity while
> seeing the issue?

--
Good, fast & cheap. Pick any two.
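For gathering the stats mentioned above, something like the following
could work: a small periodic sampler that logs the "requests for mbufs
denied" counters from netstat -m together with the bge(4) drop counters,
so that spikes can later be lined up against other system activity. This
is only a sketch; the interval, log path, and the choice of dev.bge.1
counters are illustrative, not anything prescribed in this thread.

```shell
#!/bin/sh
# Sample mbuf-denial and bge(4) RX-drop counters every INTERVAL seconds.
# Assumes FreeBSD's netstat(1) and sysctl(8); bge1 is the busy interface
# discussed above.
INTERVAL=${INTERVAL:-10}
LOG=${LOG:-/var/log/mbuf-denied.log}

while :; do
    # netstat -m prints a line such as:
    #   0/1710531/2006005 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    # The first field is the mbufs/clusters/mbuf+clusters triple.
    denied=$(netstat -m | awk '/requests for mbufs denied/ { print $1 }')
    nobds=$(sysctl -n dev.bge.1.stats.NoMoreRxBDs)
    discards=$(sysctl -n dev.bge.1.stats.InputDiscards)
    printf '%s denied=%s NoMoreRxBDs=%s InputDiscards=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$denied" "$nobds" "$discards" >> "$LOG"
    sleep "$INTERVAL"
done
```

Plotting the deltas between successive samples (rather than the raw
monotonic counters) should make any correlation with cron jobs, backup
windows, or traffic bursts stand out.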