Date: Sat, 12 Mar 2011 17:16:32 -0800
From: YongHyeon PYUN <pyunyh@gmail.com>
To: Vlad Galu <dudu@dudu.ro>
Cc: freebsd-net@freebsd.org, Arnaud Lacombe <lacombar@gmail.com>
Subject: Re: bge(4) on RELENG_8 mbuf cluster starvation
Message-ID: <20110313011632.GA1621@michelle.cdnetworks.com>
In-Reply-To: <AANLkTi=rWobA40UtCTSeOzEz65TMw8vfCcxtMWBBme%2Bu@mail.gmail.com>
References: <AANLkTimSs48ftRv8oh1wTwMEpgN1Ny3B1ahzfS=AbML_@mail.gmail.com> <AANLkTimfh3OdXOe1JFo5u6JypcLrcWKv2WpSu8Uv-tgv@mail.gmail.com> <AANLkTi=rWobA40UtCTSeOzEz65TMw8vfCcxtMWBBme%2Bu@mail.gmail.com>
On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
> On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>
> > Hi,
> >
> > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu <dudu@dudu.ro> wrote:
> > > Hi folks,
> > >
> > > On a fairly busy recent (r219010) RELENG_8 machine I keep getting
> > > -- cut here --
> > > 1096/1454/2550 mbufs in use (current/cache/total)
> > > 1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
> > > 1035/202 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > 0/117/117/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > > 2344K/2293K/4637K bytes allocated to network (current/cache/total)
> > > 0/70128196/37726935 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > ^^^^^^^^^^^^^^^^^^^^^
> > > -- and here --
> > >
> > > kern.ipc.nmbclusters is set to 131072. Other settings:
> > no, netstat(8) says 262144.
> >
> >
> Heh, you're right, I forgot I'd doubled it a while ago. Wrote that from the
> top of my head.
>
>
> > Maybe can you include $(sysctl dev.bge) ? Might be useful.
> >
> >  - Arnaud
> >
>
> Sure:

[...]

> dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
> dev.bge.1.%driver: bge
> dev.bge.1.%location: slot=0 function=0
> dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> dev.bge.1.%parent: pci5
> dev.bge.1.forced_collapse: 2
> dev.bge.1.forced_udpcsum: 0
> dev.bge.1.stats.FramesDroppedDueToFilters: 0
> dev.bge.1.stats.DmaWriteQueueFull: 0
> dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> dev.bge.1.stats.NoMoreRxBDs: 680050
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This indicates bge(4) encountered an RX buffer shortage. Perhaps
bge(4) couldn't fill new RX buffers for incoming frames due to
other system activities.

> dev.bge.1.stats.InputDiscards: 228755931

This counter indicates the number of frames discarded due to RX
buffer shortage. bge(4) discards a received frame if it fails to
allocate a new RX buffer, so InputDiscards is normally higher
than NoMoreRxBDs.

> dev.bge.1.stats.InputErrors: 49080818
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Something is wrong here. Too many frames were classified as error
frames. You may see poor RX performance.

> dev.bge.1.stats.RecvThresholdHit: 0
> dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
> dev.bge.1.stats.rx.Fragments: 47887706
> dev.bge.1.stats.rx.UnicastPkts: 32672557601
> dev.bge.1.stats.rx.MulticastPkts: 1218
> dev.bge.1.stats.rx.BroadcastPkts: 2
> dev.bge.1.stats.rx.FCSErrors: 2822217
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FCS errors are too high. Please check the cabling again (I'm
assuming the controller is not broken here). I think you can use
the vendor's diagnostic tools to verify this.

> dev.bge.1.stats.rx.AlignmentErrors: 0
> dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> dev.bge.1.stats.rx.ControlFramesReceived: 0
> dev.bge.1.stats.rx.xoffStateEntered: 0
> dev.bge.1.stats.rx.FramesTooLong: 0
> dev.bge.1.stats.rx.Jabbers: 0
> dev.bge.1.stats.rx.UndersizePkts: 0
> dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
> dev.bge.1.stats.tx.Collisions: 0
> dev.bge.1.stats.tx.XonSent: 0
> dev.bge.1.stats.tx.XoffSent: 0
> dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> dev.bge.1.stats.tx.SingleCollisionFrames: 0
> dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> dev.bge.1.stats.tx.DeferredTransmissions: 0
> dev.bge.1.stats.tx.ExcessiveCollisions: 0
> dev.bge.1.stats.tx.LateCollisions: 0
> dev.bge.1.stats.tx.UnicastPkts: 281039183
> dev.bge.1.stats.tx.MulticastPkts: 0
> dev.bge.1.stats.tx.BroadcastPkts: 1153
> -- and here --
>
> And now, that I remembered about this as well:
> -- cut here --
> Name    Mtu Network       Address                  Ipkts     Ierrs    Idrop     Opkts Oerrs  Coll
> bge1   1500 <Link#2>      00:11:25:22:0d:ed  32321767025 278517070 37726837 281068216     0     0
> -- and here --
> The colo provider changed my cable a couple of times so I'd not blame it on
> that. Unfortunately, I don't have access to the port statistics on the
> switch. Running netstat with -w1 yields between 0 and 4 errors/second.
>

Hardware MAC counters still show a high number of FCS errors. The
service provider should check for possible cabling issues on the
switch port. However, this does not explain why you have such a
large number of mbuf cluster allocation failures. The only wild
guess I have at this moment is that some process or kernel
subsystem is too slow to release allocated mbuf clusters.
Did you check various system activities while seeing the issue?
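
To put the counters quoted above in proportion, here is a minimal Python sketch (not part of the original thread) that expresses each RX error counter as a fraction of received frames. The dictionary keys mirror the sysctl names from the message. The choice of denominator (unicast + multicast + broadcast packets) is an assumption, since the thread does not say whether those counters include errored or discarded frames, and the function name `error_rates` is purely illustrative.

```python
# Counter values copied from the dev.bge.1.stats output quoted in the message.
COUNTERS = {
    "NoMoreRxBDs": 680050,
    "InputDiscards": 228755931,
    "InputErrors": 49080818,
    "FCSErrors": 2822217,
    "rx.UnicastPkts": 32672557601,
    "rx.MulticastPkts": 1218,
    "rx.BroadcastPkts": 2,
}


def error_rates(counters):
    """Return each error counter divided by the total of received packets.

    Assumption: total received frames is approximated by the sum of the
    unicast, multicast and broadcast RX packet counters.
    """
    total_rx = (
        counters["rx.UnicastPkts"]
        + counters["rx.MulticastPkts"]
        + counters["rx.BroadcastPkts"]
    )
    names = ("NoMoreRxBDs", "InputDiscards", "InputErrors", "FCSErrors")
    return {name: counters[name] / total_rx for name in names}


rates = error_rates(COUNTERS)
for name, rate in rates.items():
    # Print as a percentage of received frames.
    print(f"{name:15s} {rate * 100:.4f}%")
```

These are cumulative counters, so the ratios only give a long-run average; comparing two snapshots taken some seconds apart would show the current rate instead.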
