Date:      Wed, 30 Mar 2011 17:55:47 +0200
From:      Vlad Galu <dudu@dudu.ro>
To:        pyunyh@gmail.com
Cc:        freebsd-net@freebsd.org, Arnaud Lacombe <lacombar@gmail.com>
Subject:   Re: bge(4) on RELENG_8 mbuf cluster starvation
Message-ID:  <AANLkTi=dci-cKVuvpXCs40u8u=5LGzey6s5-jYXEPM7s@mail.gmail.com>
In-Reply-To: <20110313011632.GA1621@michelle.cdnetworks.com>
References:  <AANLkTimSs48ftRv8oh1wTwMEpgN1Ny3B1ahzfS=AbML_@mail.gmail.com> <AANLkTimfh3OdXOe1JFo5u6JypcLrcWKv2WpSu8Uv-tgv@mail.gmail.com> <AANLkTi=rWobA40UtCTSeOzEz65TMw8vfCcxtMWBBme+u@mail.gmail.com> <20110313011632.GA1621@michelle.cdnetworks.com>

On Sun, Mar 13, 2011 at 2:16 AM, YongHyeon PYUN <pyunyh@gmail.com> wrote:

> On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
> > On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe <lacombar@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu <dudu@dudu.ro> wrote:
> > > > Hi folks,
> > > >
> > > > On a fairly busy recent (r219010) RELENG_8 machine I keep getting
> > > > -- cut here --
> > > > 1096/1454/2550 mbufs in use (current/cache/total)
> > > > 1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
> > > > 1035/202 mbuf+clusters out of packet secondary zone in use
> > > (current/cache)
> > > > 0/117/117/12800 4k (page size) jumbo clusters in use
> > > > (current/cache/total/max)
> > > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > > > 2344K/2293K/4637K bytes allocated to network (current/cache/total)
> > > > 0/70128196/37726935 requests for mbufs denied
> > > (mbufs/clusters/mbuf+clusters)
> > > > ^^^^^^^^^^^^^^^^^^^^^
> > > > -- and here --
> > > >
> > > > kern.ipc.nmbclusters is set to 131072. Other settings:
> > > no, netstat(8) says 262144.
> > >
> > >
> > Heh, you're right, I forgot I'd doubled it a while ago. I wrote that
> > off the top of my head.
> >
> >
> > > Could you maybe include $(sysctl dev.bge)? It might be useful.
> > >
> > >  - Arnaud
> > >
> >
> > Sure:
>
> [...]
>
> > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC
> rev.
> > 0x004101
> > dev.bge.1.%driver: bge
> > dev.bge.1.%location: slot=0 function=0
> > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014
> > subdevice=0x02c6 class=0x020000
> > dev.bge.1.%parent: pci5
> > dev.bge.1.forced_collapse: 2
> > dev.bge.1.forced_udpcsum: 0
> > dev.bge.1.stats.FramesDroppedDueToFilters: 0
> > dev.bge.1.stats.DmaWriteQueueFull: 0
> > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> > dev.bge.1.stats.NoMoreRxBDs: 680050
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This indicates bge(4) encountered an RX buffer shortage. Perhaps
> bge(4) couldn't refill RX buffers for incoming frames due to
> other system activity.
>
> > dev.bge.1.stats.InputDiscards: 228755931
>
> This counter indicates the number of frames discarded due to RX buffer
> shortage. bge(4) discards a received frame if it fails to allocate a
> new RX buffer, so InputDiscards is normally higher than NoMoreRxBDs.
>
> > dev.bge.1.stats.InputErrors: 49080818
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Something is wrong here. Too many frames were classified as error
> frames. You may see poor RX performance.
>
> > dev.bge.1.stats.RecvThresholdHit: 0
> > dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
> > dev.bge.1.stats.rx.Fragments: 47887706
> > dev.bge.1.stats.rx.UnicastPkts: 32672557601
> > dev.bge.1.stats.rx.MulticastPkts: 1218
> > dev.bge.1.stats.rx.BroadcastPkts: 2
> > dev.bge.1.stats.rx.FCSErrors: 2822217
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> FCS errors are too high. Please check the cabling again (I'm assuming
> the controller itself is not broken). I think you can use the vendor's
> diagnostic tools to verify this.
>
> > dev.bge.1.stats.rx.AlignmentErrors: 0
> > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> > dev.bge.1.stats.rx.ControlFramesReceived: 0
> > dev.bge.1.stats.rx.xoffStateEntered: 0
> > dev.bge.1.stats.rx.FramesTooLong: 0
> > dev.bge.1.stats.rx.Jabbers: 0
> > dev.bge.1.stats.rx.UndersizePkts: 0
> > dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
> > dev.bge.1.stats.tx.Collisions: 0
> > dev.bge.1.stats.tx.XonSent: 0
> > dev.bge.1.stats.tx.XoffSent: 0
> > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> > dev.bge.1.stats.tx.SingleCollisionFrames: 0
> > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> > dev.bge.1.stats.tx.DeferredTransmissions: 0
> > dev.bge.1.stats.tx.ExcessiveCollisions: 0
> > dev.bge.1.stats.tx.LateCollisions: 0
> > dev.bge.1.stats.tx.UnicastPkts: 281039183
> > dev.bge.1.stats.tx.MulticastPkts: 0
> > dev.bge.1.stats.tx.BroadcastPkts: 1153
> > -- and here --
> >
> > And now, that I remembered about this as well:
> > -- cut here --
> > Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts
> > Oerrs  Coll
> > bge1   1500 <Link#2>      00:11:25:22:0d:ed 32321767025 278517070
> 37726837
> > 281068216     0     0
> > -- and here --
> > The colo provider has changed my cable a couple of times, so I wouldn't
> > blame it on that. Unfortunately, I don't have access to the port
> > statistics on the switch. Running netstat with -w1 yields between 0 and
> > 4 errors/second.
> >
>
> Hardware MAC counters still show a high number of FCS errors. The
> service provider should check for possible cabling issues on the
> switch port.
>

After swapping cables and moving the NIC to another switch, there is some
improvement. However:
-- cut here --
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.
0x004101
dev.bge.1.%driver: bge
dev.bge.1.%location: slot=0 function=0
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014
subdevice=0x02c6 class=0x020000
dev.bge.1.%parent: pci5
dev.bge.1.forced_collapse: 0
dev.bge.1.forced_udpcsum: 0
dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.NoMoreRxBDs: 243248 <- this
dev.bge.1.stats.InputDiscards: 9945500
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.RecvThresholdHit: 0
dev.bge.1.stats.rx.ifHCInOctets: 36697296701
dev.bge.1.stats.rx.Fragments: 0
dev.bge.1.stats.rx.UnicastPkts: 549334370
dev.bge.1.stats.rx.MulticastPkts: 113638
dev.bge.1.stats.rx.BroadcastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.tx.ifHCOutOctets: 10578000636
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.UnicastPkts: 64545266
dev.bge.1.stats.tx.MulticastPkts: 0
dev.bge.1.stats.tx.BroadcastPkts: 313

and
0/1710531/2006005 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
-- and here --
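As a rough sanity check (my own arithmetic, not something the driver reports), the counters above still imply that roughly 1.8% of received frames are being discarded for lack of RX buffers:

```python
# Back-of-the-envelope discard rate from the sysctl dump above.
input_discards = 9_945_500    # dev.bge.1.stats.InputDiscards
unicast_pkts = 549_334_370    # dev.bge.1.stats.rx.UnicastPkts

# Fraction of frames that arrived but were dropped before delivery.
discard_ratio = input_discards / (input_discards + unicast_pkts)
print(f"discarded: {discard_ratio:.2%} of received frames")
# prints: discarded: 1.78% of received frames
```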

I'll start gathering some stats/charts on this host to see if I can
correlate the starvation with other system events.
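A minimal sketch of the kind of polling I have in mind (the helper names and interval are my own, and it assumes the `netstat -m` output format shown above):

```python
import re
import subprocess
import time

# Pull the "requests for mbufs denied" line out of `netstat -m` output
# and return the three counters (mbufs, clusters, mbuf+clusters) as ints.
DENIED_RE = re.compile(r"(\d+)/(\d+)/(\d+) requests for mbufs denied")

def parse_denied(netstat_m_output: str):
    m = DENIED_RE.search(netstat_m_output)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

def poll(interval=10):
    # Log timestamped deltas so denial spikes can be lined up against
    # other system events (cron jobs, backups, load bursts, ...).
    prev = None
    while True:
        out = subprocess.run(["netstat", "-m"],
                             capture_output=True, text=True).stdout
        cur = parse_denied(out)
        if prev is not None and cur is not None:
            delta = tuple(c - p for c, p in zip(cur, prev))
            print(time.strftime("%F %T"), "denied delta:", delta)
        prev = cur
        time.sleep(interval)
```

For example, `parse_denied("0/1710531/2006005 requests for mbufs denied (mbufs/clusters/mbuf+clusters)")` returns `(0, 1710531, 2006005)`.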



> However, this does not explain why you have such a large number of
> mbuf cluster allocation failures. The only wild guess I have at the
> moment is that some process or kernel subsystem is too slow to
> release allocated mbuf clusters. Did you check various system
> activities while seeing the issue?
>



-- 
Good, fast & cheap. Pick any two.


