From: YongHyeon PYUN
Date: Sat, 12 Mar 2011 17:16:32 -0800
To: Vlad Galu
Cc: freebsd-net@freebsd.org, Arnaud Lacombe
Subject: Re: bge(4) on RELENG_8 mbuf cluster starvation

On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
> On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe wrote:
>
> > Hi,
> >
> > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu wrote:
> > > Hi folks,
> > >
> > > On a fairly busy recent (r219010) RELENG_8 machine I keep getting
> > > -- cut here --
> > > 1096/1454/2550 mbufs in use (current/cache/total)
> > > 1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
> > > 1035/202 mbuf+clusters out of packet secondary zone in use (current/cache)
> > > 0/117/117/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > > 2344K/2293K/4637K bytes allocated to network (current/cache/total)
> > > 0/70128196/37726935 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > > ^^^^^^^^^^^^^^^^^^^^^
> > > -- and here --
> > >
> > > kern.ipc.nmbclusters is set to 131072. Other settings:
> > no, netstat(8) says 262144.
> >
>
> Heh, you're right, I forgot I'd doubled it a while ago. Wrote that from the
> top of my head.
>
> > Maybe can you include $(sysctl dev.bge) ? Might be useful.
> >
> > - Arnaud
> >
>
> Sure:
[...]
> dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.
> 0x004101
> dev.bge.1.%driver: bge
> dev.bge.1.%location: slot=0 function=0
> dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014 subdevice=0x02c6 class=0x020000
> dev.bge.1.%parent: pci5
> dev.bge.1.forced_collapse: 2
> dev.bge.1.forced_udpcsum: 0
> dev.bge.1.stats.FramesDroppedDueToFilters: 0
> dev.bge.1.stats.DmaWriteQueueFull: 0
> dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
> dev.bge.1.stats.NoMoreRxBDs: 680050
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This indicates that bge(4) ran into an RX buffer shortage. Perhaps bge(4)
couldn't supply new RX buffers for incoming frames because of other system
activity.

> dev.bge.1.stats.InputDiscards: 228755931

This counter is the number of frames discarded because of the RX buffer
shortage. bge(4) discards a received frame if it fails to allocate a new RX
buffer, so InputDiscards is normally higher than NoMoreRxBDs.

> dev.bge.1.stats.InputErrors: 49080818
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Something is wrong here: far too many frames were classified as error
frames. You may see poor RX performance.

> dev.bge.1.stats.RecvThresholdHit: 0
> dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
> dev.bge.1.stats.rx.Fragments: 47887706
> dev.bge.1.stats.rx.UnicastPkts: 32672557601
> dev.bge.1.stats.rx.MulticastPkts: 1218
> dev.bge.1.stats.rx.BroadcastPkts: 2
> dev.bge.1.stats.rx.FCSErrors: 2822217
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

FCS errors are too high. Please check the cabling again (I'm assuming the
controller itself is not broken). You can probably use the vendor's
diagnostic tools to verify this.
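To put the FCS count in perspective, here is a minimal sketch that turns `sysctl dev.bge.1.stats` output into a dictionary and divides FCSErrors by the frames received. It assumes the plain `name: value` format quoted above; `parse_sysctl` and `fcs_error_ratio` are hypothetical helpers, and treating UnicastPkts + MulticastPkts + BroadcastPkts as the total of received frames is an assumption, not something bge(4) documents.

```python
from __future__ import annotations

# Lines taken from the sysctl dev.bge output quoted in this thread.
SYSCTL_SAMPLE = """\
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
dev.bge.1.stats.rx.UnicastPkts: 32672557601
dev.bge.1.stats.rx.MulticastPkts: 1218
dev.bge.1.stats.rx.BroadcastPkts: 2
dev.bge.1.stats.rx.FCSErrors: 2822217
"""


def parse_sysctl(text: str) -> dict[str, int]:
    """Collect every 'name: value' line whose value is a plain integer."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips %desc, %pnpinfo and other text values
            counters[name.strip()] = int(value)
    return counters


def fcs_error_ratio(counters: dict[str, int], prefix: str = "dev.bge.1.stats.rx.") -> float:
    """FCS errors divided by unicast + multicast + broadcast frames (assumed denominator)."""
    received = sum(counters[prefix + kind] for kind in ("UnicastPkts", "MulticastPkts", "BroadcastPkts"))
    return counters[prefix + "FCSErrors"] / received


if __name__ == "__main__":
    ratio = fcs_error_ratio(parse_sysctl(SYSCTL_SAMPLE))
    print(f"FCSErrors / received frames = {ratio:.2e} (about 1 in {1 / ratio:,.0f})")
```

Comparing the ratio from two readings taken some time apart would show whether the errors are still accumulating after the cable swaps, rather than being left over from before them.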
> dev.bge.1.stats.rx.AlignmentErrors: 0
> dev.bge.1.stats.rx.xonPauseFramesReceived: 0
> dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
> dev.bge.1.stats.rx.ControlFramesReceived: 0
> dev.bge.1.stats.rx.xoffStateEntered: 0
> dev.bge.1.stats.rx.FramesTooLong: 0
> dev.bge.1.stats.rx.Jabbers: 0
> dev.bge.1.stats.rx.UndersizePkts: 0
> dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
> dev.bge.1.stats.tx.Collisions: 0
> dev.bge.1.stats.tx.XonSent: 0
> dev.bge.1.stats.tx.XoffSent: 0
> dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
> dev.bge.1.stats.tx.SingleCollisionFrames: 0
> dev.bge.1.stats.tx.MultipleCollisionFrames: 0
> dev.bge.1.stats.tx.DeferredTransmissions: 0
> dev.bge.1.stats.tx.ExcessiveCollisions: 0
> dev.bge.1.stats.tx.LateCollisions: 0
> dev.bge.1.stats.tx.UnicastPkts: 281039183
> dev.bge.1.stats.tx.MulticastPkts: 0
> dev.bge.1.stats.tx.BroadcastPkts: 1153
> -- and here --
>
> And now, that I remembered about this as well:
> -- cut here --
> Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
> bge1 1500 00:11:25:22:0d:ed 32321767025 278517070 37726837 281068216 0 0
> -- and here --
> The colo provider changed my cable a couple of times so I'd not blame it on
> that. Unfortunately, I don't have access to the port statistics on the
> switch. Running netstat with -w1 yields between 0 and 4 errors/second.
>

The hardware MAC counters still show a high number of FCS errors. The
service provider should check the switch port for possible cabling issues.
However, this does not explain why you have so many mbuf cluster allocation
failures. My only wild guess at this moment is that some process or kernel
subsystem is too slow to release allocated mbuf clusters. Did you check
other system activity while you were seeing the issue?
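One way to follow up on that question is to sample `netstat -m` periodically and see whether the denied-cluster counter is still climbing, and at what cluster usage. The sketch below is a minimal illustration, assuming the RELENG_8 `netstat -m` line formats quoted in this thread; `parse_netstat_m`, `denied_since`, `watch` and the 10-second default interval are hypothetical choices, and `watch()` only works on a FreeBSD host where `netstat -m` is available.

```python
from __future__ import annotations

import re
import subprocess
import time
from dataclasses import dataclass

# netstat -m lines as quoted in this thread (RELENG_8 format assumed).
NETSTAT_M_SAMPLE = """\
1096/1454/2550 mbufs in use (current/cache/total)
1035/731/1766/262144 mbuf clusters in use (current/cache/total/max)
1035/202 mbuf+clusters out of packet secondary zone in use (current/cache)
0/70128196/37726935 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

CLUSTERS_RE = re.compile(r"^\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.M)
DENIED_RE = re.compile(r"^\s*(\d+)/(\d+)/(\d+) requests for mbufs denied", re.M)


@dataclass
class MbufSnapshot:
    clusters_in_use: int  # "current" field of the mbuf clusters line
    clusters_max: int     # "max" field of the same line
    clusters_denied: int  # "clusters" field of the denied line (cumulative)


def parse_netstat_m(text: str) -> MbufSnapshot:
    clusters = CLUSTERS_RE.search(text)
    denied = DENIED_RE.search(text)
    if clusters is None or denied is None:
        raise ValueError("unrecognized netstat -m output")
    return MbufSnapshot(int(clusters.group(1)), int(clusters.group(4)), int(denied.group(2)))


def denied_since(prev: MbufSnapshot, cur: MbufSnapshot) -> int:
    """Cluster requests denied between two snapshots."""
    return cur.clusters_denied - prev.clusters_denied


def watch(interval: float = 10.0, rounds: int = 6) -> None:
    """Sample `netstat -m` (FreeBSD only) and print new denials per interval."""
    def snapshot() -> MbufSnapshot:
        result = subprocess.run(["netstat", "-m"], capture_output=True, text=True, check=True)
        return parse_netstat_m(result.stdout)

    prev = snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        cur = snapshot()
        print(f"clusters in use {cur.clusters_in_use}/{cur.clusters_max}, "
              f"denied in last {interval:g}s: {denied_since(prev, cur)}")
        prev = cur


if __name__ == "__main__":
    print(parse_netstat_m(NETSTAT_M_SAMPLE))
```

Running `watch()` on the affected machine while observing other load would show whether denials arrive in bursts that line up with particular activity, even though the snapshot quoted above shows only 1035 of 262144 clusters in use.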