From owner-svn-src-user@FreeBSD.ORG Mon Dec  3 09:38:02 2012
Date: Mon, 3 Dec 2012 09:38:00 +0000 (GMT)
From: Robert Watson
To: Maxim Sobolev
Cc: Alfred Perlstein, Andre Oppermann, src-committers@freebsd.org,
    svn-src-user@freebsd.org
Subject: Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
In-Reply-To: <50BC6EF9.4040706@FreeBSD.org>
References: <201211120847.qAC8lEAM086331@svn.freebsd.org>
    <50A0D420.4030106@freebsd.org> <0039CD42-C909-41D0-B0A7-7DFBC5B8D839@mu.org>
    <50A1206B.1000200@freebsd.org> <3D373186-09E2-48BC-8451-E4439F99B29D@mu.org>
    <50BC4EF6.8040902@FreeBSD.org> <50BC61A1.9040604@freebsd.org>
    <50BC6EF9.4040706@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)

On Mon, 3 Dec 2012, Maxim Sobolev wrote:

>>> We are also in a quite mbuf-hungry environment; it's not 10GigE, but we
>>> are dealing with forwarding voice traffic, which consists predominantly
>>> of very small packets (20-40 bytes).  So we have a lot of small packets
>>> in flight, which uses a lot of mbufs.
>>>
>>> What happens, however, is that the network stack consistently locks up
>>> after we put more than 16-18MB/sec onto it, which corresponds to about
>>> 350-400 Kpps.
>>
>> Can you drop into kdb?  Do you have any backtrace to see where or how it
>> locks up?
>
> Unfortunately that's hardly an option in production, unless we can
> reproduce the issue on a test machine.  It is not locking up per se, but
> all network-related activity ceases.  We can still get in through the KVM
> console.

Could you share the results of vmstat -z and netstat -m for the box?

(FYI, if you do find yourself in DDB, "show uma" is essentially the same as
"vmstat -z".)
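If it's easier to grab everything in one go, something along these lines
should capture the interesting counters (a rough sketch -- exact zone names
vary a bit between releases, so adjust the grep pattern as needed):

  # mbuf and cluster zones as seen by UMA
  vmstat -z | egrep -i 'mbuf|cluster'
  # mbuf usage plus denied/delayed allocation counts
  netstat -m
  # configured limits
  sysctl kern.ipc.nmbclusters kern.maxusers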
Robert

>
>>> This is way lower than any nmbclusters/maxusers limits we have
>>> (1.5m/1500).
>>>
>>> With half of that critical load right now we see something along these
>>> lines:
>>>
>>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>>
>>> The machine has 24GB of RAM.
>>>
>>> vm.kmem_map_free: 24886267904
>>> vm.kmem_map_size: 70615040
>>> vm.kmem_size_scale: 1
>>> vm.kmem_size_max: 329853485875
>>> vm.kmem_size_min: 0
>>> vm.kmem_size: 24956903424
>>>
>>> So my question is whether there are some other limits that can cause mbuf
>>> starvation if the number of allocated clusters grows to more than
>>> 200-250k.  I am also curious how this works in a dynamic system: since no
>>> memory is pre-allocated for mbufs, what happens if the network load
>>> increases gradually while the system is running?  Is it possible to get
>>> ENOMEM eventually, with all memory already taken by other pools?
>>
>> Yes, mbuf allocation is not guaranteed and can fail before the limit is
>> reached.  What may happen is that an RX DMA ring refill fails and the
>> driver wedges.  That would be a driver bug.
>>
>> Can you give more information on the NICs and drivers you use?
>
> All of them use various incarnations of Intel GigE chips, mostly igb(4),
> but we've seen the same behaviour with em(4) as well.
>
> Both 8.2 and 8.3 are affected.  We have not been able to confirm whether
> 9.1 has the same issue.
>
> igb1: port 0xec00-0xec1f mem
> 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at
> device 0.1 on pci10
> igb1: Using MSIX interrupts with 9 vectors
> igb1: Ethernet address: 00:30:48:cf:bb:1d
> igb1: [ITHREAD]
> igb1: Bound queue 0 to cpu 8
> igb1: [ITHREAD]
> igb1: Bound queue 1 to cpu 9
> igb1: [ITHREAD]
> igb1: Bound queue 2 to cpu 10
> igb1: [ITHREAD]
> igb1: Bound queue 3 to cpu 11
> igb1: [ITHREAD]
> igb1: Bound queue 4 to cpu 12
> igb1: [ITHREAD]
> igb1: Bound queue 5 to cpu 13
> igb1: [ITHREAD]
> igb1: Bound queue 6 to cpu 14
> igb1: [ITHREAD]
> igb1: Bound queue 7 to cpu 15
> igb1: [ITHREAD]
>
> igb1@pci0:10:0:1:  class=0x020000 card=0x10c915d9 chip=0x10c98086
> rev=0x01 hdr=0x00
>     vendor   = 'Intel Corporation'
>     class    = network
>     subclass = ethernet
>
> -Maxim
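(To make the failure mode Andre describes above a bit more concrete: a
"wedging" driver is roughly the sketch below.  This is only an illustration
with hypothetical names -- rxr_refill(), rxr_post_buffer(),
rxr_schedule_refill() and struct rx_ring are made up for the example and are
not the actual igb(4) code.)

/*
 * Sketch of a defensive RX ring refill path.  m_getcl() called with
 * M_NOWAIT can return NULL well before kern.ipc.nmbclusters is reached;
 * a driver that simply gives up at that point leaves the ring
 * unreplenished -- i.e. it "wedges".
 */
#include <sys/param.h>
#include <sys/mbuf.h>

struct rx_ring;                                 /* driver-private ring state */
void rxr_post_buffer(struct rx_ring *, int, struct mbuf *);
void rxr_schedule_refill(struct rx_ring *);     /* e.g. via a callout */

static int
rxr_refill(struct rx_ring *rxr, int first, int count)
{
        struct mbuf *m;
        int i, filled = 0;

        for (i = 0; i < count; i++) {
                /* Non-blocking allocation: may fail under memory pressure. */
                m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
                if (m == NULL)
                        break;          /* don't spin in the RX path */
                m->m_len = m->m_pkthdr.len = MCLBYTES;
                rxr_post_buffer(rxr, first + i, m);
                filled++;
        }

        /*
         * The important part: if the ring could not be filled completely,
         * arrange to retry later instead of leaving it starved forever.
         */
        if (filled < count)
                rxr_schedule_refill(rxr);
        return (filled);
}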