Date:        Mon, 3 Dec 2012 09:38:00 +0000 (GMT)
From:        Robert Watson <rwatson@FreeBSD.org>
To:          Maxim Sobolev <sobomax@FreeBSD.org>
Cc:          Alfred Perlstein <bright@mu.org>, Andre Oppermann <andre@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-user@freebsd.org" <svn-src-user@freebsd.org>
Subject:     Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
Message-ID:  <alpine.BSF.2.00.1212030937160.18806@fledge.watson.org>
In-Reply-To: <50BC6EF9.4040706@FreeBSD.org>
References:  <201211120847.qAC8lEAM086331@svn.freebsd.org> <50A0D420.4030106@freebsd.org> <0039CD42-C909-41D0-B0A7-7DFBC5B8D839@mu.org> <50A1206B.1000200@freebsd.org> <3D373186-09E2-48BC-8451-E4439F99B29D@mu.org> <50BC4EF6.8040902@FreeBSD.org> <50BC61A1.9040604@freebsd.org> <50BC6EF9.4040706@FreeBSD.org>
On Mon, 3 Dec 2012, Maxim Sobolev wrote:

>>> We are also in quite a mbuf-hungry environment; it's not 10GigE, but we
>>> are dealing with forwarding voice traffic, which consists of
>>> predominantly very small packets (20-40 bytes).  So we have a lot of
>>> small packets in-flight, which use a lot of mbufs.
>>>
>>> What happens, however, is that the network stack consistently locks up
>>> after we put more than 16-18MB/sec onto it, which corresponds to about
>>> 350-400 Kpps.
>>
>> Can you drop into kdb?  Do you have any backtrace to see where or how it
>> locks up?
>
> Unfortunately it's hardly an option in production, unless we can reproduce
> the issue on the test machine.  It is not locking up per se, but all
> network-related activity ceases.  We can still get in through the KVM
> console.

Could you share the results of vmstat -z and netstat -m for the box?

(FYI, if you do find yourself in DDB, "show uma" is essentially the same as
"vmstat -z".)

Robert

>
>>> This is way lower than any nmbclusters/maxusers limits we have
>>> (1.5m/1500).
>>>
>>> With half of that critical load right now we see something along those
>>> lines:
>>>
>>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>>
>>> Machine has 24GB of RAM.
>>>
>>> vm.kmem_map_free: 24886267904
>>> vm.kmem_map_size: 70615040
>>> vm.kmem_size_scale: 1
>>> vm.kmem_size_max: 329853485875
>>> vm.kmem_size_min: 0
>>> vm.kmem_size: 24956903424
>>>
>>> So my question is whether there are some other limits that can cause
>>> mbuf starvation if the number of allocated clusters grows to more than
>>> 200-250k?  I am curious how it works in a dynamic system - since no
>>> memory is pre-allocated for mbufs, what happens if the network load
>>> increases gradually while the system is running?  Is it possible to get
>>> to ENOMEM eventually with all memory already taken for other pools?
>>
>> Yes, mbuf allocation is not guaranteed and can fail before the limit is
>> reached.  What may happen is that an RX DMA ring refill fails and the
>> driver wedges.  This would be a driver bug.
>>
>> Can you give more information on the NICs and drivers you use?
>
> All of them use various incarnations of the Intel GigE chip, mostly
> igb(4), but we've seen the same behaviour with em(4) as well.
>
> Both 8.2 and 8.3 are affected.  We have not been able to confirm whether
> 9.1 has the same issue.
>
> igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.1> port
> 0xec00-0xec1f mem
> 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at
> device 0.1 on pci10
> igb1: Using MSIX interrupts with 9 vectors
> igb1: Ethernet address: 00:30:48:cf:bb:1d
> igb1: [ITHREAD]
> igb1: Bound queue 0 to cpu 8
> igb1: [ITHREAD]
> igb1: Bound queue 1 to cpu 9
> igb1: [ITHREAD]
> igb1: Bound queue 2 to cpu 10
> igb1: [ITHREAD]
> igb1: Bound queue 3 to cpu 11
> igb1: [ITHREAD]
> igb1: Bound queue 4 to cpu 12
> igb1: [ITHREAD]
> igb1: Bound queue 5 to cpu 13
> igb1: [ITHREAD]
> igb1: Bound queue 6 to cpu 14
> igb1: [ITHREAD]
> igb1: Bound queue 7 to cpu 15
> igb1: [ITHREAD]
>
> igb1@pci0:10:0:1:       class=0x020000 card=0x10c915d9 chip=0x10c98086
> rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
>
> -Maxim
>
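For reference, the data Robert asks for (vmstat -z and netstat -m) can be
collected periodically on the production box, so that a lock-up can later be
correlated with allocator state.  A minimal sketch of such a snapshot loop is
below; the log path, the interval, and the choice to filter vmstat -z down to
the mbuf zones are illustrative assumptions, not anything prescribed in the
thread:

    #!/bin/sh
    # Snapshot mbuf/UMA allocator state at a fixed interval so that a
    # network stall can be matched against allocator behaviour afterwards.
    # LOG and INTERVAL are arbitrary illustrative choices.
    LOG=/var/log/mbuf-snap.log
    INTERVAL=10
    while :; do
        {
            date
            netstat -m                          # mbuf/cluster usage summary
            vmstat -z | egrep -i 'ITEM|mbuf'    # per-zone UMA stats, mbuf zones only
            sysctl kern.ipc.nmbclusters vm.kmem_size vm.kmem_map_free
            echo "----"
        } >> "$LOG"
        sleep "$INTERVAL"
    done

Running something like this alongside the existing monitoring would show
whether the mbuf cluster count actually approaches the configured limit when
the stack stops passing traffic, or whether it wedges well below it, as the
earlier numbers suggest.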