From: Damien Fleuriot
Date: Thu, 27 Jan 2011 21:38:22 +0100
To: Jeremy Chadwick
Cc: Sergey Lobanov, "freebsd-stable@freebsd.org", "freebsd-pf@freebsd.org"
Subject: Re: High interrupt rate on a PF box + performance
Message-ID: <4D41D7BE.3030208@my.gd>
In-Reply-To: <20110127195741.GA40449@icarus.home.lan>

On 1/27/11 8:57 PM, Jeremy Chadwick wrote:
> On Thu, Jan 27, 2011 at 08:39:40PM +0100, Damien Fleuriot wrote:
>>
>>
>> On 1/27/11 7:46 PM, Sergey Lobanov wrote:
>>> In a message of Friday 28 January 2011 00:55:35, Damien Fleuriot wrote:
>>>> On 1/27/11 6:41 PM, Vogel, Jack wrote:
>>>>> Jeremy is right, if you have a problem the first step is to try the
>>>>> latest code.
>>>>>
>>>>> However, when I look at the interrupts below I don't see what the problem
>>>>> is? The Broadcom seems to have about the same rate, it just doesn't have
>>>>> MSIX (multiple vectors).
>>>>>
>>>>> Jack
>>>>
>>>> My main concern is that the CPU %interrupt is quite high; also, we seem
>>>> to be experiencing input errors on the interfaces.
>>> Would you show the igb tuning done in loader.conf and the output of sysctl
>>> dev.igb.0?
>>> Did you raise the number of igb descriptors, such as:
>>> hw.igb.rxd=4096
>>> hw.igb.txd=4096 ?
>>
>> There is no tuning at all on our part in the loader's conf.
>>
>> Find below the sysctls:
>>
>> # sysctl -a |grep igb
>> dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 1.7.3
>> dev.igb.0.%driver: igb
>> dev.igb.0.%location: slot=0 function=0
>> dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10d6 subvendor=0x8086
>> subdevice=0x145a class=0x020000
>> dev.igb.0.%parent: pci14
>> dev.igb.0.debug: -1
>> dev.igb.0.stats: -1
>> dev.igb.0.flow_control: 3
>> dev.igb.0.enable_aim: 1
>> dev.igb.0.low_latency: 128
>> dev.igb.0.ave_latency: 450
>> dev.igb.0.bulk_latency: 1200
>> dev.igb.0.rx_processing_limit: 100
>> dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 1.7.3
>> dev.igb.1.%driver: igb
>> dev.igb.1.%location: slot=0 function=1
>> dev.igb.1.%pnpinfo: vendor=0x8086 device=0x10d6 subvendor=0x8086
>> subdevice=0x145a class=0x020000
>> dev.igb.1.%parent: pci14
>> dev.igb.1.debug: -1
>> dev.igb.1.stats: -1
>> dev.igb.1.flow_control: 3
>> dev.igb.1.enable_aim: 1
>> dev.igb.1.low_latency: 128
>> dev.igb.1.ave_latency: 450
>> dev.igb.1.bulk_latency: 1200
>> dev.igb.1.rx_processing_limit: 100
>
> I'm not aware of how to tune igb(4), so the advice
> Sergey gave you may be applicable.  You'll need to schedule downtime to
> adjust those tunables however (since a reboot will be required).
>
> I also reviewed the munin graphs.  I don't see anything necessarily
> wrong.  However, you omitted yearly graphs for the network interfaces.

Indeed I have. The reason is that the yearly graphs are fucked up: for
some reason that eludes me, munin recorded a 2 petabyte spike sometime
around September, which flatlines the whole graph for the year -.-

However, we clearly have an increase in traffic, as we can also see from
our nginx request graphs.

> Why I care about that:
>
> The pf state table (yearly) graph basically correlates with the CPU
> usage (yearly) graph, and I expect that the yearly network graphs would
> show a similar trend: an increase in your overall traffic over the
> course of a year.
>
> What I'm trying to figure out is what you're concerned about.  You are
> in fact pushing anywhere between 60-120MBytes/sec across these
> interfaces.  Given those numbers, I'm not surprised by the "high"
> interrupt usage.
>

I'm worried we may hit a bottleneck soon.
I was also hoping for some kind of magical way to reduce the interrupts
so we could get more performance out of the machines.

> Graphs of this nature usually indicate that you're hitting a
> "bottleneck" (for lack of better word) where you're simply doing "too
> much" with a single machine (given its network throughput).  The machine
> is spending a tremendous amount of CPU time handling network traffic,
> and equally as much with regards to the pf usage.
>

We've indeed been thinking about moving to an active-active setup for
some time already; guess it'll have to happen sooner rather than later :)

> If you want my opinion based on the information I have so far, it's
> this: you need to scale your infrastructure.  You can no longer rely on
> a single machine to handle this amount of traffic.
>
> As for the network errors you see -- to get low-level NIC and driver
> statistics, you'll need to run "sysctl dev.igb.X.stats=1" then run
> "dmesg" and look at the numbers shown (the sysctl command won't output
> anything itself).  This may help indicate where the packets are being
> lost.  You should also check the interface counters on the switch which
> these interfaces are connected to.  I sure hope it's a managed switch
> which can give you those statistics.
>
> Hope this helps, or at least acts as food for thought.
>

Aye, will try that.

We're also considering moving to faster machines, but I don't think that
will help much with our problem.

I suppose additional CPU cores will be of no help at all, considering the
kernel is single-threaded and runs on cpu0 only?
Actually, I assume it might even be detrimental to us to add more cores,
since they'll spend more time interrupting each other?

Thanks for sharing your thoughts :)
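
For reference, a minimal sketch of what the descriptor tuning Sergey
suggested would look like in loader.conf. The two values are the ones
quoted in his message; they are not verified for this hardware or for
igb driver version 1.7.3, and the /boot/loader.conf location is assumed
to be the standard one. As Jeremy notes, the tunables only take effect
after a reboot.

```
# /boot/loader.conf (assumed standard location)
# Descriptor ring sizes as suggested by Sergey in this thread.
# Assumption: the installed igb driver accepts 4096; check igb(4)
# for the limits of your driver version before applying.
hw.igb.rxd=4096
hw.igb.txd=4096
```

After the reboot, the statistics dump Jeremy describes ("sysctl
dev.igb.X.stats=1", then "dmesg") can be compared against the earlier
numbers to see whether the input errors have changed.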