Date: Fri, 6 Jul 2012 08:11:26 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
Cc: Doug Barton <dougb@freebsd.org>, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
Message-ID: <20120706061126.GA65432@onelab2.iet.unipi.it>
In-Reply-To: <4FF59955.5090406@FreeBSD.org>
References: <4FF361CA.4000506@FreeBSD.org>
  <20120703214419.GC92445@onelab2.iet.unipi.it>
  <4FF36438.2030902@FreeBSD.org> <4FF3E2C4.7050701@FreeBSD.org>
  <4FF3FB14.8020006@FreeBSD.org> <4FF402D1.4000505@FreeBSD.org>
  <20120704091241.GA99164@onelab2.iet.unipi.it>
  <4FF412B9.3000406@FreeBSD.org>
  <20120704154856.GC3680@onelab2.iet.unipi.it>
  <4FF59955.5090406@FreeBSD.org>
On Thu, Jul 05, 2012 at 05:40:37PM +0400, Alexander V. Chernikov wrote:
> On 04.07.2012 19:48, Luigi Rizzo wrote:
...
> Traffic stats with most possible counters eliminated:
> (there is a possibility in ixgbe code to update rx/tx packets once per
> rx_process_limit (which is 100 by default)):
>
>             input          (ix0)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>       2.8M     0     0       186M       2.8M     0       186M     0
>       2.8M     0     0       187M       2.8M     0       186M     0
>
> And it seems that netstat uses 1024 as divisor (no HN_DIVISOR_1000
> passed in if.c to show_stat), so real frame count from Ixia side is much
> closer to 3MPPS (~ 2.961600 ).
...
> IPFW contention:
> Same setup as shown upper, same traffic level
>
> 17:48 [0] test15# ipfw show
> 00100 0 0 allow ip from any to any
> 65535 0 0 deny ip from any to any
>
> net.inet.ip.fw.enable: 0 -> 1
>             input          (ix0)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>       2.1M  734k     0       187M       2.1M     0       139M     0
>       2.1M  736k     0       187M       2.1M     0       139M     0
>       2.1M  737k     0       187M       2.1M     0        89M     0
>       2.1M  735k     0       187M       2.1M     0       189M     0
> net.inet.ip.fw.update_counters: 1 -> 0
>       2.3M  636k     0       187M       2.3M     0       148M     0
>       2.5M  343k     0       187M       2.5M     0       164M     0
>       2.5M  351k     0       187M       2.5M     0       164M     0
>       2.5M  345k     0       187M       2.5M     0       164M     0
...
> It seems that ipfw counters are suffering from this problem, too.
> Unfortunately, there is no DPCPU allocator in our kernel.
> I'm planning to make a very simple per-cpu counters patch:
> (
> allocate 65k*(u64_bytes+u64_packets) memory for each CPU per vnet
> instance init and make ipfw use it as counter backend.
>
> There is a problem with several rules residing in single entry. This can
> (probably) be worked-around by using fast counters for the first such
> rule (or not using fast counters for such rules at all)
> )
>
> What do you think about this?

the thing discussed a few years ago (at least the one i took out of
the discussion) was that the counter fields in rules should hold the
index of a per-cpu counter associated with the rule.
So CTR_INC(rule->ctr) becomes something like

	pcpu->ipfw_ctrs[rule->ctr]++

When you create a new rule you also grab one free index from
ipfw_ctrs[], and the same should go for dummynet counters.

The alternative would be to allocate the rule and a set of counters
within the rule itself, but to avoid cache contention that costs
64 bytes (one cache line) per core per rule.

cheers
luigi