Message-ID: <50052419.7010601@FreeBSD.org>
Date: Tue, 17 Jul 2012 12:36:41 +0400
From: "Alexander V. Chernikov"
To: Luigi Rizzo
Cc: Doug Barton, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
In-Reply-To: <20120716212249.GA14607@onelab2.iet.unipi.it>

On 17.07.2012 01:22, Luigi Rizzo wrote:
> On Mon, Jul 16, 2012 at 09:43:01PM +0400, Alexander V. Chernikov wrote:
>> On 06.07.2012 10:11, Luigi Rizzo wrote:
>>> On Thu, Jul 05, 2012 at 05:40:37PM +0400, Alexander V. Chernikov wrote:
>>>> On 04.07.2012 19:48, Luigi Rizzo wrote:

> well, it seems that the counters are costing some 10% which is
> not negligible (60ns per packet according to your test).
> Also i'd be curious if you get better savings if you
> have actual conflicts on the rulesets (e.g. what happens
> with a ruleset that has, say, ten "count ip from any to any" rules) ?

It is a bit difficult to get _exact_ performance numbers since 0.5% of
linerate is ~ 70kpps, however

1.98 MPPS
>> net.inet.ip.fw.update_counters=1
>> net.inet.ip.fw.enable=1
1.67 MPPS
..

And here it is time to check ipfw rmlock performance another time, since
we're acquiring recursive rmlock (pfil) and rwlock (ipfw) twice.

            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
   1664518     0      0  109910406    1664280     0  110055646     0
   1664155     0      0  110018508    1664960     0  109921738     0
   1663795     0      0  109839018    1664618     0  109965576     0
00100 count ip from any to any
   1633118 22691      0  109539808    1621567     0  107402164     0
   1625215 42836      0  110080554    1625638     0  107257950     0
   1630848 34315      0  109932628    1631634     0   72449482     0
   1613686 44167      0  109493942    1613811     0  142363996     0
   1613387 53236      0  110075314    1614144     0  106479880     0
   1611789 52348      0  109932904    1611600     0  106542318     0
   1608327 56371      0  109947824    1608218     0  106229134     0
   1615790 50527      0  110015368    1615528     0  106638914     0
   1613453 50508      0  109872060    1614115     0   72118650     0
   1614382 50955      0  109957958    1613808     0  141208126     0
   1612053 54185      0  110002138    1611855     0  106490270     0
   1538015 13138      0  102872260    1547403     0  102004436     0
   1538084     0      0  101536936    1538034     0   66189600     0
   1536305     0      0  101456028    1533714     0  101844506     0
   1537533     0      0  101458596    1533775     0  101425338     0
00200 count ip from any to any
   1529260  6840      0  101471016    1526825     0  100819632     0
   1532496  5926      0  101540068    1534299     0  101292096     0
   1532535  4412      0  101596090    1531906     0  101148828     0
   1527551  9545      0  101488912    1527051     0  100957332     0
   1538293  1523      0  101655604    1539942     0  101557938     0
   1536673     0      0   84887698    1537473     0   69175150     0
   1538330     0      0  118127042    1537831     0  134094698     0
..
00300 count ip from any to any
   1481474     0      0   97944746    1481326     0   67132158     0
   1489185  2409      0   98413408    1480661     0  128797966     0
   1476976  9604      0   98221830    1478449     0   97810444     0
   1476286 12574      0   98464756    1479253     0   97638972     0
   1483221 10545      0   98451838    1479954     0   97677840     0

So (very rough),
First rule: 0.25-0.35 MPPS (10-15% overall performance)
Second rule: 0.05 MPPS
Third rule: 0.11 MPPS (!)

>> Additionally, I've got (from my previous pcpu attempt) a small patch
>> permitting ipfw to re-use rule map allocation instead of reallocating
>> on every rule. This saves a bit of system time:
>>
>> loading 20k rules with ipfw binary gives us:
>> 5.1s system time before and 4.1s system time after.
>
> not bad but in this case i wonder if one wouldn't
> get much higher savings by support multiple rule loading with a
> single syscall. The format used in IPFW3 should help that.

This seems to be more reasonable in general. So, skipping this patch?

>
> cheers
> luigi
>
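
For reference, the throughput figures in this message can be converted
into a per-packet time cost. What follows is only a rough sketch in
Python and is not part of the original measurements. It assumes
minimum-size Ethernet frames (64 bytes plus 20 bytes of preamble and
inter-frame gap) when computing the 10G line rate, and it takes the
1.98 and 1.67 MPPS figures at face value as a "before" and "after"
pair. The function names are made up for illustration.

```python
# Rough per-packet arithmetic for forwarding-rate figures.
# Assumption: 64-byte frames with 20 bytes of preamble + inter-frame gap
# on the wire; the message itself does not state the frame size.

def line_rate_pps(link_bps=10e9, frame_bytes=64, overhead_bytes=20):
    """Maximum packets per second on the wire for a given frame size."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)

def per_packet_ns(pps):
    """Average time budget per packet, in nanoseconds, at a given rate."""
    return 1e9 / pps

def added_cost_ns(pps_before, pps_after):
    """Extra nanoseconds spent per packet when the rate drops."""
    return per_packet_ns(pps_after) - per_packet_ns(pps_before)

if __name__ == "__main__":
    rate = line_rate_pps()
    print(f"10G line rate: {rate / 1e6:.2f} Mpps, "
          f"0.5% = {0.005 * rate / 1e3:.0f} kpps")
    print(f"1.98 -> 1.67 Mpps: "
          f"{added_cost_ns(1.98e6, 1.67e6):.0f} ns per packet")
```

Under these assumptions 0.5% of line rate comes out at about 74 kpps,
in line with the ~70 kpps mentioned above, and a drop from 1.98 to
1.67 MPPS corresponds to roughly 94 ns of extra work per packet.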