From owner-freebsd-net@FreeBSD.ORG Fri Jun 8 10:39:45 2012
From: Ermal Luçi
Sender: ermal.luci@gmail.com
To: pf@freebsd.org
Cc: net@freebsd.org
Subject: Re: [CFT] SMP-friendly pf
Date: Fri, 8 Jun 2012 12:39:43 +0200
In-Reply-To: <20120608061737.GA28197@glebius.int.ru>
References: <20120608061737.GA28197@glebius.int.ru>
List-Id: Networking and TCP/IP with FreeBSD

On Fri, Jun 8, 2012 at 8:17 AM, Gleb Smirnoff wrote:
>  Hello, networkers!
>
>  [net@ in Cc, but further discussion should go on pf@]
>
>  As you probably already know, or some may not know yet, the pf(4)
> subsystem in FreeBSD currently works under a single mutex. This mutex
> is acquired right at the beginning of any packet processing and is dropped
> at the end. While one thread is in pf(4), all other threads are blocked on
> that mutex.
>
>  Meanwhile, modern computers are getting more and more cores, and modern
> network cards are getting more MSI interrupts, each serviced by a separate
> kernel thread in FreeBSD. So the single pf lock, which I call "the pf Giant" :),
> is becoming a point of hard contention.
>
>  Three and a half months ago I started a project, "SMP-friendly pf",
> which has recently entered the alpha stage. As you can see from the subject
> of this mail, this is a call for testing.
>
>
>  Willing to test?
>

As I already asked in private: without documentation or a schema describing
how you protect the various elements in pf(4), this is very hard to review.

- What do you do to keep statistics correct?
- What do you do about table protection? Are tables under the same lock
  as rules?
- How are if-bound versus floating states maintained?
- What protects the scrub ruleset?
- What protects the NAT ruleset?
- ...
- How did you solve synproxy? Is it scalable?
- Do you think the taskqueues you introduce open up the possibility of
  security issues?

There are many "how?" questions in this implementation that are difficult to
answer without you telling us!

>  The code lives in the projects/pf/head branch in SVN and can be checked
> out with:
>
>  svn checkout http://svn.freebsd.org/base/projects/pf/head pflock
>
> where the argument "pflock" is just the directory name for the checked-out
> sources.
>  Then you need to build world and kernel from that branch and install them.
> The projects/pf/head branch gets head merged into it quite often, so if you
> run a head world at a revision equal (or at least close) to the last merge,
> you don't need to install world; however, rebuilding pfctl and snmp_pf from
> that branch is necessary.
>  If you are about to run this alpha pf on any important box, then you
> definitely need to establish safety measures: have a second box running
> stable/9 or head as a carp(4) backup, ready to kick in if the new pf
> panics. A pfsync(4) connection should also be established between the new
> and backup boxes. pfsync(4) in the new code is wire compatible with stable/9
> or head.
>  I'm already running it on routers with 100k - 200k state entries,
> forwarding 20k - 40k pps. If you are brave, you should try, too :) Good
> luck and report any problems to me!
>
>
>  Interested in details?
>
>  From the very beginning of the project it was clear that the code was going
> to diverge significantly from the original OpenBSD code. OpenBSD has always
> developed pf without taking into account that the code could ever become
> multithreaded, so quite a lot needed to be changed. I therefore started
> by removing the "#ifdef __FreeBSD__" from the code, and later I didn't
> hesitate even a fraction of a second if I wanted to toss some code. The pro
> is that the code is now much more readable and understandable than in head;
> the con is that the diff between us and OpenBSD is huge, although the amount
> of shared code is huge, too. So from now on only manual merging of features
> from OpenBSD is possible, and bulk imports of the entire pf into FreeBSD are
> no longer possible.
>
>  The locking scheme is the following:
> - There is an rwlock(9) that protects rules and all kinds of data that aren't
>   modified by forwarding threads. Forwarding threads read-lock it; ioctl()
>   and other reconfiguring events write-lock it.
> - The storage for states and key states has moved from RB-trees to hashes,
>   with separate mutexes per hash slot. This should give us decent parallelism
>   when forwarding packets.
> - Source node storage moved to a hash with per-slot locking.
> - pfsync(4) got a separate mutex.
> - Fragment reassembly got a separate mutex.
>
>  Apart from the above key changes, many other optimisations and fixes were
> done. The entire diff is 22k lines long. You can view the project's history
> here:
>
> http://svnweb.freebsd.org/base/projects/pf/head/?view=log
>
> (the beginning is on page 2 now, at r232042). I have tried to write
> informative commit messages.
>
> --
> Totus tuus, Glebius.
> _______________________________________________
> freebsd-pf@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-pf
> To unsubscribe, send any mail to "freebsd-pf-unsubscribe@freebsd.org"

-- 
Ermal
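
For readers following the thread, the per-slot locking mentioned in the
quoted locking scheme can be sketched in user space. This is only an
illustration in Python, not code from the projects/pf/head branch: the class
name, the slot count, and the key layout are all assumptions made for the
example. The rwlock(9) that guards the rules is not modeled here, since
Python's standard library has no reader/writer lock.

```python
import threading

NSLOTS = 64  # hypothetical slot count; not taken from the patch


class SlotLockedTable:
    """Hash table with one mutex per slot (illustrative sketch only)."""

    def __init__(self, nslots=NSLOTS):
        self._slots = [dict() for _ in range(nslots)]
        self._locks = [threading.Lock() for _ in range(nslots)]

    def _index(self, key):
        # Map a key to its slot; only that slot's lock is ever taken.
        return hash(key) % len(self._slots)

    def insert(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._slots[i][key] = value

    def lookup(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._slots[i].get(key)

    def count(self):
        # Walk the slots one at a time, never holding two locks at once.
        total = 0
        for lock, slot in zip(self._locks, self._slots):
            with lock:
                total += len(slot)
        return total


def worker(table, base, n):
    # Each worker stands in for one forwarding thread creating states.
    for port in range(base, base + n):
        table.insert(("10.0.0.1", port), "ESTABLISHED")


def run_demo(nthreads=4, per_thread=1000):
    table = SlotLockedTable()
    threads = [
        threading.Thread(target=worker, args=(table, t * per_thread, per_thread))
        for t in range(nthreads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

Threads that hash to different slots never wait on each other, which is the
parallelism the per-slot mutexes are meant to provide; two threads touching
the same slot still serialize on that slot's lock.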