Date: Fri, 8 Jun 2012 10:17:37 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: pf@FreeBSD.org
Cc: net@FreeBSD.org
Subject: [CFT] SMP-friendly pf
Message-ID: <20120608061737.GA28197@glebius.int.ru>

Hello, networkers!

[net@ is in Cc, but further discussion should go to pf@]

As you probably already know, the pf(4) subsystem in FreeBSD currently runs under a single mutex. This mutex is acquired at the very beginning of packet processing and dropped at the end. While one thread is in pf(4), all other threads are blocked on that mutex. Meanwhile, modern computers are getting more and more cores, and modern network cards are getting more MSI interrupts, each serviced by a separate kernel thread in FreeBSD. So the single pf lock, which I call "the pf Giant" :), is becoming a point of heavy contention.

Three and a half months ago I started the "SMP-friendly pf" project, which has recently entered the alpha stage. As you can see from the subject of this mail, this is a call for testing.

Willing to test?

The code lives in the projects/pf/head branch in SVN and can be checked out with:

    svn checkout http://svn.freebsd.org/base/projects/pf/head pflock

where the argument "pflock" is just the name of the directory for the checked-out sources. You then need to build world and kernel from that branch and install them. The projects/pf/head branch gets head merged into it quite often, so if you run a head world at a revision equal (or at least close) to the last merge, you don't need to install world. Rebuilding pfctl and snmp_pf from the branch is still necessary, however.

If you are going to run this alpha pf on any important box, you definitely need to put safety measures in place: have a second box running stable/9 or head as a carp(4) backup, ready to take over in case the new pf panics. A pfsync(4) connection should also be established between the new box and the backup box. pfsync(4) in the new code is wire compatible with stable/9 and head.

I'm already running it on routers with 100k - 200k state entries, forwarding 20k - 40k pps. If you are brave, you should try it, too :) Good luck, and report any problems to me!

Interested in details?

From the very beginning of the project it was clear that the code was going to diverge significantly from the original OpenBSD code. OpenBSD has always developed pf without taking into account that the code could ever become multithreaded, so quite a lot needed to be changed. I therefore started by removing the "#ifdef __FreeBSD__" blocks from the code, and later I didn't hesitate even a fraction of a second if I wanted to toss some code. The upside is that the code is now much more readable and understandable than in head. The downside is that the diff between us and OpenBSD is huge, although the amount of shared code is huge, too. So from now on only manual merging of features from OpenBSD is possible; bulk imports of the entire pf into FreeBSD are no longer possible.

The locking scheme is the following:

- There is an rwlock(9) that protects the rules and all other data that isn't modified by forwarding threads. Forwarding threads read lock it; ioctl() and other reconfiguring events write lock it.
- The storage for states and state keys has moved from RB-trees to hashes, with a separate mutex per hash slot. This should give us decent parallelism when forwarding packets.
- Source node storage has moved to a hash with per-slot locking.
- pfsync(4) got a separate mutex.
- Fragment reassembly got a separate mutex.

Apart from the key changes above, many other optimisations and fixes have been made. The entire diff is 22k lines long. You can view the project's history here:

http://svnweb.freebsd.org/base/projects/pf/head/?view=log

(the beginning is on page 2 now, at r232042). I have tried to write informative commit messages.

-- 
Totus tuus, Glebius.