Date: Fri, 3 Jan 2014 16:55:48 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID: <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com>
Hi,

So here's a fun one.

When doing TCP traffic + socket affinity + thread pinning experiments, I seem to hit a very annoying scenario that caps my performance and scalability.

Assume I've lined up everything relating to a socket to run on the same CPU (i.e., TX, RX, TCP timers, userland thread):

* userland code calls something, let's say "kqueue";
* the kqueue lock gets grabbed;
* an interrupt comes in for the NIC;
* the NIC code runs some RX code, and eventually hits something that wants to push a knote up;
* the knote is for the same kqueue as above;
* .. so it grabs the lock ..
* .. and contests it ..
* the scheduler then flips us back to the original userland thread doing TX;
* the userland thread finishes its kqueue manipulation and releases the queue lock;
* .. and the scheduler immediately flips back to the NIC thread waiting for the lock, which grabs the lock, does a bit of work, then releases it.

I see this on kqueue locks, sendfile locks (for sendfile notification) and vm locks (for the VM page referencing/dereferencing). This happens very frequently. It's very noticeable with large numbers of sockets, as the chances of hitting a lock in the NIC RX path that overlaps with something in the userland TX path you are currently fiddling with (e.g., kqueue manipulation) or sending data through (e.g., vm_page locks or sendfile locks for things you're currently transmitting) are very high. As I increase traffic and the number of sockets, the number of context switches goes way up (to 300,000+) and the lock contention / time spent doing locking is non-trivial.

Linux doesn't "have this" problem - its lock primitives let you disable driver bottom halves. So, in this instance, I'd just grab the lock with spin_lock_bh() and none of the driver bottom halves would run until I released it (a minimal sketch of that pattern follows below). I'd thus not have this scheduler ping-ponging and lock contention, as it'd never get a chance to happen.

So, does anyone have any ideas? Has anyone seen this? Shall we just implement a way of doing selective thread disabling, a la spin_lock_bh() mixed with spl${foo}() style stuff?

Thanks,

-adrian
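For reference, a minimal sketch of the Linux pattern referred to above, using a hypothetical event-queue structure invented purely for illustration: spin_lock_bh() takes the spinlock and masks softirq (bottom-half) processing on the local CPU, so the NIC RX path cannot run there and contend for the same lock until spin_unlock_bh() drops it.

    /* Sketch only; "foo_evqueue" and its fields are hypothetical. */
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct foo_evqueue {
            spinlock_t lock;
            u32 nevents;
    };

    static void foo_evqueue_post(struct foo_evqueue *q)
    {
            spin_lock_bh(&q->lock);    /* take the lock and mask local bottom halves */
            q->nevents++;              /* manipulate the queue without RX interference */
            spin_unlock_bh(&q->lock);  /* drop the lock; deferred bottom halves run now */
    }

The bottom half is deferred, not lost: the RX work still runs on the same CPU once the lock is free, it just never gets a chance to contend or trigger the context-switch ping-pong while the lock is held.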
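One possible shape for the "selective disabling" idea on the FreeBSD side, offered only as a rough sketch and not as anything that exists in the tree or that the post proposes verbatim: defer preemption with critical_enter()/critical_exit() while the lock is held, so the ithread woken by the NIC interrupt cannot be switched to on this CPU until after the lock has already been dropped. The lock and function names are made up, and whether holding a critical section across mtx_unlock() is acceptable in all the paths described above is exactly the kind of question being asked.

    /* Hypothetical sketch only; names invented for illustration. */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static struct mtx foo_kq_mtx;
    MTX_SYSINIT(foo_kq_mtx_init, &foo_kq_mtx, "foo kq sketch", MTX_DEF);

    static void
    foo_kq_manipulate(void)
    {
            mtx_lock(&foo_kq_mtx);
            critical_enter();          /* defer preemption on this CPU */
            /* ... kqueue/knote manipulation goes here ... */
            mtx_unlock(&foo_kq_mtx);   /* a waiter may be woken, but cannot preempt yet */
            critical_exit();           /* deferred preemption (e.g. the ithread) happens here */
    }

This only removes the switch-at-contention half of the ping-pong; the ithread still runs once per release, so it is closer to a thought experiment about where a spin_lock_bh()- or spl-style primitive would have to hook in than to an actual fix.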