Date:        Sat, 04 Jan 2014 11:19:20 +0800
From:        David Xu <listlog2011@gmail.com>
To:          freebsd-arch@freebsd.org
Subject:     Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID:  <52C77DB8.5020305@gmail.com>
In-Reply-To: <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com>
References:  <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com>
On 2014/01/04 08:55, Adrian Chadd wrote:
> Hi,
>
> So here's a fun one.
>
> When doing TCP traffic + socket affinity + thread pinning experiments,
> I seem to hit this very annoying scenario that caps my performance and
> scalability.
>
> Assume I've lined up everything relating to a socket to run on the
> same CPU (ie, TX, RX, TCP timers, userland thread):
>
> * userland code calls something, let's say "kqueue"
> * the kqueue lock gets grabbed
> * an interrupt comes in for the NIC
> * the NIC code runs some RX code, and eventually hits something that
>   wants to push a knote up
> * and the knote is for the same kqueue above
> * .. so it grabs the lock..
> * .. contests..
> * Then the scheduler flips us back to the original userland thread doing TX
> * The userland thread finishes its kqueue manipulation and releases
>   the queue lock
> * .. the scheduler then immediately flips back to the NIC thread
>   waiting for the lock, grabs the lock, does a bit of work, then
>   releases the lock
>
> I see this on kqueue locks, sendfile locks (for sendfile notification)
> and vm locks (for the VM page referencing/dereferencing.)
>
> This happens very frequently. It's very noticeable with large numbers
> of sockets, as the chance of hitting a lock in the NIC RX path that
> overlaps with something in the userland TX path that you are currently
> fiddling with (eg kqueue manipulation) or sending data (eg vm_page
> locks or sendfile locks for things you're currently transmitting) is
> very high. As I increase traffic and the number of sockets, the number
> of context switches goes way up (to 300,000+) and the lock contention
> / time spent doing locking is non-trivial.
>
> Linux doesn't "have this" problem - the lock primitives let you
> disable driver bottom halves. So, in this instance, I'd just grab the
> lock with spin_lock_bh() and all the driver bottom halves would not be
> run. I'd thus not have this scheduler ping-ponging and lock contention
> as it'd never get a chance to happen.
>
> So, does anyone have any ideas? Has anyone seen this? Shall we just
> implement a way of doing selective thread disabling, a la
> spin_lock_bh() mixed with spl${foo}() style stuff?
>
> Thanks,
>
>
> -adrian
>

This is how a turnstile-based mutex works. AFAIK it is intended for
realtime, the same as a POSIX pthread priority-inheritance mutex, and
realtime does not mean high performance; in fact, it introduces more
context switches and hurts throughput.

I think the default mutex could be patched to call critical_enter()
when mutex_lock is called, spin until the lock is free, and call
critical_exit() when the mutex is unlocked, bypassing the turnstile.

The turnstile design assumes the whole system must be scheduled on
global thread priority, but who says a system must be based on this?
Recently I ported a Linux CFS-like scheduler to FreeBSD on our perforce
server; it is based on start-time fair queueing, and I found the
turnstile to be such a bad thing: it prevents me from scheduling
threads by class (rt > timeshare > idle) and instead forces me to deal
with global thread priority changes. I have stopped porting it,
although it now fully works on UP; it supports nested group scheduling,
and I can watch video smoothly while doing "make -j10 buildworld" on
the same UP machine. My scheduler does not work on SMP: too much
priority propagation work drove me away. A non-preemption spinlock
works well for such a system; propagating thread weight across a
scheduler tree is not practical.

Regards,
David Xu
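
To make the proposal above concrete, here is a minimal sketch of a
mutex that holds a critical section for the lifetime of the lock and
spins instead of blocking on a turnstile. The lock type and function
names (pcpu_mtx, pcpu_mtx_lock, pcpu_mtx_unlock) are hypothetical and
not an existing FreeBSD primitive; only critical_enter(),
critical_exit(), cpu_spinwait() and the atomic(9) operations are real
kernel APIs.

    /*
     * Hypothetical sketch only: the owner runs inside a critical
     * section, so it cannot be preempted on its CPU while holding the
     * lock; waiters on other CPUs spin rather than sleeping on a
     * turnstile.
     */
    #include <sys/param.h>
    #include <sys/systm.h>      /* critical_enter(), critical_exit() */
    #include <machine/atomic.h>
    #include <machine/cpu.h>    /* cpu_spinwait() */

    struct pcpu_mtx {
            volatile u_int  m_owned;        /* 0 = free, 1 = held */
    };

    static __inline void
    pcpu_mtx_lock(struct pcpu_mtx *m)
    {
            critical_enter();               /* no context switch while held */
            while (atomic_cmpset_acq_int(&m->m_owned, 0, 1) == 0)
                    cpu_spinwait();         /* spin; never sleep on a turnstile */
    }

    static __inline void
    pcpu_mtx_unlock(struct pcpu_mtx *m)
    {
            atomic_store_rel_int(&m->m_owned, 0);
            critical_exit();                /* preemption allowed again */
    }

Note that a critical section only prevents context switches, not
interrupt filters, so a lock like this could not be taken from filter
(interrupt) context; the ithread-based NIC RX path in the scenario
above would simply be kept off the CPU until the owner unlocks.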
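
For comparison, the Linux pattern Adrian refers to looks roughly like
this. spin_lock_bh()/spin_unlock_bh() and DEFINE_SPINLOCK() are the
real Linux primitives; kq_lock and update_queue_state() are placeholder
names for illustration only.

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(kq_lock);        /* placeholder lock */

    static void update_queue_state(void)
    {
            spin_lock_bh(&kq_lock);         /* softirqs (bottom halves) off on this CPU */
            /* ... manipulate the queue; no RX bottom half can interleave ... */
            spin_unlock_bh(&kq_lock);       /* bottom halves re-enabled */
    }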