Date: Fri, 3 Jan 2014 19:25:21 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: David Xu <davidxu@freebsd.org>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID: <CAJ-Vmok=VSLiwzh-626qUWUuqJC1rtg58mwB_zqT2oQd64oo_Q@mail.gmail.com>
In-Reply-To: <52C77DB8.5020305@gmail.com>
References: <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com> <52C77DB8.5020305@gmail.com>
Doesn't critical_enter / exit enable/disable interrupts? We don't
necessarily want to do -that-, as that can be expensive. Just not
scheduling certain tasks that would interfere would be good enough.

-a

On 3 January 2014 19:19, David Xu <listlog2011@gmail.com> wrote:
> On 2014/01/04 08:55, Adrian Chadd wrote:
>> Hi,
>>
>> So here's a fun one.
>>
>> When doing TCP traffic + socket affinity + thread pinning experiments,
>> I seem to hit this very annoying scenario that caps my performance and
>> scalability.
>>
>> Assume I've lined up everything relating to a socket to run on the
>> same CPU (ie, TX, RX, TCP timers, userland thread):
>>
>> * userland code calls something, let's say "kqueue"
>> * the kqueue lock gets grabbed
>> * an interrupt comes in for the NIC
>> * the NIC code runs some RX code, and eventually hits something that
>>   wants to push a knote up
>> * and the knote is for the same kqueue above
>> * .. so it grabs the lock..
>> * .. contests..
>> * Then the scheduler flips us back to the original userland thread doing TX
>> * The userland thread finishes its kqueue manipulation and releases
>>   the queue lock
>> * .. the scheduler then immediately flips back to the NIC thread
>>   waiting for the lock, grabs the lock, does a bit of work, then
>>   releases the lock
>>
>> I see this on kqueue locks, sendfile locks (for sendfile notification)
>> and vm locks (for the VM page referencing/dereferencing.)
>>
>> This happens very frequently. It's very noticeable with large numbers
>> of sockets, as the chance of hitting a lock in the NIC RX path that
>> overlaps with something in the userland TX path that you are currently
>> fiddling with (eg kqueue manipulation) or sending data (eg vm_page
>> locks or sendfile locks for things you're currently transmitting) is
>> very high. As I increase traffic and the number of sockets, the number
>> of context switches goes way up (to 300,000+) and the lock contention
>> / time spent doing locking is non-trivial.
>>
>> Linux doesn't "have this" problem - the lock primitives let you
>> disable driver bottom halves. So, in this instance, I'd just grab the
>> lock with spin_lock_bh() and all the driver bottom halves would not be
>> run. I'd thus not have this scheduler ping-ponging and lock contention,
>> as it'd never get a chance to happen.
>>
>> So, does anyone have any ideas? Has anyone seen this? Shall we just
>> implement a way of doing selective thread disabling, a la
>> spin_lock_bh() mixed with spl${foo}() style stuff?
>>
>> Thanks,
>>
>> -adrian
>>
>
> This is how the turnstile-based mutex works. AFAIK it is for realtime,
> the same as a POSIX pthread priority-inheritance mutex, and realtime
> does not mean high performance; in fact, it introduces more context
> switches and hurts throughput. I think the default mutex could be
> patched to call critical_enter when mutex_lock is called, spin forever,
> and call critical_exit when the mutex is unlocked, bypassing the
> turnstile. The turnstile design assumes the whole system must be
> scheduled on global thread priority, but who says a system must be
> based on this?
> Recently, I ported a Linux CFS-like scheduler to FreeBSD on our
> Perforce server; it is based on a start-time fair queue, and I found
> the turnstile is such a bad thing: it means I cannot schedule threads
> by class (rt > timeshare > idle), but must deal with a global thread
> priority change.
> I have stopped porting it. Although it now works fully on UP - it
> supports nested group scheduling, and I can watch video smoothly while
> doing "make -j10 buildworld" on the same UP machine - my scheduler does
> not work on SMP; the amount of priority propagation work drove me away.
> A non-preemptive spinlock works well for such a system; propagating
> thread weight across a scheduler tree is not practical.
>
> Regards,
> David Xu
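
As a rough illustration of the spin_lock_bh() behaviour Adrian refers to
above, here is a minimal sketch of the Linux pattern. It is only a sketch:
the lock and function names are made up for the example and are not taken
from any real driver.

    /*
     * Minimal sketch of the Linux spin_lock_bh() pattern: taking the lock
     * also disables softirq ("bottom half") processing on the local CPU,
     * so the driver RX path cannot run and contend for the same lock on
     * this CPU until the lock is released.  All names are hypothetical.
     */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(example_event_lock);      /* hypothetical lock */

    static void example_userland_side(void)
    {
            spin_lock_bh(&example_event_lock);
            /*
             * ... manipulate the event queue; RX bottom halves are
             * deferred on this CPU for the duration ...
             */
            spin_unlock_bh(&example_event_lock);     /* re-enables BHs, runs any pending ones */
    }

This is the property the mail is after: the lock holder briefly fences off
the code that would otherwise run in and contend, rather than relying on the
scheduler to sort out the contention afterwards.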
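Along the same lines, here is a minimal sketch of the critical_enter()-based
spinning lock David describes, assuming plain FreeBSD kernel atomics and
cpu_spinwait(). This is not the existing mtx(9) implementation; the structure
and function names are invented for the example.

    /*
     * Hypothetical sketch of David's suggestion: enter a critical section
     * (so the current thread cannot be preempted) and spin on the lock
     * word instead of blocking on a turnstile.  Not the real mtx code.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <machine/atomic.h>
    #include <machine/cpu.h>

    struct example_crit_lock {
            volatile u_int ecl_owned;       /* 0 = free, 1 = held */
    };

    static void
    example_crit_lock_acquire(struct example_crit_lock *l)
    {
            critical_enter();               /* no preemption while held */
            while (atomic_cmpset_acq_int(&l->ecl_owned, 0, 1) == 0)
                    cpu_spinwait();         /* spin until the holder releases */
    }

    static void
    example_crit_lock_release(struct example_crit_lock *l)
    {
            atomic_store_rel_int(&l->ecl_owned, 0);
            critical_exit();                /* preemption allowed again */
    }

The trade-off is the one David points at: contenders busy-wait instead of
going through a turnstile, so there is no priority propagation and no context
switch while the lock is held; whether that interacts safely with
interrupt-context acquisition is exactly the question raised at the top of
the reply.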