Date:      Fri, 3 Jan 2014 19:25:21 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        David Xu <davidxu@freebsd.org>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID:  <CAJ-Vmok=VSLiwzh-626qUWUuqJC1rtg58mwB_zqT2oQd64oo_Q@mail.gmail.com>
In-Reply-To: <52C77DB8.5020305@gmail.com>
References:  <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com> <52C77DB8.5020305@gmail.com>

Doesn't critical_enter / exit enable/disable interrupts?

We don't necessarily want to do -that-, as it can be expensive. Just not
scheduling the particular tasks that would interfere would be good
enough.
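
For reference, the API being discussed is critical_enter()/critical_exit();
as I understand it, it defers preemption on the current CPU rather than
masking interrupts. A minimal sketch of the usage pattern, with a made-up
function name:

    #include <sys/param.h>
    #include <sys/systm.h>

    static void
    touch_local_state(void)            /* made-up name */
    {
            critical_enter();          /* defer preemption on this CPU */
            /* work that must not be interleaved with other threads here */
            critical_exit();           /* any deferred context switch runs now */
    }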


-a


On 3 January 2014 19:19, David Xu <listlog2011@gmail.com> wrote:
> On 2014/01/04 08:55, Adrian Chadd wrote:
>> Hi,
>>
>> So here's a fun one.
>>
>> When doing TCP traffic + socket affinity + thread pinning experiments,
>> I seem to hit this very annoying scenario that caps my performance and
>> scalability.
>>
>> Assume I've lined up everything relating to a socket to run on the
>> same CPU (ie, TX, RX, TCP timers, userland thread):
>>
>> * userland code calls something, let's say "kqueue"
>> * the kqueue lock gets grabbed
>> * an interrupt comes in for the NIC
>> * the NIC code runs some RX code, and eventually hits something that
>> wants to push a knote up
>> * and the knote is for the same kqueue above
>> * .. so it tries to grab the lock ..
>> * .. and contends with the holder ..
>> * Then the scheduler flips us back to the original userland thread doing TX
>> * The userland thread finishes its kqueue manipulation and releases
>> the queue lock
>> * .. the scheduler then immediately flips back to the NIC thread
>> waiting for the lock, grabs the lock, does a bit of work, then
>> releases the lock
>>
>> I see this on kqueue locks, sendfile locks (for sendfile notification)
>> and vm locks (for the VM page referencing/dereferencing.)
>>
>> This happens very frequently. It's very noticeable with large numbers
>> of sockets, as the chance of hitting a lock in the NIC RX path that
>> overlaps with something in the userland TX path you are currently
>> fiddling with (eg kqueue manipulation) or sending data on (eg vm_page
>> locks or sendfile locks for things you're currently transmitting) is
>> very high. As I increase traffic and the number of sockets, the number
>> of context switches goes way up (to 300,000+) and the lock contention
>> / time spent doing locking is non-trivial.
>>
>> Linux doesn't "have this" problem - the lock primitives let you
>> disable driver bottom halves. So, in this instance, I'd just grab the
>> lock with spin_lock_bh() and all the driver bottom halves would not be
>> run. I'd thus not have this scheduler ping-ponging and lock contention
>> as it'd never get a chance to happen.
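
For reference, the Linux pattern being described looks roughly like this
(the lock and function names are made up; spin_lock_bh()/spin_unlock_bh()
are the real primitives):

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(sock_state_lock);       /* made-up lock name */

    static void update_sock_state(void)            /* made-up function */
    {
            /* take the lock and disable bottom halves (softirqs) on this CPU */
            spin_lock_bh(&sock_state_lock);
            /* ... touch state that the NIC RX path also touches ... */
            spin_unlock_bh(&sock_state_lock);      /* deferred softirqs run here */
    }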
>>
>> So, does anyone have any ideas? Has anyone seen this? Shall we just
>> implement a way of doing selective thread disabling, a la
>> spin_lock_bh() mixed with spl${foo}() style stuff?
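
To make that concrete, here is a purely hypothetical sketch of what such a
primitive might look like; intr_class_block()/intr_class_unblock() and
ITC_NET do not exist today and are only placeholders, while
mtx_lock()/mtx_unlock() are the normal mutex calls:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    static struct mtx obj_mtx;     /* stands in for a kqueue/sendfile/vm lock */

    static void
    touch_shared_object(void)
    {
            register_t s;

            s = intr_class_block(ITC_NET); /* hypothetical: defer net ithreads here */
            mtx_lock(&obj_mtx);            /* no ping-pong with the local RX path */
            /* ... kqueue/knote or sendfile bookkeeping ... */
            mtx_unlock(&obj_mtx);
            intr_class_unblock(s);         /* hypothetical: run anything deferred */
    }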
>>
>> Thanks,
>>
>>
>> -adrian
>>
>
> This is how the turnstile-based mutex works. AFAIK it exists for
> realtime behaviour, much like a POSIX pthread priority-inheritance
> mutex, and realtime does not mean high performance; in fact it
> introduces more context switches and hurts throughput. I think the
> default mutex could be patched to call critical_enter() when the mutex
> is locked, spin until the lock is acquired, and call critical_exit()
> when the mutex is unlocked, bypassing the turnstile entirely.
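
A rough sketch of the shape of that idea (illustrative only, not a tested
patch; the spinonly_* names are made up, while critical_enter()/
critical_exit(), atomic_cmpset_acq_ptr(), atomic_store_rel_ptr() and
cpu_spinwait() are existing kernel primitives):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <machine/atomic.h>
    #include <machine/cpu.h>

    static void
    spinonly_lock(volatile uintptr_t *lockp)
    {
            critical_enter();       /* no preemption while the lock is held */
            while (!atomic_cmpset_acq_ptr(lockp, 0, (uintptr_t)curthread))
                    cpu_spinwait(); /* spin instead of blocking on a turnstile */
    }

    static void
    spinonly_unlock(volatile uintptr_t *lockp)
    {
            atomic_store_rel_ptr(lockp, 0);
            critical_exit();        /* deferred preemption happens here */
    }
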
> The turnstile design assumes the whole system must be scheduled on a
> single global thread priority, but who says a system must be built that
> way?
> Recently I ported a Linux-CFS-like scheduler to FreeBSD on our perforce
> server. It is based on a start-time fair queue, and I found the
> turnstile to be a real problem: it prevents me from scheduling threads
> purely by class (rt > timeshare > idle) and instead forces me to cope
> with global thread priority changes. I have stopped porting it, even
> though it now works fully on UP and supports nested group scheduling; I
> can watch video smoothly while doing "make -j10 buildworld" on the same
> UP machine. My scheduler does not work on SMP, as the amount of
> priority propagation work drove me away. Non-preemptive spinlocks work
> well for such a system, but propagating thread weight up a scheduler
> tree is not practical.
>
> Regards,
> David Xu