From: David Xu <listlog2011@gmail.com>
Reply-To: davidxu@freebsd.org
Date: Sat, 04 Jan 2014 11:45:34 +0800
To: Adrian Chadd
Cc: freebsd-arch@freebsd.org
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
List-Id: Discussion related to FreeBSD architecture

On 2014/01/04 11:25, Adrian Chadd wrote:
> Doesn't critical_enter / exit enable/disable interrupts?
>
> We don't necessarily want to do -that-, as that can be expensive. Just
> not scheduling certain tasks that would interfere would be good
> enough.
>
>
> -a

Does critical_enter disable interrupts? As far as I remember from a long
time ago, it does not.
If I remember correctly, spinlock_enter disables interrupts, while
critical_enter still allows interrupts; the current thread simply cannot
be preempted, and any preemption is deferred until critical_exit.

> On 3 January 2014 19:19, David Xu wrote:
>> On 2014/01/04 08:55, Adrian Chadd wrote:
>>> Hi,
>>>
>>> So here's a fun one.
>>>
>>> When doing TCP traffic + socket affinity + thread pinning experiments,
>>> I seem to hit this very annoying scenario that caps my performance and
>>> scalability.
>>>
>>> Assume I've lined up everything relating to a socket to run on the
>>> same CPU (ie, TX, RX, TCP timers, userland thread):
>>>
>>> * userland code calls something, let's say "kqueue"
>>> * the kqueue lock gets grabbed
>>> * an interrupt comes in for the NIC
>>> * the NIC code runs some RX code, and eventually hits something that
>>>   wants to push a knote up
>>> * and the knote is for the same kqueue above
>>> * .. so it grabs the lock..
>>> * .. contests..
>>> * Then the scheduler flips us back to the original userland thread doing TX
>>> * The userland thread finishes its kqueue manipulation and releases
>>>   the queue lock
>>> * .. the scheduler then immediately flips back to the NIC thread
>>>   waiting for the lock, grabs the lock, does a bit of work, then
>>>   releases the lock
>>>
>>> I see this on kqueue locks, sendfile locks (for sendfile notification)
>>> and vm locks (for the VM page referencing/dereferencing.)
>>>
>>> This happens very frequently. It's very noticeable with large numbers
>>> of sockets, as the chances of hitting a lock in the NIC RX path that
>>> overlaps with something in the userland TX path that you are currently
>>> fiddling with (eg kqueue manipulation) or sending data (eg vm_page
>>> locks or sendfile locks for things you're currently transmitting) are
>>> very high. As I increase traffic and the number of sockets, the number
>>> of context switches goes way up (to 300,000+) and the lock contention
>>> / time spent doing locking is non-trivial.
>>>
>>> Linux doesn't "have this" problem - the lock primitives let you
>>> disable driver bottom halves. So, in this instance, I'd just grab the
>>> lock with spin_lock_bh() and all the driver bottom halves would not be
>>> run. I'd thus not have this scheduler ping-ponging and lock contention,
>>> as it'd never get a chance to happen.
>>>
>>> So, does anyone have any ideas? Has anyone seen this? Shall we just
>>> implement a way of doing selective thread disabling, a la
>>> spin_lock_bh() mixed with spl${foo}() style stuff?
>>>
>>> Thanks,
>>>
>>>
>>> -adrian
>>>
>> This is how a turnstile-based mutex works. AFAIK it exists for realtime,
>> the same as a POSIX pthread priority-inheritance mutex, and realtime does
>> not mean high performance; in fact, it introduces more context switches
>> and hurts throughput. I think the default mutex could be patched to
>> call critical_enter when mutex_lock is called, spin forever, and
>> call critical_exit when the mutex is unlocked, bypassing the turnstile.
>> The turnstile design assumes the whole system must be scheduled
>> on global thread priority, but who says a system must be based on that?
>> Recently, I ported a Linux CFS-like scheduler to FreeBSD on our
>> perforce server. It is based on a start-time fair queue, and I found
>> the turnstile to be a real obstacle: it prevents me from scheduling
>> threads by class (rt > timeshare > idle) and instead forces me to deal
>> with global thread priority changes.
>> I have stopped porting it, although it now fully works on UP. It
>> supports nested group scheduling; I can watch video smoothly while
>> running "make -j10 buildworld" on the same UP machine. My scheduler
>> does not work on SMP; the amount of priority-propagation work drove me
>> away. A non-preemption spinlock works well for such a system, since
>> propagating thread weight up a scheduler tree is not practical.
>>
>> Regards,
>> David Xu
>>
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"