From: David Xu <listlog2011@gmail.com>
Reply-To: davidxu@freebsd.org
Date: Sat, 04 Jan 2014 11:45:34 +0800
To: Adrian Chadd
Cc: freebsd-arch@freebsd.org
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
List-Id: Discussion related to FreeBSD architecture

On 2014/01/04 11:25, Adrian Chadd wrote:
> Doesn't critical_enter / exit enable/disable interrupts?
>
> We don't necessarily want to do -that-, as that can be expensive. Just
> not scheduling certain tasks that would interfere would be good
> enough.
>
>
> -a

Does critical_enter disable interrupts? As far as I remember from a long
time ago, it does not.
If I remember correctly, spinlock_enter disables interrupts, while
critical_enter still allows interrupts; the current thread simply cannot
be preempted, and any preemption is deferred until critical_exit.

> On 3 January 2014 19:19, David Xu wrote:
>> On 2014/01/04 08:55, Adrian Chadd wrote:
>>> Hi,
>>>
>>> So here's a fun one.
>>>
>>> When doing TCP traffic + socket affinity + thread pinning experiments,
>>> I seem to hit this very annoying scenario that caps my performance and
>>> scalability.
>>>
>>> Assume I've lined up everything relating to a socket to run on the
>>> same CPU (ie, TX, RX, TCP timers, userland thread):
>>>
>>> * userland code calls something, let's say "kqueue"
>>> * the kqueue lock gets grabbed
>>> * an interrupt comes in for the NIC
>>> * the NIC code runs some RX code, and eventually hits something that
>>>   wants to push a knote up
>>> * and the knote is for the same kqueue above
>>> * .. so it grabs the lock..
>>> * .. contests..
>>> * Then the scheduler flips us back to the original userland thread doing TX
>>> * The userland thread finishes its kqueue manipulation and releases
>>>   the queue lock
>>> * .. the scheduler then immediately flips back to the NIC thread
>>>   waiting for the lock, grabs the lock, does a bit of work, then
>>>   releases the lock
>>>
>>> I see this on kqueue locks, sendfile locks (for sendfile notification)
>>> and vm locks (for the VM page referencing/dereferencing.)
>>>
>>> This happens very frequently. It's very noticeable with large numbers
>>> of sockets, as the chances of hitting a lock in the NIC RX path that
>>> overlaps with something in the userland TX path that you are currently
>>> fiddling with (eg kqueue manipulation) or sending data (eg vm_page
>>> locks or sendfile locks for things you're currently transmitting) are
>>> very high. As I increase traffic and the number of sockets, the number
>>> of context switches goes way up (to 300,000+) and the lock contention
>>> / time spent doing locking is non-trivial.
>>>
>>> Linux doesn't "have this" problem - the lock primitives let you
>>> disable driver bottom halves. So, in this instance, I'd just grab the
>>> lock with spin_lock_bh() and all the driver bottom halves would not be
>>> run. I'd thus not have this scheduler ping-ponging and lock contention,
>>> as it'd never get a chance to happen.
>>>
>>> So, does anyone have any ideas? Has anyone seen this? Shall we just
>>> implement a way of doing selective thread disabling, a la
>>> spin_lock_bh() mixed with spl${foo}() style stuff?
>>>
>>> Thanks,
>>>
>>>
>>> -adrian
>>>
>> This is how a turnstile-based mutex works. AFAIK it exists for realtime,
>> the same as a POSIX pthread priority-inheritance mutex, and realtime does
>> not mean high performance; in fact, it introduces more context switches
>> and hurts throughput. I think the default mutex could be patched to
>> call critical_enter when mutex_lock is called, spin forever, and
>> call critical_exit when the mutex is unlocked, bypassing the turnstile.
>> The turnstile design assumes the whole system must be scheduled
>> on global thread priority, but who says a system must be based on that?
>> Recently, I ported a Linux CFS-like scheduler to FreeBSD on our
>> perforce server. It is based on a start-time fair queue, and I found
>> the turnstile to be a real obstacle: it prevents me from scheduling
>> threads by class (rt > timeshare > idle) and instead forces me to deal
>> with global thread priority changes.
>> I have stopped porting it, although it now fully works on UP. It
>> supports nested group scheduling; I can watch video smoothly while
>> running "make -j10 buildworld" on the same UP machine. My scheduler
>> does not work on SMP; the amount of priority-propagation work drove me
>> away. A non-preemption spinlock works well for such a system, since
>> propagating thread weight up a scheduler tree is not practical.
>>
>> Regards,
>> David Xu
>>
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"