From owner-freebsd-hackers@FreeBSD.ORG Wed Nov  7 07:45:52 2012
Message-ID: <509A11B4.7030605@freebsd.org>
Date: Wed, 07 Nov 2012 15:45:56 +0800
From: David Xu <davidxu@freebsd.org>
To: Jeff Roberson
Cc: attilio@freebsd.org, freebsd-hackers, Jeff Roberson, Andriy Gapon
Subject: Re: ule+smp: small optimization for turnstile priority lending

On 2012/11/07 14:17, Jeff Roberson wrote:
> On Wed, 7 Nov 2012, David Xu wrote:
>
>> On 2012/11/06 19:03, Attilio Rao wrote:
>>> On 9/20/12, David Xu wrote:
>>>> On 2012/09/18 22:05, Andriy Gapon wrote:
>>>>>
>>>>> Here is a snippet that demonstrates the issue on a supposedly
>>>>> fully loaded 2-processor system:
>>>>>
>>>>> 136794 0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid
>>>>> 102818", state:"running", attributes: prio:122
>>>>>
>>>>> 136793 0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid
>>>>> 111916", state:"yielding", attributes: prio:183, wmesg:"(null)",
>>>>> lockname:"(null)"
>>>>>
>>>>> 136792 1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1
>>>>> tid 100004", state:"running", attributes: prio:255
>>>>>
>>>>> 136791 1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load",
>>>>> counter:0, attributes: none
>>>>>
>>>>> 136790 1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid
>>>>> 113473", state:"blocked", attributes: prio:122, wmesg:"(null)",
>>>>> lockname:"unp_mtx"
>>>>>
>>>>> 136789 1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load",
>>>>> counter:2, attributes: none
>>>>>
>>>>> 136788 1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid
>>>>> 113473", point:"wokeup", attributes: linkedto:"Xorg tid 102818"
>>>>>
>>>>> 136787 1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid
>>>>> 102818", state:"runq add", attributes: prio:122,
>>>>> linkedto:"firefox tid 113473"
>>>>>
>>>>> 136786 1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load",
>>>>> counter:1, attributes: none
>>>>>
>>>>> 136785 1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid
>>>>> 102818", state:"runq rem", attributes: prio:176
>>>>>
>>>>> 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid
>>>>> 102818", point:"prio", attributes: prio:176, new prio:122,
>>>>> linkedto:"firefox tid 113473"
>>>>>
>>>>> 136783 1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid
>>>>> 113473", state:"running", attributes: prio:104
>>>>>
>>>>> See how the Xorg thread was forced from CPU 1 to CPU 0, where it
>>>>> preempted the cc1plus thread (I do have preemption enabled), only
>>>>> to leave CPU 1 with zero load.
>>>>>
>>>>> Here is a proposed solution:
>>>>>
>>>>>     turnstile_wait: optimize priority lending to a thread on a
>>>>>     runqueue
>>>>>
>>>>>     As the current thread is definitely going into mi_switch, it
>>>>>     now removes its load before doing priority propagation, which
>>>>>     can potentially result in sched_add.  In the SMP && ULE case
>>>>>     the latter searches for the least loaded CPU to place a
>>>>>     boosted thread, which is supposedly about to run.
>>>>>
>>>>> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
>>>>> index 8e466cd..3299cae 100644
>>>>> --- a/sys/kern/sched_ule.c
>>>>> +++ b/sys/kern/sched_ule.c
>>>>> @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread *newtd, int flags)
>>>>>  		/* This thread must be going to sleep. */
>>>>>  		TDQ_LOCK(tdq);
>>>>>  		mtx = thread_lock_block(td);
>>>>> -		tdq_load_rem(tdq, td);
>>>>> +#if defined(SMP)
>>>>> +		if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE)
>>>>> +#endif
>>>>> +			tdq_load_rem(tdq, td);
>>>>>  	}
>>>>>  	/*
>>>>>  	 * We enter here with the thread blocked and assigned to the
>>>>> @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td)
>>>>>  	tdq_setlowpri(tdq, NULL);
>>>>>  }
>>>>>
>>>>> +void
>>>>> +sched_load_rem(struct thread *td)
>>>>> +{
>>>>> +	struct tdq *tdq;
>>>>> +
>>>>> +	KASSERT(td == curthread,
>>>>> +	    ("sched_rem_load: only curthread is supported"));
>>>>> +	KASSERT(td->td_oncpu == td->td_sched->ts_cpu,
>>>>> +	    ("thread running on cpu different from ts_cpu"));
>>>>> +	tdq = TDQ_CPU(td->td_sched->ts_cpu);
>>>>> +	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>>>>> +	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
>>>>> +	tdq_load_rem(tdq, td);
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   * Fetch cpu utilization information.  Updates on demand.
>>>>>   */
>>>>> diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c
>>>>> index 31d16fe..d1d68e9 100644
>>>>> --- a/sys/kern/subr_turnstile.c
>>>>> +++ b/sys/kern/subr_turnstile.c
>>>>> @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread *owner, int queue)
>>>>>  		LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash);
>>>>>  	}
>>>>>  	thread_lock(td);
>>>>> +#if defined(SCHED_ULE) && defined(SMP)
>>>>> +	/*
>>>>> +	 * Remove load earlier so that it does not affect cpu selection
>>>>> +	 * for a thread woken up due to priority lending, if any.
>>>>> +	 */
>>>>> +	sched_load_rem(td);
>>>>> +#endif
>>>>>  	thread_lock_set(td, &ts->ts_lock);
>>>>>  	td->td_turnstile = NULL;
>>>>>
>>>>> diff --git a/sys/sys/sched.h b/sys/sys/sched.h
>>>>> index 4b8387c..b1ead1b 100644
>>>>> --- a/sys/sys/sched.h
>>>>> +++ b/sys/sys/sched.h
>>>>> @@ -110,6 +110,9 @@ void sched_preempt(struct thread *td);
>>>>>  void	sched_add(struct thread *td, int flags);
>>>>>  void	sched_clock(struct thread *td);
>>>>>  void	sched_rem(struct thread *td);
>>>>> +#if defined(SCHED_ULE) && defined(SMP)
>>>>> +void	sched_load_rem(struct thread *td);
>>>>> +#endif
>>>>>  void	sched_tick(int cnt);
>>>>>  void	sched_relinquish(struct thread *td);
>>>>>  struct thread *sched_choose(void);
>>>>>
>>>>
>>>> I found another scenario in taskqueue: in the function
>>>> taskqueue_terminate, the current thread wakes another thread up and
>>>> then sleeps immediately, and the tq_mutex is sometimes a spin lock.
>>>> So if you remove the current thread's load from its cpu before the
>>>> wakeup, the resumed thread may be placed on that same cpu, which
>>>> would optimize the cpu scheduling there too.
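
(For reference, the wake-then-sleep pattern in question looks roughly
like the following; this is a condensed sketch from memory of the
taskqueue shutdown path, not the exact code:)

	/*
	 * The terminating caller repeatedly wakes the queue's service
	 * threads and then immediately sleeps on the same channel until
	 * they have all exited.  tq_mutex may be either a sleep mutex
	 * or a spin mutex depending on the queue type, so TQ_SLEEP
	 * expands to msleep() or msleep_spin() accordingly.
	 */
	while (tq->tq_tcount > 0 || tq->tq_callouts > 0) {
		wakeup(tq);
		TQ_SLEEP(tq, pp, &tq->tq_mutex, PWAIT,
		    "taskqueue_destroy", 0);
	}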
>>>
>>> I think that in order to fit with the sched_add() modifications I
>>> have in mind (see other patches within this thread), wakeup() should
>>> grow a new argument, or maybe we can add a new wakeup_flags() KPI.
>>> If the latter is the case, I would also propose to let wakeup_one()
>>> be absorbed into wakeup_flags() with its own flag.
>>>
>>
>> Yes, I like the idea.
>
> From ~2007:
>
> http://people.freebsd.org/~jeff/wakeupflags.diff
>
> This has some different optimizations that can be done when you have a
> hinted wakeup.
>
> Thanks,
> Jeff

wakeup_flags() is a good starting point, but what flags should be
supported?  I think WAKEUP_WILLSLEEP may be needed for the case where
the current thread will switch out as soon as possible.

I see you have added WAKEUP_LOCAL and WAKEUP_TAIL.  Are they for cache
optimization?  Both of them are arguable.

WAKEUP_LOCAL increases cpu migration, which is not good: one benefit of
ULE is keeping a thread on its old cpu, and WAKEUP_LOCAL violates that
design.  As used in the pipe code, WAKEUP_LOCAL may not be a win if the
pipe only has a few bytes to be transferred, or if the receiver only
eats a few bytes each time; why bother to move the other thread to the
local cpu?  If the other thread has many bytes cached on its old cpu,
the migration is expensive.

WAKEUP_TAIL is a good idea; I guess you want to wake up the hot thread
whose code and data are still in its old cpu's cache.  But this will
lead to unfairness.  I'd like the sleep queue to implement a policy
like the one I did in libthr: it picks the thread at the tail of the
queue a fixed percentage of the time.  That does have some level of
unfairness, but it is not fatal at all.
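
Concretely, the policy could look roughly like the sketch below.  This
is hypothetical: sleepq_pick() and its queue argument are illustrative
names, not the real sleepqueue(9) internals, and the percentage is just
an example.

	/*
	 * Bounded tail-pick policy, modeled on what I did in libthr.
	 * Most of the time the oldest (head) waiter is chosen,
	 * preserving FIFO fairness; a fixed fraction of the time the
	 * newest (tail) waiter is chosen instead, since it is the most
	 * likely to still be cache-hot on its old cpu.
	 */
	static struct thread *
	sleepq_pick(struct threadqueue *blocked)
	{

		/* Favor the cache-hot tail roughly 25% of the time. */
		if ((random() % 4) == 0)
			return (TAILQ_LAST(blocked, threadqueue));
		return (TAILQ_FIRST(blocked));
	}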
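
To sum up, the combined KPI I am imagining might have a shape like the
following.  This is only a sketch: WAKEUP_LOCAL and WAKEUP_TAIL are
taken from your diff, but their semantics here are my reading of it,
while WAKEUP_ONE and WAKEUP_WILLSLEEP (and all the flag values) are
hypothetical.

	#define	WAKEUP_ONE		0x01	/* wake a single waiter;
						   absorbs wakeup_one() */
	#define	WAKEUP_WILLSLEEP	0x02	/* caller will mi_switch()
						   right away, so its load
						   should not affect cpu
						   selection */
	#define	WAKEUP_LOCAL		0x04	/* hint: prefer the
						   caller's cpu */
	#define	WAKEUP_TAIL		0x08	/* hint: wake the most
						   recently queued, likely
						   cache-hot waiter */

	void	wakeup_flags(void *chan, int flags);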