From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov  7 06:17:45 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 0F6606AD
 for <freebsd-hackers@freebsd.org>; Wed,  7 Nov 2012 06:17:45 +0000 (UTC)
 (envelope-from jroberson@jroberson.net)
Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com
 [209.85.220.54])
 by mx1.freebsd.org (Postfix) with ESMTP id C771B8FC12
 for <freebsd-hackers@freebsd.org>; Wed,  7 Nov 2012 06:17:44 +0000 (UTC)
Received: by mail-pa0-f54.google.com with SMTP id bi1so978861pad.13
 for <freebsd-hackers@freebsd.org>; Tue, 06 Nov 2012 22:17:43 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=date:from:x-x-sender:to:cc:subject:in-reply-to:message-id
 :references:user-agent:mime-version:content-type:x-gm-message-state;
 bh=VN1hk01Xq8BVYQs6E74eurX4JSgKBFOnRMU+O7uZtsM=;
 b=OQMbVclRcnPsJ8ajpD5cxhj3COUftC5WRCNjizp6dRiIIipjCfiOtcsza8RUsydxCN
 cem/wMoe+miwkGpBOGWw1blG+ebeqYusiBBgks4XF7KV5QaQAe8zPnylkc8J2vO7gCEs
 6ou1p5TSY2T/neI4XbxywjMJ16eSilSfAvxDLAhKHfwoHsBQqwszetOAqJ6kCTCz3sfn
 yTI4W9AKbKBMM3z8TODr4jB9RhgS9coV2+iEwcBu+CVP6iozdcf59EVehbd8EhcplCnM
 cXmO4ND/fACmtrMztT5Whudj14VH847tAJQUx2pYVKT8gawGCz1GbowktJMV+dXIhhV2
 Vt+Q==
Received: by 10.68.228.36 with SMTP id sf4mr10824677pbc.20.1352269063401;
 Tue, 06 Nov 2012 22:17:43 -0800 (PST)
Received: from rrcs-66-91-135-210.west.biz.rr.com
 (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210])
 by mx.google.com with ESMTPS id ty4sm13595301pbc.57.2012.11.06.22.17.41
 (version=SSLv3 cipher=OTHER); Tue, 06 Nov 2012 22:17:42 -0800 (PST)
Date: Tue, 6 Nov 2012 20:17:20 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
X-X-Sender: jroberson@desktop
To: David Xu <davidxu@freebsd.org>
Subject: Re: ule+smp: small optimization for turnstile priority lending
In-Reply-To: <5099C5C9.4040703@freebsd.org>
Message-ID: <alpine.BSF.2.00.1211062016380.4081@desktop>
References: <50587F8D.9060102@FreeBSD.org> <505AD2A5.6060008@freebsd.org>
 <CAJ-FndBGEgkrKJ9bNdq0QrdyYZb=LXUsAG3wz5Lp-HLUBd5d9w@mail.gmail.com>
 <5099C5C9.4040703@freebsd.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Gm-Message-State: ALoCoQnwzHgWWnmALnhQZHkeRcph8EX6ZXZfoANS+nZqcdKjzGs8pTTfI0dZKEWF2j0uIsThjy3M
X-Mailman-Approved-At: Wed, 07 Nov 2012 12:40:11 +0000
Cc: attilio@freebsd.org, freebsd-hackers <freebsd-hackers@freebsd.org>,
 Jeff Roberson <jeff@freebsd.org>, Andriy Gapon <avg@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Nov 2012 06:17:45 -0000

On Wed, 7 Nov 2012, David Xu wrote:

> On 2012/11/06 19:03, Attilio Rao wrote:
>> On 9/20/12, David Xu <davidxu@freebsd.org> wrote:
>>> On 2012/09/18 22:05, Andriy Gapon wrote:
>>>> 
>>>> Here is a snippet that demonstrates the issue on a supposedly fully
>>>> loaded
>>>> 2-processor system:
>>>> 
>>>> 136794   0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid
>>>> 102818",
>>>> state:"running", attributes: prio:122
>>>> 
>>>> 136793   0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid
>>>> 111916",
>>>> state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)"
>>>> 
>>>> 136792   1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid
>>>> 100004",
>>>> state:"running", attributes: prio:255
>>>> 
>>>> 136791   1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load",
>>>> counter:0,
>>>> attributes: none
>>>> 
>>>> 136790   1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid
>>>> 113473",
>>>> state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx"
>>>> 
>>>> 136789   1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load",
>>>> counter:2,
>>>> attributes: none
>>>> 
>>>> 136788   1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid
>>>> 113473",
>>>> point:"wokeup", attributes: linkedto:"Xorg tid 102818"
>>>> 
>>>> 136787   1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid
>>>> 102818",
>>>> state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473"
>>>> 
>>>> 136786   1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load",
>>>> counter:1,
>>>> attributes: none
>>>> 
>>>> 136785   1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid
>>>> 102818",
>>>> state:"runq rem", attributes: prio:176
>>>> 
>>>> 136784   1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid
>>>> 102818",
>>>> point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid
>>>> 113473"
>>>> 
>>>> 136783   1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid
>>>> 113473",
>>>> state:"running", attributes: prio:104
>>>> 
>>>> See how how the Xorg thread was forced from CPU 1 to CPU 0 where it
>>>> preempted
>>>> cc1plus thread (I do have preemption enabled) only to leave CPU 1 with
>>>> zero load.
>>>> 
>>>> Here is a proposed solution:
>>>>
>>>>       turnstile_wait: optimize priority lending to a thread on a runqueue
>>>>
>>>>       As the current thread is definitely going into mi_switch, it now
>>>> removes
>>>>       its load before doing priority propagation which can potentially
>>>> result
>>>>       in sched_add.  In the SMP && ULE case the latter searches for the
>>>>       least loaded CPU to place a boosted thread, which is supposedly
>>>> about
>>>>       to run.
>>>> 
>>>> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
>>>> index 8e466cd..3299cae 100644
>>>> --- a/sys/kern/sched_ule.c
>>>> +++ b/sys/kern/sched_ule.c
>>>> @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread
>>>> *newtd, int
>>>> flags)
>>>>    		/* This thread must be going to sleep. */
>>>>    		TDQ_LOCK(tdq);
>>>>    		mtx = thread_lock_block(td);
>>>> -		tdq_load_rem(tdq, td);
>>>> +#if defined(SMP)
>>>> +		if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE)
>>>> +#endif
>>>> +			tdq_load_rem(tdq, td);
>>>>    	}
>>>>    	/*
>>>>    	 * We enter here with the thread blocked and assigned to the
>>>> @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td)
>>>>    		tdq_setlowpri(tdq, NULL);
>>>>    }
>>>> 
>>>> +void
>>>> +sched_load_rem(struct thread *td)
>>>> +{
>>>> +	struct tdq *tdq;
>>>> +
>>>> +	KASSERT(td == curthread,
>>>> +	    ("sched_rem_load: only curthread is supported"));
>>>> +	KASSERT(td->td_oncpu == td->td_sched->ts_cpu,
>>>> +	    ("thread running on cpu different from ts_cpu"));
>>>> +	tdq = TDQ_CPU(td->td_sched->ts_cpu);
>>>> +	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>>>> +	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
>>>> +	tdq_load_rem(tdq, td);
>>>> +}
>>>> +
>>>>    /*
>>>>     * Fetch cpu utilization information.  Updates on demand.
>>>>     */
>>>> diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c
>>>> index 31d16fe..d1d68e9 100644
>>>> --- a/sys/kern/subr_turnstile.c
>>>> +++ b/sys/kern/subr_turnstile.c
>>>> @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread
>>>> *owner,
>>>> int queue)
>>>>    		LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash);
>>>>    	}
>>>>    	thread_lock(td);
>>>> +#if defined(SCHED_ULE) && defined(SMP)
>>>> +	/*
>>>> +	 * Remove load earlier so that it does not affect cpu selection
>>>> +	 * for a thread waken up due to priority lending, if any.
>>>> +	 */
>>>> +	sched_load_rem(td);
>>>> +#endif
>>>>    	thread_lock_set(td, &ts->ts_lock);
>>>>    	td->td_turnstile = NULL;
>>>> 
>>>> diff --git a/sys/sys/sched.h b/sys/sys/sched.h
>>>> index 4b8387c..b1ead1b 100644
>>>> --- a/sys/sys/sched.h
>>>> +++ b/sys/sys/sched.h
>>>> @@ -110,6 +110,9 @@ void	sched_preempt(struct thread *td);
>>>>    void	sched_add(struct thread *td, int flags);
>>>>    void	sched_clock(struct thread *td);
>>>>    void	sched_rem(struct thread *td);
>>>> +#if defined(SCHED_ULE) && defined(SMP)
>>>> +void	sched_load_rem(struct thread *td);
>>>> +#endif
>>>>    void	sched_tick(int cnt);
>>>>    void	sched_relinquish(struct thread *td);
>>>>    struct thread *sched_choose(void);
>>>> 
>>> 
>>> I found another scenario in taskqueue, in the function
>>> taskqueue_terminate, current thread tries to wake
>>> another thread up and sleep immediately, the tq_mutex sometimes
>>> is a spinlock. So if you remove one thread load from current cpu
>>> before wakeup, the resumed thread may be put on same cpu,
>>> so it will optimize the cpu scheduling too.
>> 
>> I think that in order to fit with sched_add() modifies I have in mind
>> (see other patches within this thread) wakeup() should grow a new
>> argument, or maybe we can use wakeup_flags() new KPI.
>> If the latter is the case, I would also propose to let wakeup_one() to
>> be absorbed into wakeup_flags() with its own flag.
>> 
>
> Yes, I like the idea.

>From ~2007

http://people.freebsd.org/~jeff/wakeupflags.diff

This has some different optimizations that can be done when you have a 
hinted wakeup.

Thanks,
Jeff

>
>> Attilio
>> 
>> 
>