From: Attilio Rao <attilio@FreeBSD.org>
To: David Xu
Cc: freebsd-hackers, Jeff Roberson, Andriy Gapon
Date: Tue, 6 Nov 2012 11:03:47 +0000
Subject: Re: ule+smp: small optimization for turnstile priority lending

On 9/20/12, David Xu wrote:
> On 2012/09/18 22:05, Andriy Gapon wrote:
>>
>> Here is a snippet that demonstrates the issue on a supposedly fully
>> loaded 2-processor system:
>>
>> 136794  0  3670427870244462  KTRGRAPH group:"thread", id:"Xorg tid 102818",
>>     state:"running", attributes: prio:122
>> 136793  0  3670427870241000  KTRGRAPH group:"thread", id:"cc1plus tid 111916",
>>     state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)"
>> 136792  1  3670427870240829  KTRGRAPH group:"thread", id:"idle: cpu1 tid 100004",
>>     state:"running", attributes: prio:255
>> 136791  1  3670427870239520  KTRGRAPH group:"load", id:"CPU 1 load",
>>     counter:0, attributes: none
>> 136790  1  3670427870239248  KTRGRAPH group:"thread", id:"firefox tid 113473",
>>     state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx"
>> 136789  1  3670427870237697  KTRGRAPH group:"load", id:"CPU 0 load",
>>     counter:2, attributes: none
>> 136788  1  3670427870236394  KTRGRAPH group:"thread", id:"firefox tid 113473",
>>     point:"wokeup", attributes: linkedto:"Xorg tid 102818"
>> 136787  1  3670427870236145  KTRGRAPH group:"thread", id:"Xorg tid 102818",
>>     state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473"
>> 136786  1  3670427870235981  KTRGRAPH group:"load", id:"CPU 1 load",
>>     counter:1, attributes: none
>> 136785  1  3670427870235707  KTRGRAPH group:"thread", id:"Xorg tid 102818",
>>     state:"runq rem", attributes: prio:176
group:"thread", id:"Xorg tid >> 102818", >> state:"runq rem", attributes: prio:176 >> >> 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid >> 102818", >> point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid >> 113473" >> >> 136783 1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid >> 113473", >> state:"running", attributes: prio:104 >> >> See how how the Xorg thread was forced from CPU 1 to CPU 0 where it >> preempted >> cc1plus thread (I do have preemption enabled) only to leave CPU 1 with >> zero load. >> >> Here is a proposed solution: >> >> turnstile_wait: optimize priority lending to a thread on a runqueue >> >> As the current thread is definitely going into mi_switch, it now >> removes >> its load before doing priority propagation which can potentially >> result >> in sched_add. In the SMP && ULE case the latter searches for the >> least loaded CPU to place a boosted thread, which is supposedly >> about >> to run. >> >> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c >> index 8e466cd..3299cae 100644 >> --- a/sys/kern/sched_ule.c >> +++ b/sys/kern/sched_ule.c >> @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread >> *newtd, int >> flags) >> /* This thread must be going to sleep. */ >> TDQ_LOCK(tdq); >> mtx = thread_lock_block(td); >> - tdq_load_rem(tdq, td); >> +#if defined(SMP) >> + if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE) >> +#endif >> + tdq_load_rem(tdq, td); >> } >> /* >> * We enter here with the thread blocked and assigned to the >> @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td) >> tdq_setlowpri(tdq, NULL); >> } >> >> +void >> +sched_load_rem(struct thread *td) >> +{ >> + struct tdq *tdq; >> + >> + KASSERT(td == curthread, >> + ("sched_rem_load: only curthread is supported")); >> + KASSERT(td->td_oncpu == td->td_sched->ts_cpu, >> + ("thread running on cpu different from ts_cpu")); >> + tdq = TDQ_CPU(td->td_sched->ts_cpu); >> + TDQ_LOCK_ASSERT(tdq, MA_OWNED); >> + MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); >> + tdq_load_rem(tdq, td); >> +} >> + >> /* >> * Fetch cpu utilization information. Updates on demand. >> */ >> diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c >> index 31d16fe..d1d68e9 100644 >> --- a/sys/kern/subr_turnstile.c >> +++ b/sys/kern/subr_turnstile.c >> @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread >> *owner, >> int queue) >> LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash); >> } >> thread_lock(td); >> +#if defined(SCHED_ULE) && defined(SMP) >> + /* >> + * Remove load earlier so that it does not affect cpu selection >> + * for a thread waken up due to priority lending, if any. >> + */ >> + sched_load_rem(td); >> +#endif >> thread_lock_set(td, &ts->ts_lock); >> td->td_turnstile = NULL; >> >> diff --git a/sys/sys/sched.h b/sys/sys/sched.h >> index 4b8387c..b1ead1b 100644 >> --- a/sys/sys/sched.h >> +++ b/sys/sys/sched.h >> @@ -110,6 +110,9 @@ void sched_preempt(struct thread *td); >> void sched_add(struct thread *td, int flags); >> void sched_clock(struct thread *td); >> void sched_rem(struct thread *td); >> +#if defined(SCHED_ULE) && defined(SMP) >> +void sched_load_rem(struct thread *td); >> +#endif >> void sched_tick(int cnt); >> void sched_relinquish(struct thread *td); >> struct thread *sched_choose(void); >> > > I found another scenario in taskqueue, in the function > taskqueue_terminate, current thread tries to wake > another thread up and sleep immediately, the tq_mutex sometimes > is a spinlock. 
I think that, in order to fit with the sched_add() modifications I have
in mind (see the other patches within this thread), wakeup() should
grow a new argument, or maybe we can add a new wakeup_flags() KPI.  If
the latter is the case, I would also propose letting wakeup_one() be
absorbed into wakeup_flags() with its own flag.
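To make that concrete, one hypothetical shape for such a KPI, with
purely illustrative names and flag values (nothing here is a committed
interface), could be:

/* Possible flags for a wakeup_flags() KPI; illustrative only. */
#define	WAKEUP_ONE		0x01	/* wake at most one waiter */
#define	WAKEUP_WILLSLEEP	0x02	/* caller sleeps next: remove its
					   load before selecting a cpu
					   for the wakee */

void	wakeup_flags(void *chan, int flags);

/* The existing KPIs would then become thin wrappers: */
#define	wakeup(chan)		wakeup_flags((chan), 0)
#define	wakeup_one(chan)	wakeup_flags((chan), WAKEUP_ONE)

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein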