Date: Fri, 13 Aug 2004 22:23:24 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rwatson@FreeBSD.org
Cc: jroberson@chesapeake.net, freebsd-current@FreeBSD.org
In-Reply-To: <200408140316.i7E3GgHG003765@gw.catspoiler.org>
Subject: Re: nice handling in ULE (was: Re: SCHEDULE and high load situations)

On 13 Aug, Don Lewis wrote:
> I managed to figure out a bit more of what is going on.
>
> The following code in sched_choose() pulls the selected kse off the
> run queue.
>
>         ke = kseq_choose(kseq);
>         if (ke) {
> #ifdef SMP
>                 if (ke->ke_ksegrp->kg_pri_class == PRI_IDLE)
>                         if (kseq_idled(kseq) == 0)
>                                 goto restart;
> #endif
>                 kseq_runq_rem(kseq, ke);
>                 ke->ke_state = KES_THREAD;
>
>                 if (ke->ke_ksegrp->kg_pri_class == PRI_TIMESHARE) {
>                         CTR4(KTR_ULE, "Run kse %p from %p (slice: %d, pri: %d)",
>                             ke, ke->ke_runq, ke->ke_slice,
>                             ke->ke_thread->td_priority);
>                 }
>                 return (ke);
>         }
>
> At some later time, setrunqueue() gets called for this thread, which
> calls sched_add(), which calls sched_add_internal(), which executes
> the following code fragment.
>
>         class = PRI_BASE(kg->kg_pri_class);
>         switch (class) {
>         case PRI_ITHD:
>         case PRI_REALTIME:
>                 ke->ke_runq = kseq->ksq_curr;
>                 ke->ke_slice = SCHED_SLICE_MAX;
>                 ke->ke_cpu = PCPU_GET(cpuid);
>                 break;
>         case PRI_TIMESHARE:
>                 if (SCHED_CURR(kg, ke))
>                         ke->ke_runq = kseq->ksq_curr;
>                 else
>                         ke->ke_runq = kseq->ksq_next;
>                 break;
>         [snip]
>         kseq_runq_add(kseq, ke);
>         kseq_load_add(kseq, ke);
>
> Because the thread is a CPU hog, it gets put on the next run queue,
> even though it hasn't exhausted its current slice, which means that
> it has to wait for all the other CPU hogs to get a turn at the CPU
> before it can execute again.
>
> I don't know how to fix this problem.  I think the desired behaviour
> would be for the kse to be restored to its previous location on the
> run queue.

The best idea that I can come up with is for sched_add(),
sched_add_internal(), kseq_runq_add(), and runq_add() to grow another
parameter that tells them whether to prepend the thread to the
beginning of the run queue or append it to the end.  If setrunqueue()
detects that TD_IS_RUNNING(td) is true, it would pass a flag to
sched_add() that causes the thread to be added to the beginning of the
queue.  A rough sketch of what I have in mind follows below.

I don't know if this is appropriate in the PRI_ITHD and PRI_REALTIME
cases, or if we want to continue to round-robin there.

Comments?
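Here's the sketch I mentioned above.  It's completely untested; the
SRQ_PREEMPTED flag name and the new "flags" parameter are just
placeholders, and every existing caller of these functions would need
a matching change.

        /* kern/kern_switch.c */

        #define SRQ_PREEMPTED   0x0001  /* hypothetical flag name */

        void
        setrunqueue(struct thread *td)
        {
                [snip]
                /*
                 * A thread that is still marked running was preempted
                 * before it used up its slice, so ask the scheduler to
                 * put it back at the head of its run queue instead of
                 * behind the other runnable threads.
                 */
                sched_add(td, TD_IS_RUNNING(td) ? SRQ_PREEMPTED : 0);
                [snip]
        }

        void
        runq_add(struct runq *rq, struct kse *ke, int flags)
        {
                struct rqhead *rqh;
                int pri;

                pri = ke->ke_thread->td_priority / RQ_PPQ;
                ke->ke_rqindex = pri;
                runq_setbit(rq, pri);
                rqh = &rq->rq_queues[pri];
                if (flags & SRQ_PREEMPTED)
                        /* Preempted with slice left; run it next. */
                        TAILQ_INSERT_HEAD(rqh, ke, ke_procq);
                else
                        TAILQ_INSERT_TAIL(rqh, ke, ke_procq);
        }

sched_add(), sched_add_internal(), and kseq_runq_add() would just pass
the flags argument through until it reaches runq_add().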