From owner-freebsd-hackers  Thu Apr 27 18:16:50 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from localhost (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id BFCED37BB73;
	Thu, 27 Apr 2000 18:16:43 -0700 (PDT)
	(envelope-from green@FreeBSD.org)
Date: Thu, 27 Apr 2000 21:16:40 -0400 (EDT)
From: Brian Fundakowski Feldman
X-Sender: green@green.dyndns.org
To: Luoqi Chen
Cc: hackers@FreeBSD.ORG
Subject: Re: lock-ups due to the scheduler
In-Reply-To: <200004270547.e3R5lfi24004@lor.watermarkgroup.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 27 Apr 2000, Luoqi Chen wrote:

> This is quite interesting. I'm no scheduler expert, but my understanding
> is that a priority < PUSER won't degrade and is only set in kernel mode
> after waking up from a sleep. In user mode, processes should always have
> priority p_usrpri >= PUSER; that is obviously not true for a negative
> nice value:

That was my take on it. There were multiple tests for <= PUSER which were
really tests for whether or not the process was in SRUN. However, changing
these tests to SRUN tests didn't prevent lockups, so the problem seems to
be deeper than that. They definitely do keep p_priority < PUSER processes
from updating their p_priority to their newly calculated p_usrpri, which
is a very large bug.

The deeper problem seems to be that no matter what the process does, it
never accrues enough estcpu to classify it as a hog: a process I start
with a niceness of -20 cycles through priorities from 10 (at the very
beginning) to 27 at the very highest. This _shouldn't_ be too much of a
problem, but the priority never reaches 50 (PUSER), so the process never
gets rescheduled properly... and this seems to be most of what's causing
the lockups.

> > 	newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
> > 	    NICE_WEIGHT * p->p_nice;
>
> We should probably offset p->p_nice by PRIO_MIN:
>
> > 	newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
> > 	    NICE_WEIGHT * (p->p_nice - PRIO_MIN);
>
> To fully utilize the 20 out of 32 run queues for user priorities, we
> might want to change NICE_WEIGHT from 2 to 1, and the upper limit of
> p_estcpu to
>
> #define ESTCPULIM(e) \
>     min((e), \
>         INVERSE_ESTCPU_WEIGHT * (NICE_WEIGHT * (PRIO_MAX - PRIO_MIN) - PPQ) + \
>         INVERSE_ESTCPU_WEIGHT - 1)
>
> so that a cpu hog at nice 0 would have about the same priority as a low
> cpu usage nice +20 process.

Yes, this seems right. Letting niceness pull the priority below 50 is a
bad idea. I think that if we make that modification (which is another
thing I tried) of subtracting PRIO_MIN from the niceness so that no value
can drop below PUSER, it would fix the bugs we have. When I did it, I
missed changing ESTCPULIM, which probably explains why things didn't (I
believe) lock up, but (I believe) still seemed veerrry bad... Also,
decreasing NICE_WEIGHT would be a good idea, so I'll try all of this out
and report back later.

> -lq

--
 Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
 green@FreeBSD.org                    `------------------------------'

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
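
For reference, below is a minimal standalone sketch of the arithmetic
discussed in the message, comparing the then-current resetpriority()
formula against Luoqi's proposed one. The constant values (PUSER = 50,
INVERSE_ESTCPU_WEIGHT = 8, PPQ = 4, and the old NICE_WEIGHT of 2) are
assumptions drawn from 4.x-era <sys/proc.h> and kern_synch.c rather than
from this thread itself, so verify them against an actual source tree.

/*
 * Sketch of the scheduler priority calculation discussed above.
 * Constants are assumed from 4.x-era sources, not quoted from the
 * thread; check a real tree before trusting the exact numbers.
 */
#include <stdio.h>

#define PUSER                 50    /* base user-mode priority (assumed) */
#define PRIO_MIN              (-20)
#define PRIO_MAX              20
#define OLD_NICE_WEIGHT       2     /* value before the proposed change */
#define NICE_WEIGHT           1     /* Luoqi's proposed value */
#define INVERSE_ESTCPU_WEIGHT 8
#define PPQ                   4     /* priorities per run queue (assumed) */

#define min(a, b) ((a) < (b) ? (a) : (b))

/* Luoqi's proposed cap on p_estcpu. */
#define ESTCPULIM(e) \
	min((e), \
	    INVERSE_ESTCPU_WEIGHT * (NICE_WEIGHT * (PRIO_MAX - PRIO_MIN) - PPQ) + \
	    INVERSE_ESTCPU_WEIGHT - 1)

/* Old formula: a negative nice pulls the result below PUSER. */
static int
prio_old(int estcpu, int nice)
{
	return (PUSER + estcpu / INVERSE_ESTCPU_WEIGHT + OLD_NICE_WEIGHT * nice);
}

/* Proposed formula: the PRIO_MIN offset keeps the result >= PUSER. */
static int
prio_new(int estcpu, int nice)
{
	return (PUSER + ESTCPULIM(estcpu) / INVERSE_ESTCPU_WEIGHT +
	    NICE_WEIGHT * (nice - PRIO_MIN));
}

int
main(void)
{
	/* The nice -20 case from the message: the old formula yields 10. */
	printf("old: nice -20, estcpu   0 -> %3d\n", prio_old(0, -20));
	printf("new: nice -20, estcpu   0 -> %3d\n", prio_new(0, -20));
	/* A long-running hog at nice 0 vs. an idle nice +20 process. */
	printf("new: nice   0, estcpu max -> %3d\n", prio_new(1 << 20, 0));
	printf("new: nice +20, estcpu   0 -> %3d\n", prio_new(0, 20));
	return (0);
}

With these assumed constants, the old formula reproduces the priority of
10 described in the message, and the proposed one never drops below
PUSER; how closely a hog at nice 0 and an idle nice +20 process actually
land depends on the exact constants chosen.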