Date: Fri, 28 Nov 2008 10:48:00 +0800
From: David Xu <davidxu@freebsd.org>
To: Jeff Roberson <jroberson@jroberson.net>
Cc: freebsd-bugs@freebsd.org, Unga <unga888@yahoo.com>, jhb@freebsd.org, jeff@freebsd.org, brde@optusnet.com.au, yanefbsd@gmail.com
Subject: Re: kern/129164: Wrong priority value for normal processes
Message-ID: <492F5BE0.2040200@freebsd.org>
In-Reply-To: <20081127000235.D971@desktop>
References: <484904.78100.qm@web57001.mail.re3.yahoo.com> <20081127000235.D971@desktop>
Jeff Roberson wrote:
> On Thu, 27 Nov 2008, Unga wrote:
>
>> --- On Tue, 11/25/08, Unga <unga888@yahoo.com> wrote:
>>
>>> The priority value for root and other normal processes is
>>> 65504 (rtp.prio) where zero (0) is expected.
>>>
>>> I checked the program flow from /usr/src/usr.bin/su/su.c to
>>> /usr/src/lib/libutil/login_class.c and it looks like
>>> setusercontext() sets the priority to zero (0) correctly, but
>>> by the time the setusercontext() call in su.c returns,
>>> the priority has already turned to 65504.
>>>
>>> The maximum priority value a normal-priority process can
>>> take is 20, not 65504. Normal-priority processes are
>>> expected to run at priority zero (0), as specified in
>>> /etc/login.conf under login class "default".
>>>
>>
>> I have further checked the rtprio(2) system call to see how it sets
>> and reads priorities.
>>
>> Setting Priority:
>> rtprio(RTP_SET, 0, &rtp)
>>
>> rtprio() => rtprio_thread() => rtp_to_pri()
>>
>> rtp_to_pri() calculates newpri as:
>> newpri = PRI_MIN_TIMESHARE + rtp->prio;
>>
>> PRI_MIN_TIMESHARE is for normal priority; it's PRI_MIN_REALTIME for
>> realtime priority, etc.
>>
>> Now rtp_to_pri() calls sched_class() to set the priority class. It
>> sets td->td_pri_class to the priority class given.
>>
>> Then rtp_to_pri() calls sched_user_prio() to set the priority. It sets
>> the following fields to the calculated priority (newpri):
>> td->td_base_user_pri
>> td->td_user_pri
>>
>> Then rtp_to_pri() calls sched_prio(). It sets the following field to
>> the calculated priority (newpri):
>> td->td_base_pri
>>
>> The sched_prio() calls sched_thread_priority() which sets
>> td->td_priority to the priority calculated (newpri).
>>
>> Of course not all td->td_* fields are set in one go. Some are set
>> conditionally. But the td->td_base_user_pri is always set.
>>
>>
>> Reading Priority:
>> rtprio(RTP_LOOKUP, 0, &rtp)
>>
>> rtprio() => rtprio_thread() => pri_to_rtp()
>>
>> At pri_to_rtp(), rtp->type and rtp->prio are set as follows:
>> rtp->type = td->td_pri_class;
>> rtp->prio = td->td_base_user_pri - PRI_MIN_TIMESHARE;
>>
>> That is, the rtprio(2) system call sets td->td_base_user_pri when
>> asked to set the priority, and reads td->td_base_user_pri when
>> asked to read it.
>>
>> In other words, for rtprio(2) to function properly,
>> td->td_base_user_pri should not be changed.
>>
>> As the rtp->prio is unsigned short, for rtp->prio to become a huge
>> number (65504), td->td_base_user_pri should be less than
>> PRI_MIN_TIMESHARE.
>>
>>
>> This shows the actual problem is in the scheduler. In this case,
>> sched_ule.
>>
>> I presume the sched_ule should not touch the td->td_base_user_pri.
>> Instead, probably, it should use td->td_priority for its internal
>> purposes.
>>
>> Appreciate if Jeffrey Roberson <jeff@freebsd.org> could shed more
>> light on this issue.
>
> The base_pri vs td_priority is really jhb's domain. I added him to the cc.
>
> Thanks,
> Jeff
>
This might be caused by the following code in sched_ule.c:
static void
sched_priority(struct thread *td)
{
	int score;
	int pri;

	if (td->td_pri_class != PRI_TIMESHARE)
		return;
	/*
	 * If the score is interactive we place the thread in the realtime
	 * queue with a priority that is less than kernel and interrupt
	 * priorities.  These threads are not subject to nice restrictions.
	 *
	 * Scores greater than this are placed on the normal timeshare queue
	 * where the priority is partially decided by the most recent cpu
	 * utilization and the rest is decided by nice value.
	 *
	 * The nice value of the process has a linear effect on the calculated
	 * score.  Negative nice values make it easier for a thread to be
	 * considered interactive.
	 */
	score = imax(0, sched_interact_score(td) - td->td_proc->p_nice);
	if (score < sched_interact) {
		pri = PRI_MIN_REALTIME;
		pri += ((PRI_MAX_REALTIME - PRI_MIN_REALTIME) / sched_interact)
		    * score;
		KASSERT(pri >= PRI_MIN_REALTIME && pri <= PRI_MAX_REALTIME,
		    ("sched_priority: invalid interactive priority %d score %d",
		    pri, score));
	} else {

For interactive threads it uses PRI_MIN_REALTIME as the base, then calls
sched_user_prio(td, pri), which sets td_base_user_pri and td_user_pri and
so causes both td_user_pri and td_base_user_pri to be out of range. Should
PRI_MIN_REALTIME and PRI_MAX_REALTIME be PRI_MIN_TIMESHARE and
PRI_MAX_TIMESHARE?
>>
>> Since I'm not conversant with sched_ule, I may not be able to
>> develop a fix for it. I would appreciate it if either Jeffrey or
>> somebody else could look into a fix for sched_ule. I can certainly
>> help apply a patch and test it.
>>
>> Best regards
>> Unga
>>
>>
>>
>>
> _______________________________________________
> freebsd-bugs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
> To unsubscribe, send any mail to "freebsd-bugs-unsubscribe@freebsd.org"
>
