Date: Fri, 13 Aug 2004 20:16:42 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rwatson@FreeBSD.org
Cc: freebsd-current@FreeBSD.org
Subject: Re: nice handling in ULE (was: Re: SCHEDULE and high load situations)
Message-ID: <200408140316.i7E3GgHG003765@gw.catspoiler.org>
In-Reply-To: <200408131002.i7DA2afX001590@gw.catspoiler.org>
On 13 Aug, To: rwatson@freebsd.org wrote:
> On 12 Aug, Don Lewis wrote:
>
>> I did some experimentation, and the problem I'm seeing appears to just
>> be related to how nice values are handled by ULE.  I'm running two
>> copies of the following program, one at nice +15, and the other not
>> niced:
>>
>> hairball:~ 102>cat sponge.c
>> int
>> main(int argc, char **argv)
>> {
>>         while (1)
>>                 ;
>> }
>>
>> The niced process was started second, but it has accumulated more CPU
>> time and is getting a larger percentage of the CPU time according to
>> top.
>>
>> last pid:   662;  load averages:  2.00,  1.95,  1.45    up 0+00:22:35  15:14:27
>> 31 processes:  3 running, 28 sleeping
>> CPU states: 45.3% user, 53.1% nice,  1.2% system,  0.4% interrupt,  0.0% idle
>> Mem: 22M Active, 19M Inact, 44M Wired, 28K Cache, 28M Buf, 408M Free
>> Swap: 1024M Total, 1024M Free
>> Seconds to delay:
>>   PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>>   599 dl       139   15  1180K   448K RUN      8:34 53.91% 53.91% sponge
>>   598 dl       139    0  1180K   448K RUN      7:22 42.97% 42.97% sponge
>>   587 dl        76    0  2288K  1580K RUN      0:03  0.00%  0.00% top
>>   462 root      76    0 56656K 46200K select   0:02  0.00%  0.00% Xorg
>>   519 gdm       76    0 11252K  8564K select   0:01  0.00%  0.00% gdmlogin
>>   579 dl        76    0  6088K  2968K select   0:00  0.00%  0.00% sshd
>>
>>
>>
>> I thought it might have something to do with grouping by niceness, which
>> would group the un-niced process with a bunch of other processes that
>> wake up every now and then for a little bit of CPU time, so I tried the
>> experiment again with nice +1 and nice +15.  This gave a rather
>> interesting result.  Top reports the nice +15 process as getting a
>> higher %CPU, but the nice +1 process has slowly accumulated a bit more
>> total CPU time.  The difference in total CPU time was initially seven
>> seconds or less.
>>
>> last pid:   745;  load averages:  2.00,  1.99,  1.84    up 0+00:43:30  15:35:22
>> 31 processes:  3 running, 28 sleeping
>> CPU states:  0.0% user, 99.6% nice,  0.4% system,  0.0% interrupt,  0.0% idle
>> Mem: 22M Active, 19M Inact, 44M Wired, 28K Cache, 28M Buf, 408M Free
>> Swap: 1024M Total, 1024M Free
>> Seconds to delay:
>>   PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>>   675 dl       139   15  1180K   448K RUN      9:48 52.34% 52.34% sponge
>>   674 dl       139    1  1180K   448K RUN     10:03 44.53% 44.53% sponge
>>   587 dl        76    0  2288K  1580K RUN      0:06  0.00%  0.00% top
>>   462 root      76    0 56656K 46200K select   0:03  0.00%  0.00% Xorg
>>   519 gdm       76    0 11252K  8564K select   0:02  0.00%  0.00% gdmlogin
>>   579 dl        76    0  6088K  2968K select   0:00  0.00%  0.00% sshd
>
>
> I compiled a kernel with the KTR stuff and ran this last experiment
> again.  It looks like the two niced processes get the appropriate slice
> values assigned by ULE, and they both have the same priority.  Where
> things seem to be going wrong is that the two processes are being run in
> a round robin fashion, alternating execution once every tick or two.  The
> less-nice process gets preempted multiple times by the more-nice process
> before the less-nice process has exhausted its slice.

I managed to figure out a bit more of what is going on.  The following
code in sched_choose() pulls the selected kse off the run queue.

        ke = kseq_choose(kseq);
        if (ke) {
#ifdef SMP
                if (ke->ke_ksegrp->kg_pri_class == PRI_IDLE)
                        if (kseq_idled(kseq) == 0)
                                goto restart;
#endif
                kseq_runq_rem(kseq, ke);
                ke->ke_state = KES_THREAD;

                if (ke->ke_ksegrp->kg_pri_class == PRI_TIMESHARE) {
                        CTR4(KTR_ULE, "Run kse %p from %p (slice: %d, pri: %d)",
                            ke, ke->ke_runq, ke->ke_slice,
                            ke->ke_thread->td_priority);
                }
                return (ke);
        }

At some later time, setrunqueue() gets called for this thread, which
calls sched_add(), which calls sched_add_internal(), which executes the
following code fragment.

        class = PRI_BASE(kg->kg_pri_class);
        switch (class) {
        case PRI_ITHD:
        case PRI_REALTIME:
                ke->ke_runq = kseq->ksq_curr;
                ke->ke_slice = SCHED_SLICE_MAX;
                ke->ke_cpu = PCPU_GET(cpuid);
                break;
        case PRI_TIMESHARE:
                if (SCHED_CURR(kg, ke))
                        ke->ke_runq = kseq->ksq_curr;
                else
                        ke->ke_runq = kseq->ksq_next;
                break;
[snip]
        kseq_runq_add(kseq, ke);
        kseq_load_add(kseq, ke);

Because the thread is a CPU hog, it gets put on the next run queue, even
though it hasn't exhausted its current slice, which means that it has to
wait for all the other CPU hogs to get a turn at the CPU before it can
execute again.

I don't know how to fix this problem.  I think the desired behaviour
would be for the kse to be restored to its previous location on the run
queue.
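
The requeueing effect described above can be illustrated with a small
user-space model.  This is only a sketch under stated assumptions, not
ULE code: the names simulate(), struct runq, REQUEUE_NEXT and
REQUEUE_CURR are hypothetical, the slice lengths are made up, and the
model assumes that the running thread is preempted and re-added to a
run queue after every tick.  REQUEUE_NEXT mimics a CPU hog being placed
on the next queue; REQUEUE_CURR models the behaviour suggested above,
where a preempted thread with slice remaining goes back to the front of
the current queue.

```c
#include <assert.h>

#define NHOGS 2                 /* number of CPU-bound threads in the model */

/* Hypothetical policy names; neither is a ULE identifier. */
enum policy { REQUEUE_NEXT, REQUEUE_CURR };

/* A tiny FIFO of thread ids standing in for a run queue. */
struct runq {
        int item[NHOGS];
        int len;
};

static void
rq_add_tail(struct runq *q, int id)
{
        assert(q->len < NHOGS);
        q->item[q->len++] = id;
}

static void
rq_add_head(struct runq *q, int id)
{
        assert(q->len < NHOGS);
        for (int i = q->len; i > 0; i--)
                q->item[i] = q->item[i - 1];
        q->item[0] = id;
        q->len++;
}

static int
rq_pop(struct runq *q)
{
        assert(q->len > 0);
        int id = q->item[0];
        q->len--;
        for (int i = 0; i < q->len; i++)
                q->item[i] = q->item[i + 1];
        return (id);
}

/*
 * Run the model for `ticks` ticks and report in ran[] how many ticks
 * each thread received.  Assumption: a preemption happens every tick.
 */
void
simulate(enum policy p, const int slice_max[NHOGS], int ticks, long ran[NHOGS])
{
        struct runq q[2] = { { {0}, 0 }, { {0}, 0 } };
        int curr = 0, next = 1;
        int slice[NHOGS];

        for (int i = 0; i < NHOGS; i++) {
                slice[i] = slice_max[i];
                ran[i] = 0;
                rq_add_tail(&q[curr], i);
        }
        for (int t = 0; t < ticks; t++) {
                if (q[curr].len == 0) {         /* current queue drained: swap */
                        int tmp = curr;
                        curr = next;
                        next = tmp;
                }
                int id = rq_pop(&q[curr]);
                ran[id]++;
                if (--slice[id] == 0) {
                        /* Slice used up: new slice, wait on the next queue. */
                        slice[id] = slice_max[id];
                        rq_add_tail(&q[next], id);
                } else if (p == REQUEUE_NEXT) {
                        /* Preempted hog is sent to the next queue anyway. */
                        rq_add_tail(&q[next], id);
                } else {
                        /* Preempted hog keeps its place on the current queue. */
                        rq_add_head(&q[curr], id);
                }
        }
}
```

With illustrative slices of 10 and 2 ticks over 1200 ticks, the model
gives each thread 600 ticks under REQUEUE_NEXT, regardless of slice
length, and 1000 versus 200 ticks under REQUEUE_CURR.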