Date: Tue, 13 Dec 2011 16:01:56 -0800
From: mdf@FreeBSD.org
To: Ivan Klymenko <fidaj@ukr.net>
Cc: Doug Barton <dougb@freebsd.org>, freebsd-stable@freebsd.org,
    Jilles Tjoelker <jilles@stack.nl>,
    "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>,
    Current FreeBSD <freebsd-current@freebsd.org>,
    freebsd-performance@freebsd.org
Subject: Re: SCHED_ULE should not be the default
Message-ID: <CAMBSHm89SkzGVgk9kNwBQoR62pXKjhJ%2BqXJK0qwC20r9p%2Bu-bw@mail.gmail.com>
In-Reply-To: <4ee7e2d3.0a3c640a.4617.4a33SMTPIN_ADDED@mx.google.com>
References: <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com> <4EE6060D.5060201@mail.zedat.fu-berlin.de> <4EE69C5A.3090005@FreeBSD.org> <20111213104048.40f3e3de@nonamehost.> <20111213230441.GB42285@stack.nl> <4ee7e2d3.0a3c640a.4617.4a33SMTPIN_ADDED@mx.google.com>
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko <fidaj@ukr.net> wrote:
> On Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker <jilles@stack.nl> wrote:
>
>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>> > If the ULE algorithm itself does not contain problems, then the
>> > problem is with the Core2Duo, or in a piece of code that uses the
>> > ULE scheduler. I already wrote to the mailing list that in my
>> > specific case (Core2Duo) the following patch partially helps:
>> > --- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
>> > +++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
>> > @@ -794,7 +794,8 @@
>> >  	 * 1.5 * balance_interval.
>> >  	 */
>> >  	balance_ticks = max(balance_interval / 2, 1);
>> > -	balance_ticks += random() % balance_interval;
>> > +//	balance_ticks += random() % balance_interval;
>> > +	balance_ticks += ((int)random()) % balance_interval;
>> >  	if (smp_started == 0 || rebalance == 0)
>> >  		return;
>> >  	tdq = TDQ_SELF();
>>
>> This avoids a 64-bit division on 64-bit platforms but seems to have no
>> effect otherwise. Because this function is not called very often, the
>> change seems unlikely to help.
>
> Yes, this section does not apply to this problem :)
> I just posted the latest patch, which I am using now...
>
>>
>> > @@ -2118,13 +2119,21 @@
>> >  	struct td_sched *ts;
>> >
>> >  	THREAD_LOCK_ASSERT(td, MA_OWNED);
>> > +	if (td->td_pri_class & PRI_FIFO_BIT)
>> > +		return;
>> > +	ts = td->td_sched;
>> > +	/*
>> > +	 * We used up one time slice.
>> > +	 */
>> > +	if (--ts->ts_slice > 0)
>> > +		return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>
> Yes, I did it to delay for as long as possible the execution of the code in this section:
> ...
> #ifdef SMP
> 	/*
> 	 * We run the long term load balancer infrequently on the first cpu.
> 	 */
> 	if (balance_tdq == tdq) {
> 		if (balance_ticks && --balance_ticks == 0)
> 			sched_balance();
> 	}
> #endif
> ...
>
>>
>> >  	tdq = TDQ_SELF();
>> >  #ifdef SMP
>> >  	/*
>> >  	 * We run the long term load balancer infrequently on the
>> > 	 * first cpu.
>> > 	 */
>> > -	if (balance_tdq == tdq) {
>> > -		if (balance_ticks && --balance_ticks == 0)
>> > +	if (balance_ticks && --balance_ticks == 0) {
>> > +		if (balance_tdq == tdq)
>> >  			sched_balance();
>> >  	}
>> >  #endif
>>
>> The main effect of this appears to be to disable the long term load
>> balancer completely after some time. At some point, a CPU other than
>> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
>> sched_balance() will never be called again.
>>
>
> That is, for the same reason as above in the text...
>
>> It also introduces a hypothetical race condition because the access to
>> balance_ticks is no longer restricted to one CPU under a spinlock.
>>
>> If the long term load balancer may be causing trouble, try setting
>> kern.sched.balance_interval to a higher value with unpatched code.
>
> I checked that in the first place - but it did not help fix the situation...
>
> My impression is that the rebalancing malfunctions.
> It seems that a thread is handed to the same core that is already loaded, and so on...
> Perhaps this is a consequence of an incorrect detection of the CPU topology?
>
>>
>> > @@ -2144,9 +2153,6 @@
>> >  		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>> >  			tdq->tdq_ridx = tdq->tdq_idx;
>> >  	}
>> > -	ts = td->td_sched;
>> > -	if (td->td_pri_class & PRI_FIFO_BIT)
>> > -		return;
>> >  	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>> >  		/*
>> >  		 * We used a tick; charge it to the thread so
>> > @@ -2157,11 +2163,6 @@
>> >  		sched_priority(td);
>> >  	}
>> >  	/*
>> > -	 * We used up one time slice.
>> > -	 */
>> > -	if (--ts->ts_slice > 0)
>> > -		return;
>> > -	/*
>> >  	 * We're out of time, force a requeue at userret().
>> >  	 */
>> >  	ts->ts_slice = sched_slice;
>>
>> > and refusing to use 'options FULL_PREEMPTION'
>> > But no one has replied to my letter about whether my patch helps or
>> > not in the case of Core2Duo...
>> > There is a suspicion that the problems stem from the sections of
>> > code associated with SMP...
>> > Maybe I'm wrong about something, but I want to help in solving this
>> > problem...
Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ?

I don't remember what our specific problem at $WORK was, perhaps it was
just interrupt threads not getting serviced fast enough, but we've
hard-coded this to 1 and removed the code that sets it in
sched_initticks().  The same effect should be had by setting the sysctl
after a box is up.

Thanks,
matthew
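[For anyone who wants to try mdf's suggestion, a hedged example: it assumes a FreeBSD system running SCHED_ULE, and only the sysctl name comes from the message above. Set it at runtime with `sysctl kern.sched.steal_thresh=1`, or persist it across reboots:]

```
# /etc/sysctl.conf -- persists across reboots (FreeBSD, SCHED_ULE assumed)
kern.sched.steal_thresh=1
```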