Date: Fri, 6 May 2022 18:57:33 -0600
From: Warner Losh <imp@bsdimp.com>
To: Alan Somers <asomers@freebsd.org>
Cc: Kubilay Kocak <koobs@freebsd.org>, src-committers <src-committers@freebsd.org>, "<dev-commits-src-all@freebsd.org>" <dev-commits-src-all@freebsd.org>, dev-commits-src-main@freebsd.org
Subject: Re: git: 1d2421ad8b6d - main - Correctly measure system load averages > 1024
Message-ID: <CANCZdfpb0EvgCv=aKZB8P_zUOBHWNnzeW5Q2OSH8w5mGiVa_zg@mail.gmail.com>
In-Reply-To: <CAOtMX2gr8HY6mK%2BU1QPV21zpthz5WFSgkuv_c3-scgm80iY8CA@mail.gmail.com>
References: <202205070004.24704iIx031164@gitrepo.freebsd.org> <771111e0-5c1b-8eb3-751d-c5f2b8bc36eb@FreeBSD.org> <CAOtMX2gr8HY6mK%2BU1QPV21zpthz5WFSgkuv_c3-scgm80iY8CA@mail.gmail.com>

I'd expect it to die of lock contention overload well before a load average
of 1,000,000... So I think we're safe

Warner

On Fri, May 6, 2022 at 6:50 PM Alan Somers <asomers@freebsd.org> wrote:

> Yes, it can be MFCd.  The only risk I'm aware of is that the 4.4 bsd
> scheduler might start acting weird - once the load average gets close to
> one million.
>
> On Fri, May 6, 2022, 6:06 PM Kubilay Kocak <koobs@freebsd.org> wrote:
>
>> On 7/05/2022 10:04 am, Alan Somers wrote:
>> > The branch main has been updated by asomers:
>> >
>> > URL: https://cgit.FreeBSD.org/src/commit/?id=1d2421ad8b6d508ef155752bdfc5948f7373bac3
>> >
>> > commit 1d2421ad8b6d508ef155752bdfc5948f7373bac3
>> > Author:     Alan Somers <asomers@FreeBSD.org>
>> > AuthorDate: 2022-05-05 21:35:23 +0000
>> > Commit:     Alan Somers <asomers@FreeBSD.org>
>> > CommitDate: 2022-05-06 23:25:43 +0000
>> >
>> >     Correctly measure system load averages > 1024
>> >
>> >     The old fixed-point arithmetic used for calculating load averages had an
>> >     overflow at 1024.  So on systems with extremely high load, the observed
>> >     load average would actually fall back to 0 and shoot up again, creating
>> >     a kind of sawtooth graph.
>> >
>> >     Fix this by using 64-bit math internally, while still reporting the load
>> >     average to userspace as a 32-bit number.
>> >
>> >     Sponsored by:   Axcient
>> >     Reviewed by:    imp
>> >     Differential Revision: https://reviews.freebsd.org/D35134
>>
>> Can MFC?
>>
>> > ---
>> >  sys/kern/kern_synch.c | 9 +++++----
>> >  sys/kern/tty_info.c   | 2 +-
>> >  sys/sys/param.h       | 8 ++++----
>> >  3 files changed, 10 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/sys/kern/kern_synch.c b/sys/kern/kern_synch.c
>> > index e78878987b57..381d6315044c 100644
>> > --- a/sys/kern/kern_synch.c
>> > +++ b/sys/kern/kern_synch.c
>> > @@ -87,7 +87,7 @@ struct loadavg averunnable =
>> >   * Constants for averages over 1, 5, and 15 minutes
>> >   * when sampling at 5 second intervals.
>> >   */
>> > -static fixpt_t cexp[3] = {
>> > +static uint64_t cexp[3] = {
>> >         0.9200444146293232 * FSCALE,    /* exp(-1/12) */
>> >         0.9834714538216174 * FSCALE,    /* exp(-1/60) */
>> >         0.9944598480048967 * FSCALE,    /* exp(-1/180) */
>> > @@ -611,14 +611,15 @@ setrunnable(struct thread *td, int srqflags)
>> >  static void
>> >  loadav(void *arg)
>> >  {
>> > -       int i, nrun;
>> > +       int i;
>> > +       uint64_t nrun;
>> >         struct loadavg *avg;
>> >
>> > -       nrun = sched_load();
>> > +       nrun = (uint64_t)sched_load();
>> >         avg = &averunnable;
>> >
>> >         for (i = 0; i < 3; i++)
>> > -               avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
>> > +               avg->ldavg[i] = (cexp[i] * (uint64_t)avg->ldavg[i] +
>> >                     nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;
>> >
>> >         /*
>> > diff --git a/sys/kern/tty_info.c b/sys/kern/tty_info.c
>> > index 60675557e4ed..237aa47a18da 100644
>> > --- a/sys/kern/tty_info.c
>> > +++ b/sys/kern/tty_info.c
>> > @@ -302,7 +302,7 @@ tty_info(struct tty *tp)
>> >         sbuf_set_drain(&sb, sbuf_tty_drain, tp);
>> >
>> >         /* Print load average. */
>> > -       load = (averunnable.ldavg[0] * 100 + FSCALE / 2) >> FSHIFT;
>> > +       load = ((int64_t)averunnable.ldavg[0] * 100 + FSCALE / 2) >> FSHIFT;
>> >         sbuf_printf(&sb, "%sload: %d.%02d ", tp->t_column == 0 ? "" : "\n",
>> >             load / 100, load % 100);
>> >
>> > diff --git a/sys/sys/param.h b/sys/sys/param.h
>> > index 2d463b9ac7a2..b0b53f1a7776 100644
>> > --- a/sys/sys/param.h
>> > +++ b/sys/sys/param.h
>> > @@ -361,12 +361,12 @@ __END_DECLS
>> >   * Scale factor for scaled integers used to count %cpu time and load avgs.
>> >   *
>> >   * The number of CPU `tick's that map to a unique `%age' can be expressed
>> > - * by the formula (1 / (2 ^ (FSHIFT - 11))).  The maximum load average that
>> > - * can be calculated (assuming 32 bits) can be closely approximated using
>> > - * the formula (2 ^ (2 * (16 - FSHIFT))) for (FSHIFT < 15).
>> > + * by the formula (1 / (2 ^ (FSHIFT - 11))).  Since the intermediate
>> > + * calculation is done with 64-bit precision, the maximum load average that can
>> > + * be calculated is approximately 2^32 / FSCALE.
>> >   *
>> >   * For the scheduler to maintain a 1:1 mapping of CPU `tick' to `%age',
>> > - * FSHIFT must be at least 11; this gives us a maximum load avg of ~1024.
>> > + * FSHIFT must be at least 11.  This gives a maximum load avg of 2 million.
>> >   */
>> >  #define        FSHIFT  11              /* bits to right of fixed binary point */
>> >  #define FSCALE         (1<<FSHIFT)
>> >
>>
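
The overflow described in the quoted commit message can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel code: it assumes the stored load average is an unsigned 32-bit fixed-point value (the commit says it is reported to userspace as a 32-bit number), it uses only the 1-minute constant from the diff, and the names CEXP_1MIN, loadav_step32, loadav_step64 and loadav_settle are hypothetical helpers introduced for illustration.

```c
#include <stdint.h>

#define FSHIFT 11               /* bits to the right of the binary point */
#define FSCALE (1 << FSHIFT)

/* exp(-1/12) scaled by FSCALE: the 1-minute decay constant from the diff. */
#define CEXP_1MIN ((uint32_t)(0.9200444146293232 * FSCALE))

/* Hypothetical helper: one update step with 32-bit intermediates (old math). */
uint32_t
loadav_step32(uint32_t ldavg, uint32_t nrun)
{
	uint32_t cexp = CEXP_1MIN;

	/* cexp * ldavg wraps modulo 2^32 once ldavg / FSCALE nears 1024. */
	return ((cexp * ldavg + nrun * FSCALE * (FSCALE - cexp)) >> FSHIFT);
}

/* Hypothetical helper: the same step with 64-bit intermediates (new math). */
uint32_t
loadav_step64(uint32_t ldavg, uint32_t nrun)
{
	uint64_t cexp = CEXP_1MIN;

	/* Only the final result is narrowed back to 32 bits. */
	return ((uint32_t)((cexp * (uint64_t)ldavg +
	    (uint64_t)nrun * FSCALE * (FSCALE - cexp)) >> FSHIFT));
}

/* Hypothetical helper: feed `steps` samples of a constant run-queue length. */
uint32_t
loadav_settle(uint32_t (*step)(uint32_t, uint32_t), uint32_t nrun, int steps)
{
	uint32_t ldavg = 0;

	while (steps-- > 0)
		ldavg = step(ldavg, nrun);
	return (ldavg);
}
```

With the stored value already at 1024 * FSCALE and 1024 runnable threads, the 32-bit sum is exactly 2^32, so loadav_step32() returns 0 while loadav_step64() holds at 1024 * FSCALE. Any 32-bit result is shifted down from at most 32 bits, so it can never represent a load of 1024 or more, which is consistent with the sawtooth the commit message describes.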
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfpb0EvgCv=aKZB8P_zUOBHWNnzeW5Q2OSH8w5mGiVa_zg>