Date: Thu, 28 Mar 2024 14:00:58 +0000
From: Nuno Teixeira <eduardo@freebsd.org>
To: Drew Gallatin <gallatin@freebsd.org>
Cc: Konstantin Belousov <kib@freebsd.org>, rrs <rrs@lakerest.net>,
    Mike Karels <mike@karels.net>, tuexen <tuexen@freebsd.org>,
    garyj@gmx.de, current@freebsd.org, net@freebsd.org,
    Randall Stewart <rrs@freebsd.org>
Subject: Re: Request for Testing: TCP RACK
Message-ID: <CAFDf7U+sd0Hx6scDOvMuxOQaqVb8wi1ODrwT=uKDCqMn_81QEw@mail.gmail.com>
In-Reply-To: <fc160c79-1884-4f68-8310-35e7ac0b9dd6@app.fastmail.com>
References: <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com>
    <CAFDf7UKDWSnhm+TwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com>
    <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com>
    <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org>
    <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net>
    <ZfiI7GcbTwSG8kkO@kib.kiev.ua>
    <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com>
    <ZfiY-xUUM3wrBEz_@kib.kiev.ua>
    <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com>
    <27d8144f-0658-46f6-b8f3-35eb60061644@lakerest.net>
    <Zft8odA0s49eLhvk@kib.kiev.ua>
    <fc160c79-1884-4f68-8310-35e7ac0b9dd6@app.fastmail.com>
Hello all!

Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!

Thanks all!

Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at 12:58):
> The entire point is to *NOT* go through the overhead of scheduling
> something asynchronously, but to take advantage of the fact that a
> user/kernel transition is going to trash the cache anyway.
>
> In the common case of a system which has fewer than the threshold number
> of connections, we access the tcp_hpts_softclock function pointer, make
> one function call, and access hpts_that_need_softclock, and then return.
> So that's two variables and a function call.
>
> I think it would be preferable to avoid that call, and to move the
> declarations of tcp_hpts_softclock and hpts_that_need_softclock so that
> they are in the same cacheline. Then we'd be hitting just a single line
> in the common case. (I've made comments on the review to that effect.)
>
> Also, I wonder if the threshold could get higher by default, so that hpts
> is never called in this context unless we're to the point where we're
> scheduling thousands of runs of the hpts thread (and taking all those
> clock interrupts).
>
> Drew
>
> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> > Ok, I have created
> >
> > https://reviews.freebsd.org/D44420
> >
> > to address the issue. I also attach a short version of the patch that
> > Nuno can try and validate that it works. Drew, you may want to try this
> > and validate that the optimization does kick in, since I can only now
> > test that it does not on my local box :)
> The patch still causes access to all CPUs' cachelines on each userret.
> It would be much better to inc/check the threshold and only schedule the
> call when exceeded.
> Then the call can occur in some dedicated context,
> like per-CPU thread, instead of userret.
>
> >
> >
> > R
> >
> >
> >
> > On 3/18/24 3:42 PM, Drew Gallatin wrote:
> > > No. The goal is to run on every return to userspace for every thread.
> > >
> > > Drew
> > >
> > > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> > > > On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > > > > I got the idea from
> > > > > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > > > > The gist is that the TCP pacing stuff needs to run frequently, and
> > > > > rather than run it out of a clock interrupt, it's more efficient
> > > > > to run it out of a system call context at just the point where we
> > > > > return to userspace and the cache is trashed anyway. The current
> > > > > implementation is fine for our workload, but probably not ideal
> > > > > for a generic system, especially one where something is banging
> > > > > on system calls.
> > > > >
> > > > > ASTs could be the right tool for this, but I'm super unfamiliar
> > > > > with them, and I can't find any docs on them.
> > > > >
> > > > > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent
> > > > > to what's happening here?
> > > > This call would need some AST number added, and then it registers
> > > > the AST to run on the next return to userspace, for the current
> > > > thread.
> > > >
> > > > Is it enough?
> > > > >
> > > > > Drew
> > > > >
> > > > >
> > > > > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > > > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > > > > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> > > > > > >
> > > > > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira
> > > > > > > >> <eduardo@freebsd.org> wrote:
> > > > > > > >>
> > > > > > > >> Hello all!
> > > > > > > >>
> > > > > > > >> It works just fine!
> > > > > > > >> System performance is OK.
> > > > > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > > > > >>
> > > > > > > >> ---
> > > > > > > >> net.inet.tcp.functions_available:
> > > > > > > >> Stack                      D Alias                PCB count
> > > > > > > >> freebsd                      freebsd                      0
> > > > > > > >> rack                       * rack                        38
> > > > > > > >> ---
> > > > > > > >>
> > > > > > > >> It would be so nice if we could have a sysctl tunable for
> > > > > > > >> this patch so we could do more tests without recompiling
> > > > > > > >> the kernel.
> > > > > > > > Thanks for testing!
> > > > > > > >
> > > > > > > > @gallatin: can you come up with a patch that is acceptable
> > > > > > > > for Netflix and allows mitigating the performance
> > > > > > > > regression?
> > > > > > >
> > > > > > > Ideally, tcphpts could enable this automatically when it
> > > > > > > starts to be used (enough?), but a sysctl could select
> > > > > > > auto/on/off.
> > > > > > There is already a well-known mechanism to request execution of
> > > > > > a specific function on return to userspace, namely AST. The
> > > > > > difference from the current hack is that the execution is
> > > > > > requested for one callback in the context of a specific thread.
> > > > > >
> > > > > > Still, it might be worth a try to use it; what is the reason to
> > > > > > hit a thread that does not do networking with TCP processing?
> > > > > >
> > > > > > >
> > > > > > > Mike
> > > > > > >
> > > > > > > > Best regards
> > > > > > > > Michael
> > > > > > > >>
> > > > > > > >> Thanks all!
> > > > > > > >> Really happy here :)
> > > > > > > >>
> > > > > > > >> Cheers,
> > > > > > > >>
> > > > > > > >> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday,
> > > > > > > >> 17/03/2024 at 20:26):
> > > > > > > >>>
> > > > > > > >>> Hello,
> > > > > > > >>>
> > > > > > > >>>> I don't have the full context, but it seems like the
> > > > > > > >>>> complaint is a performance regression in bonnie++ and
> > > > > > > >>>> perhaps other things when tcp_hpts is loaded, even when
> > > > > > > >>>> it is not used.
> > > > > > > >>>> Is that correct?
> > > > > > > >>>>
> > > > > > > >>>> If so, I suspect it's because we drive the
> > > > > > > >>>> tcp_hpts_softclock() routine from userret(), in order to
> > > > > > > >>>> avoid tons of timer interrupts and context switches. To
> > > > > > > >>>> test this theory, you could apply a patch like:
> > > > > > > >>>
> > > > > > > >>> It's affecting overall system performance; bonnie was
> > > > > > > >>> just a way to get some numbers to compare.
> > > > > > > >>>
> > > > > > > >>> Tomorrow I will test the patch.
> > > > > > > >>>
> > > > > > > >>> Thanks!
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> Nuno Teixeira
> > > > > > > >>> FreeBSD Committer (ports)
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Nuno Teixeira
> > > > > > > >> FreeBSD Committer (ports)
>
> > diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> > index 8c4d2d41a3eb..eadbee19f69c 100644
> > --- a/sys/netinet/tcp_hpts.c
> > +++ b/sys/netinet/tcp_hpts.c
> > @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >  	void *ie_cookie;
> >  	uint16_t p_num;		/* The hpts number one per cpu */
> >  	uint16_t p_cpu;		/* The hpts CPU */
> > +	uint8_t hit_callout_thresh;
> >  	/* There is extra space in here */
> >  	/* Cache line 0x100 */
> >  	struct callout co __aligned(CACHE_LINE_SIZE);
> > @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >  	int cpu[MAXCPU];
> >  } hpts_domains[MAXMEMDOM];
> >
> > +counter_u64_t hpts_that_need_softclock;
> > +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> > +    &hpts_that_need_softclock,
> > +    "Number of hpts threads that need softclock");
> > +
> >  counter_u64_t hpts_hopelessly_behind;
> >
> >  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> > @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >      &tcp_hpts_precision, 120,
> >      "Value for PRE() precision of callout");
> >  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> > -    &conn_cnt_thresh, 0,
> > +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >      "How many connections (below) make us use the callout based mechanism");
> >  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >      &hpts_does_tp_logging, 0,
> > @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >  	struct tcp_hpts_entry *hpts;
> >  	int ticks_ran;
> >
> > +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> > +		return;
> > +
> >  	hpts = tcp_choose_hpts_to_run();
> >
> >  	if (hpts->p_hpts_active) {
> > @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >  	ticks_ran = tcp_hptsi(hpts, 1);
> >  	tv.tv_sec = 0;
> >  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> > +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> > +		hpts->hit_callout_thresh = 1;
> > +		counter_u64_add(hpts_that_need_softclock, 1);
> > +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> > +		hpts->hit_callout_thresh = 0;
> > +		counter_u64_add(hpts_that_need_softclock, -1);
> > +	}
> >  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >  		if(hpts->p_direct_wake == 0) {
> >  			/*
> > @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >  	cpu_top = NULL;
> >  #endif
> >  	tcp_pace.rp_num_hptss = ncpus;
> > +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >  	hpts_loops = counter_u64_alloc(M_WAITOK);
> >  	back_tosleep = counter_u64_alloc(M_WAITOK);
> > @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >  	free(tcp_pace.grps, M_TCPHPTS);
> >  #endif
> >
> > +	counter_u64_free(hpts_that_need_softclock);
> >  	counter_u64_free(hpts_hopelessly_behind);
> >  	counter_u64_free(hpts_loops);
> >  	counter_u64_free(back_tosleep);

-- 
Nuno Teixeira
FreeBSD Committer (ports)