Date: Wed, 10 Apr 2024 12:40:16 +0100
From: Nuno Teixeira <eduardo@freebsd.org>
To: tuexen@freebsd.org
Cc: Drew Gallatin <gallatin@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, rrs <rrs@lakerest.net>, Mike Karels <mike@karels.net>, garyj@gmx.de, current@freebsd.org, net@freebsd.org, Randall Stewart <rrs@freebsd.org>
Subject: Re: Request for Testing: TCP RACK
Message-ID: <CAFDf7U+PtNhVsbR=pMhGVnxW5thWmzYBq9+d4rSbZD0zRnJjkg@mail.gmail.com>
In-Reply-To: <5C9863F7-0F1C-4D02-9F6D-9DDC5FBEB368@freebsd.org>
References: <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com> <CAFDf7UKDWSnhm+TwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com> <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com> <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org> <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net> <ZfiI7GcbTwSG8kkO@kib.kiev.ua> <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com> <ZfiY-xUUM3wrBEz_@kib.kiev.ua> <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com> <27d8144f-0658-46f6-b8f3-35eb60061644@lakerest.net> <Zft8odA0s49eLhvk@kib.kiev.ua> <fc160c79-1884-4f68-8310-35e7ac0b9dd6@app.fastmail.com> <CAFDf7U+sd0Hx6scDOvMuxOQaqVb8wi1ODrwT=uKDCqMn_81QEw@mail.gmail.com> <5C9863F7-0F1C-4D02-9F6D-9DDC5FBEB368@freebsd.org>
Hello all,

@ current 1500018 and fetching torrents with net-p2p/qbittorrent finished ~2GB download and connection UP until the end:

---
Apr 10 11:26:46 leg kernel: re0: watchdog timeout
Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 192.168.1.255
Apr 10 11:26:49 leg kernel: re0: link state changed to UP
Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
---

In past tests I got more watchdog timeouts: the connection went down and a reboot was needed to bring it back (`service netif restart` didn't work).

Another way to reproduce this is to use sysutils/restic (a backup program) to read/check all files from a remote server via sftp:

`restic -r sftp:user@remote:restic-repo check --read-data` on a 60GB compressed backup.

---
watchdog timeout x3 as above
---

restic check failure log @ 15% progress:
---
<snip>
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 1.7670599s: connection lost
Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after 4.619104908s: connection lost
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 5.477648517s: connection lost
List(lock) returned error, retrying after 293.057766ms: connection lost
List(lock) returned error, retrying after 385.206693ms: connection lost
List(lock) returned error, retrying after 1.577594281s: connection lost
<snip>
---

Connection continues UP.

Cheers,

<tuexen@freebsd.org> wrote (Thursday, 28/03/2024 at 15:53):
> > On 28. Mar 2024, at 15:00, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >
> > Hello all!
> >
> > Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
> >
> > Thanks all!
> Thanks for the feedback!
>
> Best regards
> Michael
> >
> > Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at 12:58):
> > The entire point is to *NOT* go through the overhead of scheduling something asynchronously, but to take advantage of the fact that a user/kernel transition is going to trash the cache anyway.
> >
> > In the common case of a system which has fewer than the threshold number of connections, we access the tcp_hpts_softclock function pointer, make one function call, and access hpts_that_need_softclock, and then return. So that's 2 variables and a function call.
> >
> > I think it would be preferable to avoid that call, and to move the declarations of tcp_hpts_softclock and hpts_that_need_softclock so that they are in the same cacheline. Then we'd be hitting just a single line in the common case. (I've made comments on the review to that effect.)
> >
> > Also, I wonder if the threshold could be higher by default, so that hpts is never called in this context unless we're to the point where we're scheduling thousands of runs of the hpts thread (and taking all those clock interrupts).
> >
> > Drew
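[For readers following the cacheline argument: the gate Drew describes builds on the counter(9) per-CPU counter KPI. A minimal sketch of the pattern follows; only the counter(9) calls are the real KPI, the surrounding names are hypothetical.]

#include <sys/types.h>
#include <sys/systm.h>
#include <sys/counter.h>
#include <sys/malloc.h>

static counter_u64_t pacers_need_softclock;

static void
gate_init(void)
{
	/* counter(9) allocates one 64-bit slot per CPU. */
	pacers_need_softclock = counter_u64_alloc(M_WAITOK);
}

static void
gate_adjust(int64_t delta)
{
	/* Cheap: updates only the current CPU's slot. */
	counter_u64_add(pacers_need_softclock, delta);
}

static int
gate_busy(void)
{
	/*
	 * Not cheap: the fetch sums every CPU's slot, touching
	 * roughly one cacheline per CPU.  Tolerable in a slow path,
	 * costly if done on every return to userspace.
	 */
	return (counter_u64_fetch(pacers_need_softclock) != 0);
}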
> >
> > On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>> OK, I have created
> >>>
> >>> https://reviews.freebsd.org/D44420
> >>>
> >>> to address the issue. I also attach a short version of the patch that Nuno
> >>> can try and validate that it works.
> >>>
> >>> Drew, you may want to try this and validate that the optimization does
> >>> kick in, since right now I can only test that it does not on my local box :)
> >> The patch still causes access to all CPUs' cachelines on each userret.
> >> It would be much better to inc/check the threshold and only schedule the
> >> call when it is exceeded. Then the call can occur in some dedicated context,
> >> like a per-CPU thread, instead of userret.
> >>
> >>> R
> >>>
> >>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>> No. The goal is to run on every return to userspace for every thread.
> >>>>
> >>>> Drew
> >>>>
> >>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>> I got the idea from
> >>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
> >>>>>> rather than run it out of a clock interrupt, it's more efficient to run
> >>>>>> it out of a system call context at just the point where we return to
> >>>>>> userspace and the cache is trashed anyway. The current implementation
> >>>>>> is fine for our workload, but probably not ideal for a generic system,
> >>>>>> especially one where something is banging on system calls.
> >>>>>>
> >>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with
> >>>>>> them, and I can't find any docs on them.
> >>>>>>
> >>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> >>>>>> what's happening here?
> >>>>> This call would need some AST number added, and then it registers the
> >>>>> AST to run on the next return to userspace, for the current thread.
> >>>>>
> >>>>> Is it enough?
> >>>>>>
> >>>>>> Drew
> >>>>>
> >>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> >>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> >>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >>>>>>>>
> >>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello all!
> >>>>>>>>>>
> >>>>>>>>>> It works just fine!
> >>>>>>>>>> System performance is OK.
> >>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
> >>>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>> net.inet.tcp.functions_available:
> >>>>>>>>>> Stack                           D Alias              PCB count
> >>>>>>>>>> freebsd                           freebsd            0
> >>>>>>>>>> rack                            * rack               38
> >>>>>>>>>> ---
> >>>>>>>>>>
> >>>>>>>>>> It would be so nice if we could have a sysctl tunable for this patch
> >>>>>>>>>> so we could do more tests without recompiling the kernel.
> >>>>>>>>> Thanks for testing!
> >>>>>>>>>
> >>>>>>>>> @gallatin: can you come up with a patch that is acceptable for Netflix
> >>>>>>>>> and allows mitigating the performance regression?
> >>>>>>>>
> >>>>>>>> Ideally, tcphpts could enable this automatically when it starts to be
> >>>>>>>> used (enough?), but a sysctl could select auto/on/off.
> >>>>>>> There is already a well-known mechanism to request execution of a
> >>>>>>> specific function on return to userspace, namely AST. The difference
> >>>>>>> from the current hack is that the execution is requested for one callback
> >>>>>>> in the context of a specific thread.
> >>>>>>>
> >>>>>>> Still, it might be worth a try to use it; what is the reason to hit a thread
> >>>>>>> that does not do networking with TCP processing?
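[For reference, the AST route discussed above might look roughly like this against the ast_register()/ast_sched() KPI; the TDA_MOD1 slot, the flag choice, and the call sites are assumptions for illustration, not code posted in this thread.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>

/* The hpts pacer hook referenced upthread (declaration assumed here). */
extern void (*tcp_hpts_softclock)(void);

static void
tcp_hpts_ast(struct thread *td __unused, int tda __unused)
{
	/* Runs in this thread's context on its return to userspace. */
	if (tcp_hpts_softclock != NULL)
		tcp_hpts_softclock();
}

static void
tcp_hpts_ast_init(void)
{
	/*
	 * TDA_MOD1 is a slot reserved for modules.  ASTR_UNCOND runs
	 * the callback for every thread on every return to userspace
	 * (the behaviour Drew asks about); without it, a call such as
	 * ast_sched(curthread, TDA_MOD1) from the TCP path would hit
	 * only threads that actually did networking.
	 */
	ast_register(TDA_MOD1, ASTR_UNCOND, 0, tcp_hpts_ast);
}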
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>>> Best regards
> >>>>>>>>> Michael
> >>>>>>>>>>
> >>>>>>>>>> Thanks all!
> >>>>>>>>>> Really happy here :)
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>>
> >>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>>> I don't have the full context, but it seems like the
> >>>>>>>>>>>> complaint is a performance regression in bonnie++ and perhaps other
> >>>>>>>>>>>> things when tcp_hpts is loaded, even when it is not used. Is that
> >>>>>>>>>>>> correct?
> >>>>>>>>>>>>
> >>>>>>>>>>>> If so, I suspect it's because we drive the tcp_hpts_softclock()
> >>>>>>>>>>>> routine from userret(), in order to avoid tons of timer interrupts
> >>>>>>>>>>>> and context switches. To test this theory, you could apply a patch like:
> >>>>>>>>>>>
> >>>>>>>>>>> It's affecting overall system performance; bonnie was just a way to
> >>>>>>>>>>> get some numbers to compare.
> >>>>>>>>>>>
> >>>>>>>>>>> Tomorrow I will test the patch.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Nuno Teixeira
> >>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> >>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> >>> index 8c4d2d41a3eb..eadbee19f69c 100644
> >>> --- a/sys/netinet/tcp_hpts.c
> >>> +++ b/sys/netinet/tcp_hpts.c
> >>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >>>  	void *ie_cookie;
> >>>  	uint16_t p_num; /* The hpts number one per cpu */
> >>>  	uint16_t p_cpu; /* The hpts CPU */
> >>> +	uint8_t hit_callout_thresh;
> >>>  	/* There is extra space in here */
> >>>  	/* Cache line 0x100 */
> >>>  	struct callout co __aligned(CACHE_LINE_SIZE);
> >>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >>>  	int cpu[MAXCPU];
> >>>  } hpts_domains[MAXMEMDOM];
> >>>
> >>> +counter_u64_t hpts_that_need_softclock;
> >>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> >>> +    &hpts_that_need_softclock,
> >>> +    "Number of hpts threads that need softclock");
> >>> +
> >>>  counter_u64_t hpts_hopelessly_behind;
> >>>
> >>>  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> >>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >>>      &tcp_hpts_precision, 120,
> >>>      "Value for PRE() precision of callout");
> >>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> >>> -    &conn_cnt_thresh, 0,
> >>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >>>      "How many connections (below) make us use the callout based mechanism");
> >>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >>>      &hpts_does_tp_logging, 0,
> >>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >>>  	struct tcp_hpts_entry *hpts;
> >>>  	int ticks_ran;
> >>>
> >>> +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> >>> +		return;
> >>> +
> >>>  	hpts = tcp_choose_hpts_to_run();
> >>>
> >>>  	if (hpts->p_hpts_active) {
> >>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >>>  	ticks_ran = tcp_hptsi(hpts, 1);
> >>>  	tv.tv_sec = 0;
> >>>  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> >>> +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> >>> +		hpts->hit_callout_thresh = 1;
> >>> +		counter_u64_add(hpts_that_need_softclock, 1);
> >>> +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> >>> +		hpts->hit_callout_thresh = 0;
> >>> +		counter_u64_add(hpts_that_need_softclock, -1);
> >>> +	}
> >>>  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >>>  		if (hpts->p_direct_wake == 0) {
> >>>  			/*
> >>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >>>  	cpu_top = NULL;
> >>>  #endif
> >>>  	tcp_pace.rp_num_hptss = ncpus;
> >>> +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >>>  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >>>  	hpts_loops = counter_u64_alloc(M_WAITOK);
> >>>  	back_tosleep = counter_u64_alloc(M_WAITOK);
> >>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >>>  	free(tcp_pace.grps, M_TCPHPTS);
> >>>  #endif
> >>>
> >>> +	counter_u64_free(hpts_that_need_softclock);
> >>>  	counter_u64_free(hpts_hopelessly_behind);
> >>>  	counter_u64_free(hpts_loops);
> >>>  	counter_u64_free(back_tosleep);
> >>
> >>
> >
> >
> > --
> > Nuno Teixeira
> > FreeBSD Committer (ports)

-- 
Nuno Teixeira
FreeBSD Committer (ports)
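[A practical footnote to the patch above: once it is applied, the gate can be exercised without recompiling, which also answers the earlier request for a tunable. Assuming the conventional OID-to-name mapping for the SYSCTL declarations in the diff, `sysctl net.inet.tcp.hpts.cnt_thresh` reads (and `sysctl net.inet.tcp.hpts.cnt_thresh=N` sets) the connection threshold, and `sysctl net.inet.tcp.hpts.stats.needsoftclock` shows how many hpts threads currently need the userret softclock.]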