Date: Mon, 18 Mar 2024 15:13:11 -0400
From: "Drew Gallatin" <gallatin@freebsd.org>
To: "Konstantin Belousov" <kostikbel@gmail.com>, "Mike Karels" <mike@karels.net>
Cc: tuexen <tuexen@freebsd.org>, "Nuno Teixeira" <eduardo@freebsd.org>, garyj@gmx.de, current@freebsd.org, net@freebsd.org, "Randall Stewart" <rrs@freebsd.org>
Subject: Re: Request for Testing: TCP RACK
Message-ID: <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com>
In-Reply-To: <ZfiI7GcbTwSG8kkO@kib.kiev.ua>
References: <CAFDf7ULtN9owoH-ns2OfR6ZhypNGxuNzkQbb2P9zR8ceFgaj5A@mail.gmail.com>
 <4FF534F6-B35D-4596-8D1E-226AD1347AC8@freebsd.org>
 <CAFDf7U%2BAjfeY%2Bqjq%2B-R71w5i1pRoxQdOmqJ9w4s1U13AA8-duA@mail.gmail.com>
 <C5D50314-4B0C-42F6-AA67-B5A32A4BA335@freebsd.org>
 <CAFDf7UKL6vtKo1Mn9Vw_5OD9Xubuw%2BdgS83WKwsiTUaXHs8D6Q@mail.gmail.com>
 <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com>
 <CAFDf7UKDWSnhm%2BTwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com>
 <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com>
 <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org>
 <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net>
 <ZfiI7GcbTwSG8kkO@kib.kiev.ua>

I got the idea from https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf

The gist is that the TCP pacing stuff needs to run frequently, and rather than run it out of a clock interrupt, it's more efficient to run it out of a system call context at just the point where we return to userspace and the cache is trashed anyway. The current implementation is fine for our workload, but probably not ideal for a generic system, especially one where something is banging on system calls.

ASTs could be the right tool for this, but I'm super unfamiliar with them, and I can't find any docs on them.

Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to what's happening here?

Drew

On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >
> > >> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> > >>
> > >> Hello all!
> > >>
> > >> It works just fine!
> > >> System performance is OK.
> > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > >>
> > >> ---
> > >> net.inet.tcp.functions_available:
> > >> Stack                           D Alias                            PCB count
> > >> freebsd                           freebsd                                  0
> > >> rack                            * rack                                    38
> > >> ---
> > >>
> > >> It would be nice to have a sysctl tunable for this patch
> > >> so we could do more tests without recompiling the kernel.
> > > Thanks for testing!
> > >
> > > @gallatin: can you come up with a patch that is acceptable for Netflix
> > > and mitigates the performance regression?
> >
> > Ideally, tcphpts could enable this automatically when it starts to be
> > used (enough?), but a sysctl could select auto/on/off.
> There is already a well-known mechanism to request execution of a
> specific function on return to userspace, namely AST. The difference
> from the current hack is that the execution is requested for one callback
> in the context of a specific thread.
>
> Still, it might be worth a try to use it; what is the reason to hit a thread
> that does not do networking with TCP processing?
>
> >
> > Mike
> >
> > > Best regards
> > > Michael
> > >>
> > >> Thanks all!
> > >> Really happy here :)
> > >>
> > >> Cheers,
> > >>
> > >> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> > >>>
> > >>> Hello,
> > >>>
> > >>>> I don't have the full context, but it seems like the complaint is a performance regression in bonnie++ and perhaps other things when tcp_hpts is loaded, even when it is not used. Is that correct?
> > >>>>
> > >>>> If so, I suspect it's because we drive the tcp_hpts_softclock() routine from userret(), in order to avoid tons of timer interrupts and context switches. To test this theory, you could apply a patch like:
> > >>>
> > >>> It's affecting overall system performance; bonnie was just a way to
> > >>> get some numbers to compare.
> > >>>
> > >>> Tomorrow I will test the patch.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> --
> > >>> Nuno Teixeira
> > >>> FreeBSD Committer (ports)
> > >>
> > >>
> > >>
> > >> --
> > >> Nuno Teixeira
> > >> FreeBSD Committer (ports)
> >
>
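
For readers unfamiliar with the soft-timers idea discussed in this message, here is a minimal userland sketch of it: periodic work is checked at a "trigger point" (a simulated return to userspace) instead of being driven purely by a timer interrupt. Everything in it is hypothetical and for illustration only; `sim_userret`, `sim_timer_interrupt`, `pacing_work`, and the interval constants are not FreeBSD kernel interfaces. The fallback interrupt that bounds the delay when no system calls happen is an assumption of the sketch, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland model; none of these names are kernel APIs. */
#define PACING_INTERVAL 10  /* ticks between runs of the pacing work */
#define FALLBACK_SLACK   5  /* extra ticks before the interrupt steps in */

static uint64_t now_ticks;  /* simulated clock */
static uint64_t next_due;   /* tick at which the pacing work is next due */
static unsigned soft_runs;  /* runs triggered from a simulated syscall return */
static unsigned hard_runs;  /* runs triggered by the fallback interrupt */

/* Stand-in for the periodic pacing work: just reschedule itself. */
static void pacing_work(void)
{
    next_due = now_ticks + PACING_INTERVAL;
}

/* Trigger point: every simulated return to userspace pays for this check. */
static void sim_userret(void)
{
    if (now_ticks >= next_due) {
        soft_runs++;
        pacing_work();
    }
}

/* Assumed fallback: fires only when the work is overdue by the slack. */
static void sim_timer_interrupt(void)
{
    if (now_ticks >= next_due + FALLBACK_SLACK) {
        hard_runs++;
        pacing_work();
    }
}

/* Run the model for `ticks` ticks; a syscall_period of 0 means no syscalls. */
static void simulate(uint64_t ticks, uint64_t syscall_period)
{
    assert(ticks > 0);
    next_due = PACING_INTERVAL;
    soft_runs = 0;
    hard_runs = 0;
    for (now_ticks = 1; now_ticks <= ticks; now_ticks++) {
        if (syscall_period != 0 && now_ticks % syscall_period == 0)
            sim_userret();
        sim_timer_interrupt();
    }
}
```

Calling `simulate(100, 1)` models a thread that makes a system call every tick: all pacing runs come from the trigger point and the fallback never fires. Calling `simulate(100, 0)` models a system with no system calls, where only the fallback runs the work. The sketch also shows the cost raised in the thread: `sim_userret()` runs its check on every return to userspace, whether or not the returning thread does any networking.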
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8031cd99-ded8-4b06-93b3-11cc729a8b2c>