Date: Mon, 18 Mar 2024 15:42:42 -0400
From: "Drew Gallatin" <gallatin@freebsd.org>
To: "Konstantin Belousov" <kostikbel@gmail.com>
Cc: "Mike Karels" <mike@karels.net>, tuexen <tuexen@freebsd.org>, "Nuno Teixeira" <eduardo@freebsd.org>, garyj@gmx.de, current@freebsd.org, net@freebsd.org, "Randall Stewart" <rrs@freebsd.org>
Subject: Re: Request for Testing: TCP RACK
Message-ID: <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com>
In-Reply-To: <ZfiY-xUUM3wrBEz_@kib.kiev.ua>
References: <CAFDf7U%2BAjfeY%2Bqjq%2B-R71w5i1pRoxQdOmqJ9w4s1U13AA8-duA@mail.gmail.com>
 <C5D50314-4B0C-42F6-AA67-B5A32A4BA335@freebsd.org>
 <CAFDf7UKL6vtKo1Mn9Vw_5OD9Xubuw%2BdgS83WKwsiTUaXHs8D6Q@mail.gmail.com>
 <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com>
 <CAFDf7UKDWSnhm%2BTwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com>
 <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com>
 <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org>
 <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net>
 <ZfiI7GcbTwSG8kkO@kib.kiev.ua>
 <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com>
 <ZfiY-xUUM3wrBEz_@kib.kiev.ua>

No.  The goal is to run on every return to userspace for every thread.

Drew

On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > I got the idea from
> > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > The gist is that the TCP pacing stuff needs to run frequently, and
> > rather than run it out of a clock interrupt, it's more efficient to run
> > it out of a system call context at just the point where we return to
> > userspace and the cache is trashed anyway. The current implementation
> > is fine for our workload, but probably not ideal for a generic system.
> > Especially one where something is banging on system calls.
> >
> > ASTs could be the right tool for this, but I'm super unfamiliar with
> > them, and I can't find any docs on them.
> >
> > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> > what's happening here?
> This call would need some AST number added, and then it registers the
> ast to run on next return to userspace, for the current thread.
>
> Is it enough?
> >
> > Drew
>
> >
> > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> > > >
> > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> > > > >>
> > > > >> Hello all!
> > > > >>
> > > > >> It works just fine!
> > > > >> System performance is OK.
> > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > >>
> > > > >> ---
> > > > >> net.inet.tcp.functions_available:
> > > > >> Stack                           D Alias                            PCB count
> > > > >> freebsd                           freebsd                          0
> > > > >> rack                            * rack                             38
> > > > >> ---
> > > > >>
> > > > >> It would be so nice that we can have a sysctl tunable for this patch
> > > > >> so we could do more tests without recompiling kernel.
> > > > > Thanks for testing!
> > > > >
> > > > > @gallatin: can you come up with a patch that is acceptable for Netflix
> > > > > and allows to mitigate the performance regression.
> > > >
> > > > Ideally, tcphpts could enable this automatically when it starts to be
> > > > used (enough?), but a sysctl could select auto/on/off.
> > > There is already a well-known mechanism to request execution of the
> > > specific function on return to userspace, namely AST.  The difference
> > > with the current hack is that the execution is requested for one callback
> > > in the context of the specific thread.
> > >
> > > Still, it might be worth a try to use it; what is the reason to hit a thread
> > > that does not do networking, with TCP processing?
> > >
> > > >
> > > > Mike
> > > >
> > > > > Best regards
> > > > > Michael
> > > > >>
> > > > >> Thanks all!
> > > > >> Really happy here :)
> > > > >>
> > > > >> Cheers,
> > > > >>
> > > > >> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> > > > >>>
> > > > >>> Hello,
> > > > >>>
> > > > >>>> I don't have the full context, but it seems like the complaint is a performance regression in bonnie++ and perhaps other things when tcp_hpts is loaded, even when it is not used.  Is that correct?
> > > > >>>>
> > > > >>>> If so, I suspect it's because we drive the tcp_hpts_softclock() routine from userret(), in order to avoid tons of timer interrupts and context switches.  To test this theory, you could apply a patch like:
> > > > >>>
> > > > >>> It's affecting overall system performance, bonnie was just a way to
> > > > >>> get some numbers to compare.
> > > > >>>
> > > > >>> Tomorrow I will test patch.
> > > > >>>
> > > > >>> Thanks!
> > > > >>>
> > > > >>> --
> > > > >>> Nuno Teixeira
> > > > >>> FreeBSD Committer (ports)
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Nuno Teixeira
> > > > >> FreeBSD Committer (ports)
> > > >
> > >
>
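The trade-off discussed in this thread (polling for pacing work on every return to userspace instead of taking a timer interrupt) can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions: the names `soft_timer`, `pacing_work`, `on_return_to_user`, and `simulate` are hypothetical and are not FreeBSD kernel APIs, and the clock is a plain counter rather than real time. It is not the tcp_hpts implementation.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model of a "soft timer": none of these names
 * exist in the FreeBSD kernel. Time is a simulated counter.
 */
struct soft_timer {
    uint64_t next_due;  /* simulated time at which pacing work is next due */
    uint64_t interval;  /* desired spacing between pacing runs */
    unsigned runs;      /* how many times the pacing work actually ran */
    unsigned checks;    /* how many times the hook was polled */
};

/* Stand-in for the real pacing work; here it only counts invocations. */
static void pacing_work(struct soft_timer *st)
{
    st->runs++;
}

/*
 * Called at every simulated return to userspace. The check itself is
 * cheap, but it is paid on every return, whether or not work is due.
 */
static void on_return_to_user(struct soft_timer *st, uint64_t now)
{
    st->checks++;
    if (now >= st->next_due) {
        pacing_work(st);
        st->next_due = now + st->interval;
    }
}

/* Simulate nsyscalls system calls, each advancing the clock by cost. */
void simulate(struct soft_timer *st, unsigned nsyscalls, uint64_t cost)
{
    uint64_t now = 0;

    for (unsigned i = 0; i < nsyscalls; i++) {
        now += cost;                 /* time spent inside the "syscall" */
        on_return_to_user(st, now);  /* hook on the way back out */
    }
}
```

With `next_due = 0`, `interval = 10`, and 1000 simulated system calls of cost 1, the hook is polled 1000 times but the pacing work runs only 100 times. That ratio is the overhead a syscall-heavy workload with no networking would pay, which is the concern raised about hitting every thread rather than only the ones that requested the callback.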
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?38c54399-6c96-44d8-a3a2-3cc1bfbe50c2>