Date:      Mon, 18 Mar 2024 15:13:11 -0400
From:      "Drew Gallatin" <gallatin@freebsd.org>
To:        "Konstantin Belousov" <kostikbel@gmail.com>, "Mike Karels" <mike@karels.net>
Cc:        tuexen <tuexen@freebsd.org>, "Nuno Teixeira" <eduardo@freebsd.org>, garyj@gmx.de, current@freebsd.org, net@freebsd.org, "Randall Stewart" <rrs@freebsd.org>
Subject:   Re: Request for Testing: TCP RACK
Message-ID:  <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com>
In-Reply-To: <ZfiI7GcbTwSG8kkO@kib.kiev.ua>
References:   <CAFDf7ULtN9owoH-ns2OfR6ZhypNGxuNzkQbb2P9zR8ceFgaj5A@mail.gmail.com> <4FF534F6-B35D-4596-8D1E-226AD1347AC8@freebsd.org> <CAFDf7U%2BAjfeY%2Bqjq%2B-R71w5i1pRoxQdOmqJ9w4s1U13AA8-duA@mail.gmail.com> <C5D50314-4B0C-42F6-AA67-B5A32A4BA335@freebsd.org> <CAFDf7UKL6vtKo1Mn9Vw_5OD9Xubuw%2BdgS83WKwsiTUaXHs8D6Q@mail.gmail.com> <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com> <CAFDf7UKDWSnhm%2BTwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com> <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com> <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org> <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net> <ZfiI7GcbTwSG8kkO@kib.kiev.ua>

--01b96c257b37417295d61c17eb06343b
Content-Type: text/plain;charset=utf-8
Content-Transfer-Encoding: quoted-printable

I got the idea from
https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
The gist is that the TCP pacing code needs to run frequently, and rather
than run it out of a clock interrupt, it's more efficient to run it out
of a system call context at just the point where we return to userspace
and the cache is trashed anyway.  The current implementation is fine for
our workload, but probably not ideal for a generic system, especially
one where something is banging on system calls.
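
For anyone who has not read the paper, the following is a rough
userspace model of the soft-timer idea, not the tcp_hpts code. All
names (soft_timer, soft_timer_poll, soft_timer_demo) are hypothetical,
and the tick counts are made up for illustration. The due work is
checked at a cheap trigger point (a modeled syscall return), with a
coarse periodic interrupt as a backup when no trigger points occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one soft timer; not a kernel structure. */
struct soft_timer {
	uint64_t deadline;	/* tick at which the work becomes due */
	uint64_t interval;	/* re-arm distance after each run */
	unsigned runs;		/* how many times the work has run */
};

/* Run the work if it is due, then re-arm. Returns true if it ran. */
static bool
soft_timer_poll(struct soft_timer *t, uint64_t now)
{
	if (now < t->deadline)
		return (false);
	t->runs++;		/* stand-in for a pacing pass */
	t->deadline = now + t->interval;
	return (true);
}

/*
 * Simulate `ticks` ticks. A modeled syscall return happens every
 * `syscall_every` ticks (0 means never) and polls the timer for free.
 * A backup interrupt fires every `backup_every` ticks and polls too.
 * Returns how many backup interrupts actually had to run the work.
 */
static unsigned
soft_timer_demo(struct soft_timer *t, uint64_t ticks,
    uint64_t syscall_every, uint64_t backup_every)
{
	unsigned backup_hits = 0;

	assert(backup_every != 0);
	for (uint64_t now = 1; now <= ticks; now++) {
		if (syscall_every != 0 && now % syscall_every == 0)
			(void)soft_timer_poll(t, now);
		if (now % backup_every == 0 && soft_timer_poll(t, now))
			backup_hits++;
	}
	return (backup_hits);
}
```

In this model a thread that makes a syscall every tick keeps the timer
serviced without the backup interrupt ever doing work, while an idle
system falls back to the coarse interrupt rate. The model also shows
the downside mentioned above: every syscall return pays for the poll.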

ASTs could be the right tool for this, but I'm super unfamiliar with
them, and I can't find any docs on them.

Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
what's happening here?
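
To make the distinction in the quoted reply concrete, here is a toy
userspace model contrasting a hook that every thread runs on return to
userspace with a callback requested per thread. It deliberately does
not use the real AST interface; every name (model_thread,
model_return_global, model_return_per_thread, model_non_network_cost)
is hypothetical, and whether ASTR_UNCOND behaves like the global case
is exactly the open question, not something this sketch answers.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NTHREADS 4

/* Hypothetical stand-in for a thread; not struct thread. */
struct model_thread {
	bool does_networking;
	bool work_requested;	/* per-thread "run callback on return" flag */
	unsigned callbacks_run;
};

/* Global-hook model: every return to userspace runs the callback. */
static void
model_return_global(struct model_thread *td)
{
	td->callbacks_run++;
}

/* Per-thread model: only a thread that requested the callback runs it. */
static void
model_return_per_thread(struct model_thread *td)
{
	if (!td->work_requested)
		return;
	td->work_requested = false;
	td->callbacks_run++;
}

/* Count callbacks charged to threads that never touch the network. */
static unsigned
model_non_network_cost(bool per_thread, unsigned returns_per_thread)
{
	struct model_thread tds[MODEL_NTHREADS] = {
		{ .does_networking = true }, { 0 }, { 0 }, { 0 }
	};
	unsigned cost = 0;

	for (unsigned r = 0; r < returns_per_thread; r++) {
		for (int i = 0; i < MODEL_NTHREADS; i++) {
			if (per_thread) {
				/* Only the networking thread asks for the work. */
				tds[i].work_requested = tds[i].does_networking;
				model_return_per_thread(&tds[i]);
			} else {
				model_return_global(&tds[i]);
			}
		}
	}
	/* The networking thread runs its callback in both models. */
	assert(tds[0].callbacks_run == returns_per_thread);
	for (int i = 0; i < MODEL_NTHREADS; i++)
		if (!tds[i].does_networking)
			cost += tds[i].callbacks_run;
	return (cost);
}
```

With one networking thread and three others, the global model charges
the three non-networking threads on every return, while the per-thread
model charges them nothing.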

Drew

On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >
> > >> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> > >>
> > >> Hello all!
> > >>
> > >> It works just fine!
> > >> System performance is OK.
> > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > >>
> > >> ---
> > >> net.inet.tcp.functions_available:
> > >> Stack                           D Alias                            PCB count
> > >> freebsd                           freebsd                          0
> > >> rack                            * rack                             38
> > >> ---
> > >>
> > >> It would be so nice if we could have a sysctl tunable for this patch
> > >> so we could do more tests without recompiling the kernel.
> > > Thanks for testing!
> > >
> > > @gallatin: can you come up with a patch that is acceptable for Netflix
> > > and mitigates the performance regression?
> >
> > Ideally, tcphpts could enable this automatically when it starts to be
> > used (enough?), but a sysctl could select auto/on/off.
> There is already a well-known mechanism to request execution of the
> specific function on return to userspace, namely AST.  The difference
> with the current hack is that the execution is requested for one callback
> in the context of the specific thread.
>
> Still, it might be worth a try to use it; what is the reason to hit a thread
> that does not do networking, with TCP processing?
>
> >
> > Mike
> >
> > > Best regards
> > > Michael
> > >>
> > >> Thanks all!
> > >> Really happy here :)
> > >>
> > >> Cheers,
> > >>
> > >> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> > >>>
> > >>> Hello,
> > >>>
> > >>>> I don't have the full context, but it seems like the complaint is a
> > >>>> performance regression in bonnie++ and perhaps other things when
> > >>>> tcp_hpts is loaded, even when it is not used.  Is that correct?
> > >>>>
> > >>>> If so, I suspect it's because we drive the tcp_hpts_softclock()
> > >>>> routine from userret(), in order to avoid tons of timer interrupts
> > >>>> and context switches.  To test this theory, you could apply a patch like:
> > >>>
> > >>> It's affecting overall system performance; bonnie was just a way to
> > >>> get some numbers to compare.
> > >>>
> > >>> Tomorrow I will test patch.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> --
> > >>> Nuno Teixeira
> > >>> FreeBSD Committer (ports)
> > >>
> > >>
> > >>
> > >> --
> > >> Nuno Teixeira
> > >> FreeBSD Committer (ports)
> >
>

--01b96c257b37417295d61c17eb06343b--


