Date:      Mon, 18 Mar 2024 15:42:42 -0400
From:      "Drew Gallatin" <gallatin@freebsd.org>
To:        "Konstantin Belousov" <kostikbel@gmail.com>
Cc:        "Mike Karels" <mike@karels.net>, tuexen <tuexen@freebsd.org>, "Nuno Teixeira" <eduardo@freebsd.org>, garyj@gmx.de, current@freebsd.org, net@freebsd.org, "Randall Stewart" <rrs@freebsd.org>
Subject:   Re: Request for Testing: TCP RACK
Message-ID:  <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com>
In-Reply-To: <ZfiY-xUUM3wrBEz_@kib.kiev.ua>
References:   <CAFDf7U%2BAjfeY%2Bqjq%2B-R71w5i1pRoxQdOmqJ9w4s1U13AA8-duA@mail.gmail.com> <C5D50314-4B0C-42F6-AA67-B5A32A4BA335@freebsd.org> <CAFDf7UKL6vtKo1Mn9Vw_5OD9Xubuw%2BdgS83WKwsiTUaXHs8D6Q@mail.gmail.com> <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com> <CAFDf7UKDWSnhm%2BTwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com> <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com> <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org> <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net> <ZfiI7GcbTwSG8kkO@kib.kiev.ua> <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com> <ZfiY-xUUM3wrBEz_@kib.kiev.ua>

No.  The goal is to run on every return to userspace for every thread.

Drew
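
The distinction at issue in the quoted exchange below (a hook that runs on every return to userspace for every thread, versus an AST-style request that runs once for the thread that asked) can be illustrated with a small userspace model. This is only a sketch: every `model_*` name is hypothetical, nothing here is FreeBSD kernel API, and the real userret()/AST paths are considerably more involved.

```c
/*
 * Userspace model only. Every name below is hypothetical; none of this
 * is FreeBSD kernel API, and the real userret()/AST code differs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NTHREADS 4

struct model_thread {
	bool ast_pending;	/* one-shot request made by this thread */
	unsigned callback_runs;	/* times the pacing stand-in ran here */
};

/* Stand-in for the pacing work (tcp_hpts_softclock() in the thread). */
static void
model_pacing_callback(struct model_thread *td)
{
	td->callback_runs++;
}

/* AST-like request: run the callback once, on this thread's next return. */
static void
model_ast_request(struct model_thread *td)
{
	td->ast_pending = true;
}

/* Invoked at the end of every simulated system call. */
static void
model_return_to_user(struct model_thread *td, bool global_hook)
{
	if (global_hook) {
		/* Current approach: every thread, every return. */
		model_pacing_callback(td);
	} else if (td->ast_pending) {
		/* AST-like: only the requesting thread, and only once. */
		td->ast_pending = false;
		model_pacing_callback(td);
	}
}

/*
 * Run `syscalls` simulated system calls on each thread and return the
 * total number of callback invocations. In AST mode only thread
 * `requester` asks for the callback, once, before the calls start.
 */
unsigned
model_simulate(bool global_hook, size_t requester, unsigned syscalls)
{
	struct model_thread threads[MODEL_NTHREADS] = {{ false, 0 }};
	unsigned total = 0;

	assert(requester < MODEL_NTHREADS);
	if (!global_hook)
		model_ast_request(&threads[requester]);

	for (unsigned n = 0; n < syscalls; n++)
		for (size_t i = 0; i < MODEL_NTHREADS; i++)
			model_return_to_user(&threads[i], global_hook);

	for (size_t i = 0; i < MODEL_NTHREADS; i++)
		total += threads[i].callback_runs;
	return total;
}
```

With four model threads making ten calls each, `model_simulate(true, 0, 10)` counts 40 callback runs, while `model_simulate(false, 0, 10)` counts 1. In this model, an AST-style scheme would have to be re-requested on every thread after each run to match the behaviour of the global hook.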

On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > I got the idea from
> > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > The gist is that the TCP pacing stuff needs to run frequently, and
> > rather than run it out of a clock interrupt, it's more efficient to run
> > it out of a system call context at just the point where we return to
> > userspace and the cache is trashed anyway. The current implementation
> > is fine for our workload, but probably not ideal for a generic system.
> > Especially one where something is banging on system calls.
> >
> > ASTs could be the right tool for this, but I'm super unfamiliar with
> > them, and I can't find any docs on them.
> >
> > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> > what's happening here?
> This call would need some AST number added, and then it registers the
> AST to run on the next return to userspace, for the current thread.
>
> Is it enough?
> >
> > Drew
>
> >
> > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> > > >=20
> > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org=
> wrote:
> > > > >>
> > > > >> Hello all!
> > > > >>
> > > > >> It works just fine!
> > > > >> System performance is OK.
> > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > >>
> > > > >> ---
> > > > >> net.inet.tcp.functions_available:
> > > > >> Stack                           D Alias                            PCB count
> > > > >> freebsd                           freebsd                          0
> > > > >> rack                            * rack                             38
> > > > >> ---
> > > > >>
> > > > >> It would be so nice if we could have a sysctl tunable for this patch
> > > > >> so we could do more tests without recompiling the kernel.
> > > > > Thanks for testing!
> > > > >
> > > > > @gallatin: can you come up with a patch that is acceptable for Netflix
> > > > > and mitigates the performance regression?
> > > >
> > > > Ideally, tcphpts could enable this automatically when it starts to be
> > > > used (enough?), but a sysctl could select auto/on/off.
> > > There is already a well-known mechanism to request execution of the
> > > specific function on return to userspace, namely AST.  The difference
> > > with the current hack is that the execution is requested for one callback
> > > in the context of the specific thread.
> > >
> > > Still, it might be worth a try to use it; what is the reason to hit a thread
> > > that does not do networking, with TCP processing?
> > >
> > > >
> > > > Mike
> > > >
> > > > > Best regards
> > > > > Michael
> > > > >>
> > > > >> Thanks all!
> > > > >> Really happy here :)
> > > > >>
> > > > >> Cheers,
> > > > >>
> > > > >> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> > > > >>>
> > > > >>> Hello,
> > > > >>>
> > > > >>>> I don't have the full context, but it seems like the complaint is a performance regression in bonnie++ and perhaps other things when tcp_hpts is loaded, even when it is not used.  Is that correct?
> > > > >>>>
> > > > >>>> If so, I suspect it's because we drive the tcp_hpts_softclock() routine from userret(), in order to avoid tons of timer interrupts and context switches.  To test this theory, you could apply a patch like:
> > > > >>>
> > > > >>> It's affecting overall system performance; bonnie was just a way to
> > > > >>> get some numbers to compare.
> > > > >>>
> > > > >>> Tomorrow I will test the patch.
> > > > >>>
> > > > >>> Thanks!
> > > > >>>
> > > > >>> --
> > > > >>> Nuno Teixeira
> > > > >>> FreeBSD Committer (ports)
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Nuno Teixeira
> > > > >> FreeBSD Committer (ports)
> > > >
> > >
>
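
Mike Karels's suggestion in the thread above (a sysctl that selects auto/on/off, with "auto" enabling the hook once tcphpts is used enough) could be sketched as the following decision logic. This is a hedged illustration, not an existing FreeBSD sysctl: the enum, the function names, and the `auto_threshold` parameter are all assumptions, and the thread itself leaves open what "enough" should mean.

```c
/*
 * Hypothetical policy sketch. The enum, the names, and the threshold are
 * illustrative assumptions, not an existing FreeBSD sysctl.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum model_hook_mode {
	MODEL_HOOK_OFF,
	MODEL_HOOK_ON,
	MODEL_HOOK_AUTO
};

/* Map a setting string to a mode; returns false for unknown input. */
bool
model_hook_mode_parse(const char *text, enum model_hook_mode *mode)
{
	assert(text != NULL && mode != NULL);
	if (strcmp(text, "off") == 0)
		*mode = MODEL_HOOK_OFF;
	else if (strcmp(text, "on") == 0)
		*mode = MODEL_HOOK_ON;
	else if (strcmp(text, "auto") == 0)
		*mode = MODEL_HOOK_AUTO;
	else
		return false;
	return true;
}

/* Decide whether the return-to-userspace hook should do any work. */
bool
model_hook_should_run(enum model_hook_mode mode, unsigned paced_conns,
    unsigned auto_threshold)
{
	switch (mode) {
	case MODEL_HOOK_ON:
		return true;
	case MODEL_HOOK_AUTO:
		/* "Used enough": at least auto_threshold paced connections. */
		return paced_conns >= auto_threshold;
	case MODEL_HOOK_OFF:
	default:
		return false;
	}
}
```

In "auto" mode a system with no paced connections would skip the hook entirely, which is the case the reported regression concerns (tcp_hpts loaded but not used).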



