Date: Wed, 10 Apr 2024 12:40:16 +0100
From: Nuno Teixeira <eduardo@freebsd.org>
To: tuexen@freebsd.org
Cc: Drew Gallatin <gallatin@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, rrs <rrs@lakerest.net>, Mike Karels <mike@karels.net>, garyj@gmx.de, current@freebsd.org, net@freebsd.org, Randall Stewart <rrs@freebsd.org>
Subject: Re: Request for Testing: TCP RACK
Message-ID: <CAFDf7U+PtNhVsbR=pMhGVnxW5thWmzYBq9+d4rSbZD0zRnJjkg@mail.gmail.com>
In-Reply-To: <5C9863F7-0F1C-4D02-9F6D-9DDC5FBEB368@freebsd.org>
References: <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com> <CAFDf7UKDWSnhm+TwP=ZZ9dkk0jmAgjGKPLpkX-CKuw3yH233gQ@mail.gmail.com> <CAFDf7UJq9SCnU-QYmS3t6EknP369w2LR0dNkQAc-NaRLvwVfoQ@mail.gmail.com> <A3F1FC0C-C199-4565-8E07-B233ED9E7B2E@freebsd.org> <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net> <ZfiI7GcbTwSG8kkO@kib.kiev.ua> <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com> <ZfiY-xUUM3wrBEz_@kib.kiev.ua> <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com> <27d8144f-0658-46f6-b8f3-35eb60061644@lakerest.net> <Zft8odA0s49eLhvk@kib.kiev.ua> <fc160c79-1884-4f68-8310-35e7ac0b9dd6@app.fastmail.com> <CAFDf7U+sd0Hx6scDOvMuxOQaqVb8wi1ODrwT=uKDCqMn_81QEw@mail.gmail.com> <5C9863F7-0F1C-4D02-9F6D-9DDC5FBEB368@freebsd.org>
Hello all,

@ current 1500018 and fetching torrents with net-p2p/qbittorrent finished ~2GB download and connection UP until the end:

---
Apr 10 11:26:46 leg kernel: re0: watchdog timeout
Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 192.168.1.255
Apr 10 11:26:49 leg kernel: re0: link state changed to UP
Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
---

In past tests I got more watchdog timeouts: the connection went down and a reboot was needed to bring it back (`service netif restart` didn't work).

Another way to reproduce this is to use sysutils/restic (a backup program) to read/check all files from a remote server via sftp:

`restic -r sftp:user@remote:restic-repo check --read-data` on a 60GB compressed backup.

---
watchdog timeout x3 as above
---

restic check failure log @ 15% progress:
---
<snip>
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 1.7670599s: connection lost
Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after 4.619104908s: connection lost
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 5.477648517s: connection lost
List(lock) returned error, retrying after 293.057766ms: connection lost
List(lock) returned error, retrying after 385.206693ms: connection lost
List(lock) returned error, retrying after 1.577594281s: connection lost
<snip>
---

Connection continues UP.

Cheers,

<tuexen@freebsd.org> wrote (Thursday, 28/03/2024 at 15:53):
> > On 28. Mar 2024, at 15:00, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >
> > Hello all!
> >
> > Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
> >
> > Thanks all!
> Thanks for the feedback!
>
> Best regards
> Michael
> >
> > Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at 12:58):
> > The entire point is to *NOT* go through the overhead of scheduling something asynchronously, but to take advantage of the fact that a user/kernel transition is going to trash the cache anyway.
> >
> > In the common case of a system which has fewer than the threshold number of connections, we access the tcp_hpts_softclock function pointer, make one function call, and access hpts_that_need_softclock, and then return. So that's 2 variables and a function call.
> >
> > I think it would be preferable to avoid that call, and to move the declarations of tcp_hpts_softclock and hpts_that_need_softclock so that they are in the same cacheline. Then we'd be hitting just a single line in the common case. (I've made comments on the review to that effect.)
> >
> > Also, I wonder if the threshold could be higher by default, so that hpts is never called in this context unless we're to the point where we're scheduling thousands of runs of the hpts thread (and taking all those clock interrupts).
> >
> > Drew
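[For readers following the cacheline argument: the gate Drew describes builds on the counter(9) per-CPU counter KPI. A minimal sketch of the pattern follows; only the counter(9) calls are the real KPI, the surrounding names are hypothetical.]

#include <sys/types.h>
#include <sys/systm.h>
#include <sys/counter.h>
#include <sys/malloc.h>

static counter_u64_t pacers_need_softclock;

static void
gate_init(void)
{
	/* counter(9) allocates one 64-bit slot per CPU. */
	pacers_need_softclock = counter_u64_alloc(M_WAITOK);
}

static void
gate_adjust(int64_t delta)
{
	/* Cheap: updates only the current CPU's slot. */
	counter_u64_add(pacers_need_softclock, delta);
}

static int
gate_busy(void)
{
	/*
	 * Not cheap: the fetch sums every CPU's slot, touching
	 * roughly one cacheline per CPU.  Tolerable in a slow path,
	 * costly if done on every return to userspace.
	 */
	return (counter_u64_fetch(pacers_need_softclock) != 0);
}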
> >
> > On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>> OK, I have created
> >>>
> >>> https://reviews.freebsd.org/D44420
> >>>
> >>> to address the issue. I also attach a short version of the patch that Nuno
> >>> can try and validate that it works.
> >>>
> >>> Drew, you may want to try this and validate that the optimization does
> >>> kick in, since right now I can only test that it does not on my local box :)
> >> The patch still causes access to all CPUs' cachelines on each userret.
> >> It would be much better to inc/check the threshold and only schedule the
> >> call when it is exceeded. Then the call can occur in some dedicated context,
> >> like a per-CPU thread, instead of userret.
> >>
> >>> R
> >>>
> >>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>> No. The goal is to run on every return to userspace for every thread.
> >>>>
> >>>> Drew
> >>>>
> >>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>> I got the idea from
> >>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
> >>>>>> rather than run it out of a clock interrupt, it's more efficient to run
> >>>>>> it out of a system call context at just the point where we return to
> >>>>>> userspace and the cache is trashed anyway. The current implementation
> >>>>>> is fine for our workload, but probably not ideal for a generic system,
> >>>>>> especially one where something is banging on system calls.
> >>>>>>
> >>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with
> >>>>>> them, and I can't find any docs on them.
> >>>>>>
> >>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> >>>>>> what's happening here?
> >>>>> This call would need some AST number added, and then it registers the
> >>>>> AST to run on the next return to userspace, for the current thread.
> >>>>>
> >>>>> Is it enough?
> >>>>>>
> >>>>>> Drew
> >>>>>
> >>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> >>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> >>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >>>>>>>>
> >>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello all!
> >>>>>>>>>>
> >>>>>>>>>> It works just fine!
> >>>>>>>>>> System performance is OK.
> >>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
> >>>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>> net.inet.tcp.functions_available:
> >>>>>>>>>> Stack                           D Alias              PCB count
> >>>>>>>>>> freebsd                           freebsd            0
> >>>>>>>>>> rack                            * rack               38
> >>>>>>>>>> ---
> >>>>>>>>>>
> >>>>>>>>>> It would be so nice if we could have a sysctl tunable for this patch
> >>>>>>>>>> so we could do more tests without recompiling the kernel.
> >>>>>>>>> Thanks for testing!
> >>>>>>>>>
> >>>>>>>>> @gallatin: can you come up with a patch that is acceptable for Netflix
> >>>>>>>>> and allows mitigating the performance regression?
> >>>>>>>>
> >>>>>>>> Ideally, tcphpts could enable this automatically when it starts to be
> >>>>>>>> used (enough?), but a sysctl could select auto/on/off.
> >>>>>>> There is already a well-known mechanism to request execution of a
> >>>>>>> specific function on return to userspace, namely AST. The difference
> >>>>>>> from the current hack is that the execution is requested for one callback
> >>>>>>> in the context of a specific thread.
> >>>>>>>
> >>>>>>> Still, it might be worth a try to use it; what is the reason to hit a thread
> >>>>>>> that does not do networking with TCP processing?
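[For reference, the AST route discussed above might look roughly like this against the ast_register()/ast_sched() KPI; the TDA_MOD1 slot, the flag choice, and the call sites are assumptions for illustration, not code posted in this thread.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>

/* The hpts pacer hook referenced upthread (declaration assumed here). */
extern void (*tcp_hpts_softclock)(void);

static void
tcp_hpts_ast(struct thread *td __unused, int tda __unused)
{
	/* Runs in this thread's context on its return to userspace. */
	if (tcp_hpts_softclock != NULL)
		tcp_hpts_softclock();
}

static void
tcp_hpts_ast_init(void)
{
	/*
	 * TDA_MOD1 is a slot reserved for modules.  ASTR_UNCOND runs
	 * the callback for every thread on every return to userspace
	 * (the behaviour Drew asks about); without it, a call such as
	 * ast_sched(curthread, TDA_MOD1) from the TCP path would hit
	 * only threads that actually did networking.
	 */
	ast_register(TDA_MOD1, ASTR_UNCOND, 0, tcp_hpts_ast);
}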
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>>> Best regards
> >>>>>>>>> Michael
> >>>>>>>>>>
> >>>>>>>>>> Thanks all!
> >>>>>>>>>> Really happy here :)
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>>
> >>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>>> I don't have the full context, but it seems like the
> >>>>>>>>>>>> complaint is a performance regression in bonnie++ and perhaps other
> >>>>>>>>>>>> things when tcp_hpts is loaded, even when it is not used. Is that
> >>>>>>>>>>>> correct?
> >>>>>>>>>>>>
> >>>>>>>>>>>> If so, I suspect it's because we drive the tcp_hpts_softclock()
> >>>>>>>>>>>> routine from userret(), in order to avoid tons of timer interrupts
> >>>>>>>>>>>> and context switches. To test this theory, you could apply a patch like:
> >>>>>>>>>>>
> >>>>>>>>>>> It's affecting overall system performance; bonnie was just a way to
> >>>>>>>>>>> get some numbers to compare.
> >>>>>>>>>>>
> >>>>>>>>>>> Tomorrow I will test the patch.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Nuno Teixeira
> >>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> >>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> >>> index 8c4d2d41a3eb..eadbee19f69c 100644
> >>> --- a/sys/netinet/tcp_hpts.c
> >>> +++ b/sys/netinet/tcp_hpts.c
> >>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >>>  	void *ie_cookie;
> >>>  	uint16_t p_num; /* The hpts number one per cpu */
> >>>  	uint16_t p_cpu; /* The hpts CPU */
> >>> +	uint8_t hit_callout_thresh;
> >>>  	/* There is extra space in here */
> >>>  	/* Cache line 0x100 */
> >>>  	struct callout co __aligned(CACHE_LINE_SIZE);
> >>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >>>  	int cpu[MAXCPU];
> >>>  } hpts_domains[MAXMEMDOM];
> >>>
> >>> +counter_u64_t hpts_that_need_softclock;
> >>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> >>> +    &hpts_that_need_softclock,
> >>> +    "Number of hpts threads that need softclock");
> >>> +
> >>>  counter_u64_t hpts_hopelessly_behind;
> >>>
> >>>  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> >>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >>>      &tcp_hpts_precision, 120,
> >>>      "Value for PRE() precision of callout");
> >>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> >>> -    &conn_cnt_thresh, 0,
> >>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >>>      "How many connections (below) make us use the callout based mechanism");
> >>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >>>      &hpts_does_tp_logging, 0,
> >>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >>>  	struct tcp_hpts_entry *hpts;
> >>>  	int ticks_ran;
> >>>
> >>> +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> >>> +		return;
> >>> +
> >>>  	hpts = tcp_choose_hpts_to_run();
> >>>
> >>>  	if (hpts->p_hpts_active) {
> >>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >>>  	ticks_ran = tcp_hptsi(hpts, 1);
> >>>  	tv.tv_sec = 0;
> >>>  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> >>> +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> >>> +		hpts->hit_callout_thresh = 1;
> >>> +		counter_u64_add(hpts_that_need_softclock, 1);
> >>> +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> >>> +		hpts->hit_callout_thresh = 0;
> >>> +		counter_u64_add(hpts_that_need_softclock, -1);
> >>> +	}
> >>>  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >>>  		if (hpts->p_direct_wake == 0) {
> >>>  			/*
> >>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >>>  	cpu_top = NULL;
> >>>  #endif
> >>>  	tcp_pace.rp_num_hptss = ncpus;
> >>> +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >>>  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >>>  	hpts_loops = counter_u64_alloc(M_WAITOK);
> >>>  	back_tosleep = counter_u64_alloc(M_WAITOK);
> >>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >>>  	free(tcp_pace.grps, M_TCPHPTS);
> >>>  #endif
> >>>
> >>> +	counter_u64_free(hpts_that_need_softclock);
> >>>  	counter_u64_free(hpts_hopelessly_behind);
> >>>  	counter_u64_free(hpts_loops);
> >>>  	counter_u64_free(back_tosleep);
> >>
> >>
> >
> >
> > --
> > Nuno Teixeira
> > FreeBSD Committer (ports)

-- 
Nuno Teixeira
FreeBSD Committer (ports)
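[A practical footnote to the patch above: once it is applied, the gate can be exercised without recompiling, which also answers the earlier request for a tunable. Assuming the conventional OID-to-name mapping for the SYSCTL declarations in the diff, `sysctl net.inet.tcp.hpts.cnt_thresh` reads (and `sysctl net.inet.tcp.hpts.cnt_thresh=N` sets) the connection threshold, and `sysctl net.inet.tcp.hpts.stats.needsoftclock` shows how many hpts threads currently need the userret softclock.]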