From nobody Thu Mar 28 14:00:58 2024 X-Original-To: net@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4V54vW725Cz5FwtG; Thu, 28 Mar 2024 14:01:11 +0000 (UTC) (envelope-from eduardo@freebsd.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4V54vW5H7Wz4dv2; Thu, 28 Mar 2024 14:01:11 +0000 (UTC) (envelope-from eduardo@freebsd.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1711634471; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Z8Fgz2To4xjvfyXPzn6/M3cC9GrXPIqXvsovM+n+AHs=; b=pH0KDm0QO8lB/W1lC2ZtXOfKYBwPQVITO4j3OHJW4PFBSrW26FtCqTzWlIVJpb2W1Ic3ld mgADovlLu7jpBagEmZLunSAzc5kv+QV2upTdeeKENAhYbaCNcKQoH+X3qrFAEOXDwa10Ge IYIUyXg6hKT4ximBpr2sKDAPSzrDOXmNWciy1uVr4QjYI4SMdQrGN71rYsiCjZ4JzZbXz7 iK1kKuWyZvjjH/PjcG26+gNiDYQjN/X71iRcAlCQH5XaozV83JpdsrISvQPWnA+Ek5KG+Q eoHaKOUW4VwYacSVfNzpZA+fBfGaNprlJoj+/GYTCQNcrhOiE2W92Xl0Np05UA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1711634471; a=rsa-sha256; cv=none; b=Fzoy6IwH2uNNoDPDi4559ug/HDv7YlnLpmf1ae/q0P7jTTH8znz7Qv2KAQ3Sz8+21GH55n xxcGAyMa8Joy1USeb4OX8MkYNl1hIcCLKXe7EWBq8c4vZN7x+vhYpIPcNE4z3jEo9wvJzU QkTC54nHzch/YhT98rOjSNqfnR/Dr+a9uz96Xgvkbqr5XSU8o7e0rE3FC8fwFZh3/vuadD hyNcwSSOhMgRPXCcZHI5HH58fR+DpoNFh1bWmlMcCoJTICLVMmLWoUlXmxpsPE7ZTt/mJL J6R14GZ/9gU7Ny5/wtIaj6k3T4WS5gpGnyzGJURktXnEVXQrXxarLGKhiatEFA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1711634471; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Z8Fgz2To4xjvfyXPzn6/M3cC9GrXPIqXvsovM+n+AHs=; b=F/8sL2RZPzzryWjH32+KXUGSqBhkhOuK71JLB+2rXooiZXm+pXuoZiXIxvUe2wvThAQrE1 stv0+4z4FRuHhMtvk15BWN/yF+iLP/BtDDEnFwnLylMPt73SM+69vQDI6Qfc1bwLdnl6ne EAkL+movyj/3G8dWuyUl7MYlvx8MVHWmco3lQ91/LaAdZHf36NaGf2RT6XxLyL3eBUykEu FLxXScq1nfgRYAwyf9kVOkDmslaO5Q4OHb1Dw+Y06aP/XgjIhtJAtktDKMw8/xYFUv2d1l 5ZyAYg9XQef87KMu9VgABwZtev9wFdrvUUqLEldRu8DcixG2muEl8sf4QGYzjw== Received: from mail-oi1-f178.google.com (mail-oi1-f178.google.com [209.85.167.178]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) (Authenticated sender: eduardo) by smtp.freebsd.org (Postfix) with ESMTPSA id 4V54vW4nBZz1CVc; Thu, 28 Mar 2024 14:01:11 +0000 (UTC) (envelope-from eduardo@freebsd.org) Received: by mail-oi1-f178.google.com with SMTP id 5614622812f47-3c3ca3c3bbaso603313b6e.0; Thu, 28 Mar 2024 07:01:11 -0700 (PDT) X-Forwarded-Encrypted: i=1; AJvYcCVctPojx5j+jhNkb84XL3D0mGIAxAJ6MKg422EAxO0+KbfjKC6c8cL8+azwfkuZqOMldxbsf1qeMjuRRQk36+gmFoqrIZ1AmuNWHhj2jvnXefaR6gYKhqUO2XvdV76YcPlHHJH9o0np5d87rdaTSKAATvzL7laCthQV X-Gm-Message-State: AOJu0Yw+yS57fhvhKaDj0mh7qRVcjtPVpoBbXmK7qSMiftnHdoMVSF+R 3Wh93FWjv3NVMT2wmsewUoxqz/Pj9H+vFC9gynVZAxZcaZU9IL6crWM3d9AOFa8zQkaHJE6Ar6d N+th0BpZ5J8DfXUHB+lftLjmKorI= X-Google-Smtp-Source: AGHT+IEt4r3OmgtY+ivehZZtzjIaAoJwVNOTIfb5BfXb+zwUOuaYIBhBHJBZdmxzrayb2RrspRkC2+nNKXWqJFl4Y7s= X-Received: by 2002:a05:6808:dc7:b0:3c3:dddd:86b4 with SMTP id g7-20020a0568080dc700b003c3dddd86b4mr3083271oic.4.1711634470427; Thu, 28 Mar 2024 07:01:10 -0700 (PDT) List-Id: Networking and TCP/IP with FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-net List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-net@freebsd.org MIME-Version: 1.0 References: <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com> <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net> <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com> <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com> <27d8144f-0658-46f6-b8f3-35eb60061644@lakerest.net> In-Reply-To: From: Nuno Teixeira Date: Thu, 28 Mar 2024 14:00:58 +0000 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Request for Testing: TCP RACK To: Drew Gallatin Cc: Konstantin Belousov , rrs , Mike Karels , tuexen , garyj@gmx.de, current@freebsd.org, net@freebsd.org, Randall Stewart Content-Type: multipart/alternative; boundary="000000000000986d170614b8f4a6" --000000000000986d170614b8f4a6 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hello all! Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)! Thanks all! Drew Gallatin escreveu (quinta, 21/03/2024 =C3=A0(s) 12:58): > The entire point is to *NOT* go through the overhead of scheduling > something asynchronously, but to take advantage of the fact that a > user/kernel transition is going to trash the cache anyway. > > In the common case of a system which has less than the threshold number > of connections , we access the tcp_hpts_softclock function pointer, make > one function call, and access hpts_that_need_softclock, and then return. > So that's 2 variables and a function call. > > I think it would be preferable to avoid that call, and to move the > declaration of tcp_hpts_softclock and hpts_that_need_softclock so that th= ey > are in the same cacheline. Then we'd be hitting just a single line in th= e > common case. (I've made comments on the review to that effect). > > Also, I wonder if the threshold could get higher by default, so that hpts > is never called in this context unless we're to the point where we're > scheduling thousands of runs of the hpts thread (and taking all those clo= ck > interrupts). > > Drew > > On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote: > > On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote: > > Ok I have created > > > > https://reviews.freebsd.org/D44420 > > > > > > To address the issue. I also attach a short version of the patch that > Nuno > > can try and validate > > > > it works. Drew you may want to try this and validate the optimization > does > > kick in since I can > > > > only now test that it does not on my local box :) > The patch still causes access to all cpu's cachelines on each userret. > It would be much better to inc/check the threshold and only schedule the > call when exceeded. Then the call can occur in some dedicated context, > like per-CPU thread, instead of userret. > > > > > > > R > > > > > > > > On 3/18/24 3:42 PM, Drew Gallatin wrote: > > > No. The goal is to run on every return to userspace for every thread= . > > > > > > Drew > > > > > > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote: > > > > On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote: > > > > > I got the idea from > > > > > > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf > > > > > The gist is that the TCP pacing stuff needs to run frequently, an= d > > > > > rather than run it out of a clock interrupt, its more efficient t= o > run > > > > > it out of a system call context at just the point where we return > to > > > > > userspace and the cache is trashed anyway. The current > implementation > > > > > is fine for our workload, but probably not idea for a generic > system. > > > > > Especially one where something is banging on system calls. > > > > > > > > > > Ast's could be the right tool for this, but I'm super unfamiliar > with > > > > > them, and I can't find any docs on them. > > > > > > > > > > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent > to > > > > > what's happening here? > > > > This call would need some AST number added, and then it registers t= he > > > > ast to run on next return to userspace, for the current thread. > > > > > > > > Is it enough? > > > > > > > > > > Drew > > > > > > > > > > > > > > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote: > > > > > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote: > > > > > > > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote: > > > > > > > > > > > > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira > > > > wrote: > > > > > > > >> > > > > > > > >> Hello all! > > > > > > > >> > > > > > > > >> It works just fine! > > > > > > > >> System performance is OK. > > > > > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty). > > > > > > > >> > > > > > > > >> --- > > > > > > > >> net.inet.tcp.functions_available: > > > > > > > >> Stack D > > > > Alias PCB count > > > > > > > >> freebsd freebsd 0 > > > > > > > >> rack * > > > > rack 38 > > > > > > > >> --- > > > > > > > >> > > > > > > > >> It would be so nice that we can have a sysctl tunnable for > > > > this patch > > > > > > > >> so we could do more tests without recompiling kernel. > > > > > > > > Thanks for testing! > > > > > > > > > > > > > > > > @gallatin: can you come up with a patch that is acceptable > > > > for Netflix > > > > > > > > and allows to mitigate the performance regression. > > > > > > > > > > > > > > Ideally, tcphpts could enable this automatically when it > > > > starts to be > > > > > > > used (enough?), but a sysctl could select auto/on/off. > > > > > > There is already a well-known mechanism to request execution of > the > > > > > > specific function on return to userspace, namely AST. The > difference > > > > > > with the current hack is that the execution is requested for on= e > > > > callback > > > > > > in the context of the specific thread. > > > > > > > > > > > > Still, it might be worth a try to use it; what is the reason to > > > > hit a thread > > > > > > that does not do networking, with TCP processing? > > > > > > > > > > > > > > > > > > > > Mike > > > > > > > > > > > > > > > Best regards > > > > > > > > Michael > > > > > > > >> > > > > > > > >> Thanks all! > > > > > > > >> Really happy here :) > > > > > > > >> > > > > > > > >> Cheers, > > > > > > > >> > > > > > > > >> Nuno Teixeira escreveu (domingo, > > > > 17/03/2024 =C3=A0(s) 20:26): > > > > > > > >>> > > > > > > > >>> Hello, > > > > > > > >>> > > > > > > > >>>> I don't have the full context, but it seems like the > > > > complaint is a performance regression in bonnie++ and perhaps other > > > > things when tcp_hpts is loaded, even when it is not used. Is that > > > > correct? > > > > > > > >>>> > > > > > > > >>>> If so, I suspect its because we drive the > > > > tcp_hpts_softclock() routine from userret(), in order to avoid tons > > > > of timer interrupts and context switches. To test this theory, yo= u > > > > could apply a patch like: > > > > > > > >>> > > > > > > > >>> It's affecting overall system performance, bonnie was jus= t > > > > a way to > > > > > > > >>> get some numbers to compare. > > > > > > > >>> > > > > > > > >>> Tomorrow I will test patch. > > > > > > > >>> > > > > > > > >>> Thanks! > > > > > > > >>> > > > > > > > >>> -- > > > > > > > >>> Nuno Teixeira > > > > > > > >>> FreeBSD Committer (ports) > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> -- > > > > > > > >> Nuno Teixeira > > > > > > > >> FreeBSD Committer (ports) > > > > > > > > > > > > > > > > > > > > > > > diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c > > index 8c4d2d41a3eb..eadbee19f69c 100644 > > --- a/sys/netinet/tcp_hpts.c > > +++ b/sys/netinet/tcp_hpts.c > > @@ -216,6 +216,7 @@ struct tcp_hpts_entry { > > void *ie_cookie; > > uint16_t p_num; /* The hpts number one per cpu */ > > uint16_t p_cpu; /* The hpts CPU */ > > + uint8_t hit_callout_thresh; > > /* There is extra space in here */ > > /* Cache line 0x100 */ > > struct callout co __aligned(CACHE_LINE_SIZE); > > @@ -269,6 +270,11 @@ static struct hpts_domain_info { > > int cpu[MAXCPU]; > > } hpts_domains[MAXMEMDOM]; > > > > +counter_u64_t hpts_that_need_softclock; > > +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, > CTLFLAG_RD, > > + &hpts_that_need_softclock, > > + "Number of hpts threads that need softclock"); > > + > > counter_u64_t hpts_hopelessly_behind; > > > > SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, > CTLFLAG_RD, > > @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, > CTLFLAG_RW, > > &tcp_hpts_precision, 120, > > "Value for PRE() precision of callout"); > > SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW, > > - &conn_cnt_thresh, 0, > > + &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD, > > "How many connections (below) make us use the callout based > mechanism"); > > SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW, > > &hpts_does_tp_logging, 0, > > @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void) > > struct tcp_hpts_entry *hpts; > > int ticks_ran; > > > > + if (counter_u64_fetch(hpts_that_need_softclock) =3D=3D 0) > > + return; > > + > > hpts =3D tcp_choose_hpts_to_run(); > > > > if (hpts->p_hpts_active) { > > @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx) > > ticks_ran =3D tcp_hptsi(hpts, 1); > > tv.tv_sec =3D 0; > > tv.tv_usec =3D hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT; > > + if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && > (hpts->hit_callout_thresh =3D=3D 0)) { > > + hpts->hit_callout_thresh =3D 1; > > + counter_u64_add(hpts_that_need_softclock, 1); > > + } else if ((hpts->p_on_queue_cnt <=3D conn_cnt_thresh) && > (hpts->hit_callout_thresh =3D=3D 1)) { > > + hpts->hit_callout_thresh =3D 0; > > + counter_u64_add(hpts_that_need_softclock, -1); > > + } > > if (hpts->p_on_queue_cnt >=3D conn_cnt_thresh) { > > if(hpts->p_direct_wake =3D=3D 0) { > > /* > > @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void) > > cpu_top =3D NULL; > > #endif > > tcp_pace.rp_num_hptss =3D ncpus; > > + hpts_that_need_softclock =3D counter_u64_alloc(M_WAITOK); > > hpts_hopelessly_behind =3D counter_u64_alloc(M_WAITOK); > > hpts_loops =3D counter_u64_alloc(M_WAITOK); > > back_tosleep =3D counter_u64_alloc(M_WAITOK); > > @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void) > > free(tcp_pace.grps, M_TCPHPTS); > > #endif > > > > + counter_u64_free(hpts_that_need_softclock); > > counter_u64_free(hpts_hopelessly_behind); > > counter_u64_free(hpts_loops); > > counter_u64_free(back_tosleep); > > > > --=20 Nuno Teixeira FreeBSD Committer (ports) --000000000000986d170614b8f4a6 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello all!

Running rack @b7b= 78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!

Thanks all!

Drew Gallatin <gallatin@freebsd.org> escreveu (quinta, 21/= 03/2024 =C3=A0(s) 12:58):
The entir= e point is to *NOT* go through the overhead of scheduling something asynchr= onously, but to take advantage of the fact that a user/kernel transition is= going to trash the cache anyway.

In the commo= n case of a system which has less than the threshold=C2=A0 number of connec= tions , we access the tcp_hpts_softclock function pointer, make one functio= n call, and access hpts_that_need_softclock, and then return.=C2=A0 So that= 's 2 variables and a function call.

I thin= k it would be preferable to avoid that call, and to move the declaration of= tcp_hpts_softclock and hpts_that_need_softclock so that they are in the sa= me cacheline.=C2=A0 Then we'd be hitting just a single line in the comm= on case.=C2=A0 (I've made comments on the review to that effect).

Also, I wonder if the threshold could get higher by= default, so that hpts is never called in this context unless we're to = the point where we're scheduling thousands of runs of the hpts thread (= and taking all those clock interrupts).

Drew

On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Be= lousov wrote:
On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> Ok I have created
>=C2=A0
>=C2=A0<= a href=3D"https://reviews.freebsd.org/D44420" target=3D"_blank">https://rev= iews.freebsd.org/D44420
>=C2=A0
>=C2= =A0
> To address the issue. I also attach a short version = of the patch that Nuno
> can try and validate
>=C2=A0
> it works. Drew you may want to try this and= validate the optimization does
> kick in since I can
<= /div>
>=C2=A0
> only now test that it does not on m= y local box :)
The patch still causes access to all cpu's= cachelines on each userret.
It would be much better to inc/c= heck the threshold and only schedule the
call when exceeded.= =C2=A0 Then the call can occur in some dedicated context,
lik= e per-CPU thread, instead of userret.

>=C2= =A0
>=C2=A0
> R
>=C2=A0<= br>
>=C2=A0
>=C2=A0
> On 3/1= 8/24 3:42 PM, Drew Gallatin wrote:
> > No.=C2=A0 The go= al is to run on every return to userspace for every thread.
&= gt; >=C2=A0
> > Drew
> >=C2=A0
> > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belouso= v wrote:
> > > On Mon, Mar 18, 2024 at 03:13:11PM -0= 400, Drew Gallatin wrote:
> > > > I got the idea = from
> > > >=C2=A0https= ://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > > > The gist is that the TCP pacing stuff needs to= run frequently, and
> > > > rather than run it o= ut of a clock interrupt, its more efficient to run
> > = > > it out of a system call context at just the point where we return= to
> > > > userspace and the cache is trashed an= yway. The current implementation
> > > > is fine = for our workload, but probably not idea for a generic system.
> > > > Especially one where something is banging on system ca= lls.
> > > >
> > > > As= t's could be the right tool for this, but I'm super unfamiliar with=
> > > > them, and I can't find any docs on t= hem.
> > > >
> > > > Wo= uld ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> > > > what's happening here?
> >= ; > This call would need some AST number added, and then it registers th= e
> > > ast to run on next return to userspace, for = the current thread.
> > >=C2=A0
> &= gt; > Is it enough?
> > > >
>= > > > Drew
> > >=C2=A0
> = > > >
> > > > On Mon, Mar 18, 2024, at 2= :33 PM, Konstantin Belousov wrote:
> > > > > O= n Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
&= gt; > > > > > On 18 Mar 2024, at 7:04,=C2=A0tuexen@freebsd.org wrote:
> > > > > >
> > > > &g= t; > >> On 18. Mar 2024, at 12:42, Nuno Teixeira
>= ; > > <ed= uardo@freebsd.org> wrote:
> > > > > >= ; >>
> > > > > > >> Hello all!<= br>
> > > > > > >>
> >= ; > > > > >> It works just fine!
> > = > > > > >> System performance is OK.
> &= gt; > > > > >> Using patch on main-n268841-b0aaf8beb126(-= dirty).
> > > > > > >>
= > > > > > > >> ---
> > > >= ; > > >> net.inet.tcp.functions_available:
> &= gt; > > > > >> Stack=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 D
> > >= ; Alias=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 PCB count
> > > > > >= >> freebsd freebsd=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 0
> > > > > > &= gt;> rack=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *
> > > rack=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 38
> > > > > > >> ---
> > > > > > >>
> > > >= > > >> It would be so nice that we can have a sysctl tunnable = for
> > > this patch
> > > &g= t; > > >> so we could do more tests without recompiling kernel.=
> > > > > > > Thanks for testing!
> > > > > > >
> > > &g= t; > > > @gallatin: can you come up with a patch that is acceptabl= e
> > > for Netflix
> > > >= ; > > > and allows to mitigate the performance regression.
> > > > > >
> > > > >= > Ideally, tcphpts could enable this automatically when it
> > > starts to be
> > > > > > u= sed (enough?), but a sysctl could select auto/on/off.
> &g= t; > > > There is already a well-known mechanism to request execut= ion of the
> > > > > specific function on retu= rn to userspace, namely AST.=C2=A0 The difference
> > &= gt; > > with the current hack is that the execution is requested for = one
> > > callback
> > > >= > in the context of the specific thread.
> > > &= gt; >
> > > > > Still, it might be worth a = try to use it; what is the reason to
> > > hit a thr= ead
> > > > > that does not do networking, wit= h TCP processing?
> > > > >
>= > > > > >
> > > > > > Mike<= br>
> > > > > >
> > > &g= t; > > > Best regards
> > > > > > = > Michael
> > > > > > >>
=
> > > > > > >> Thanks all!
> = > > > > > >> Really happy here :)
> &= gt; > > > > >>
> > > > > >= ; >> Cheers,
> > > > > > >>
=
> > > > > > >> Nuno Teixeira <eduardo@freebsd.org&g= t; escreveu (domingo,
> > > 17/03/2024 =C3=A0(s) 20:= 26):
> > > > > > >>>
> > > > > > >>> Hello,
> > = > > > > >>>
> > > > > >= ; >>>> I don't have the full context, but it seems like the=
> > > complaint is a performance regression in bonn= ie++ and perhaps other
> > > things when tcp_hpts is= loaded, even when it is not used.=C2=A0 Is that
> > &g= t; correct?
> > > > > > >>>>
> > > > > > >>>> If so, I suspect= its because we drive the
> > > tcp_hpts_softclock()= routine from userret(), in order to avoid tons
> > >= ; of timer interrupts and context switches.=C2=A0 To test this theory,=C2= =A0 you
> > > could apply a patch like:
> > > > > > >>>
> > > &= gt; > > >>> It's affecting overall system performance, b= onnie was just
> > > a way to
> >= ; > > > > >>> get some numbers to compare.
> > > > > > >>>
> > > = > > > >>> Tomorrow I will test patch.
> = > > > > > >>>
> > > > >= ; > >>> Thanks!
> > > > > > >= ;>>
> > > > > > >>> --
> > > > > > >>> Nuno Teixeira
=
> > > > > > >>> FreeBSD Committer (ports)
> > > > > > >>
> >= > > > > >>
> > > > > > &= gt;>
> > > > > > >> --
> > > > > > >> Nuno Teixeira
> = > > > > > >> FreeBSD Committer (ports)
&= gt; > > > > >
> > > > >
> > >=C2=A0
> >=C2=A0
> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts= .c
> index 8c4d2d41a3eb..eadbee19f69c 100644
> --- a/sys/netinet/tcp_hpts.c
> +++ b/sys/netinet/tcp= _hpts.c
> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
<= /div>
>=C2=A0 void *ie_cookie;
>=C2=A0 uint16_t p= _num; /* The hpts number one per cpu */
>=C2=A0 uint16_t= p_cpu; /* The hpts CPU */
> + uint8_t hit_callout_thresh= ;
>=C2=A0 /* There is extra space in here */
>=C2=A0 /* Cache line 0x100 */
>=C2=A0 struct callo= ut co __aligned(CACHE_LINE_SIZE);
> @@ -269,6 +270,11 @@ s= tatic struct hpts_domain_info {
>=C2=A0 int cpu[MAXCPU];<= br>
>=C2=A0 } hpts_domains[MAXMEMDOM];
>=C2= =A0=C2=A0
> +counter_u64_t hpts_that_need_softclock;
> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needs= oftclock, CTLFLAG_RD,
> +=C2=A0=C2=A0=C2=A0 &hpts_that= _need_softclock,
> +=C2=A0=C2=A0=C2=A0 "Number of hpt= s threads that need softclock");
> +
&g= t;=C2=A0 counter_u64_t hpts_hopelessly_behind;
>=C2=A0=C2= =A0
>=C2=A0 SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, O= ID_AUTO, hopeless, CTLFLAG_RD,
> @@ -334,7 +340,7 @@ SYSCT= L_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
&g= t;=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 &tcp_hpts_precision, 120,
>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 "Value for PRE() precision of cal= lout");
>=C2=A0 SYSCTL_INT(_net_inet_tcp_hpts, OID_AU= TO, cnt_thresh, CTLFLAG_RW,
> -=C2=A0=C2=A0=C2=A0 &con= n_cnt_thresh, 0,
> +=C2=A0=C2=A0=C2=A0 &conn_cnt_thres= h, DEFAULT_CONNECTION_THESHOLD,
>=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 "How many connections (below) make us use the callout based mec= hanism");
>=C2=A0 SYSCTL_INT(_net_inet_tcp_hpts, OID_= AUTO, logging, CTLFLAG_RW,
>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= &hpts_does_tp_logging, 0,
> @@ -1548,6 +1554,9 @@ __t= cp_run_hpts(void)
>=C2=A0 struct tcp_hpts_entry *hpts;
>=C2=A0 int ticks_ran;
>=C2=A0=C2=A0
> + if (counter_u64_fetch(hpts_that_need_softclock) =3D=3D 0)
> + return;
> +
>=C2=A0 = hpts =3D tcp_choose_hpts_to_run();
>=C2=A0=C2=A0
>=C2=A0 if (hpts->p_hpts_active) {
> @@ -1683= ,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
>=C2=A0 ticks_r= an =3D tcp_hptsi(hpts, 1);
>=C2=A0 tv.tv_sec =3D 0;
>=C2=A0 tv.tv_usec =3D hpts->p_hpts_sleep_time * HPTS_TICKS= _PER_SLOT;
> + if ((hpts->p_on_queue_cnt > conn_cnt_= thresh) && (hpts->hit_callout_thresh =3D=3D 0)) {
= > + hpts->hit_callout_thresh =3D 1;
> + counter_u6= 4_add(hpts_that_need_softclock, 1);
> + } else if ((hpts-&= gt;p_on_queue_cnt <=3D conn_cnt_thresh) && (hpts->hit_callout= _thresh =3D=3D 1)) {
> + hpts->hit_callout_thresh =3D = 0;
> + counter_u64_add(hpts_that_need_softclock, -1);
=
> + }
>=C2=A0 if (hpts->p_on_queue_cnt &= gt;=3D conn_cnt_thresh) {
>=C2=A0 if(hpts->p_direct_w= ake =3D=3D 0) {
>=C2=A0 /*
> @@ -1818,= 6 +1834,7 @@ tcp_hpts_mod_load(void)
>=C2=A0 cpu_top =3D = NULL;
>=C2=A0 #endif
>=C2=A0 tcp_pace.rp= _num_hptss =3D ncpus;
> + hpts_that_need_softclock =3D cou= nter_u64_alloc(M_WAITOK);
>=C2=A0 hpts_hopelessly_behind = =3D counter_u64_alloc(M_WAITOK);
>=C2=A0 hpts_loops =3D c= ounter_u64_alloc(M_WAITOK);
>=C2=A0 back_tosleep =3D coun= ter_u64_alloc(M_WAITOK);
> @@ -2042,6 +2059,7 @@ tcp_hpts_= mod_unload(void)
>=C2=A0 free(tcp_pace.grps, M_TCPHPTS);<= br>
>=C2=A0 #endif
>=C2=A0=C2=A0
> + counter_u64_free(hpts_that_need_softclock);
>=C2= =A0 counter_u64_free(hpts_hopelessly_behind);
>=C2=A0 co= unter_u64_free(hpts_loops);
>=C2=A0 counter_u64_free(back= _tosleep);





--
Nuno Teixeira
= FreeBSD Committer (ports)
--000000000000986d170614b8f4a6--