From: "Drew Gallatin" <gallatin@freebsd.org>
To: "Konstantin Belousov", rrs
Cc: "Mike Karels", tuexen, "Nuno Teixeira", garyj@gmx.de, current@freebsd.org, net@freebsd.org, "Randall Stewart"
Date: Thu, 21 Mar 2024 08:57:44 -0400
Subject: Re: Request for Testing: TCP RACK
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

The entire point is to *NOT* go through the overhead of scheduling something asynchronously, but to take advantage of the fact that a user/kernel transition is going to trash the cache anyway.

In the common case of a system which has fewer than the threshold number of connections, we access the tcp_hpts_softclock function pointer, make one function call, and access hpts_that_need_softclock, and then return.  So that's two variables and a function call.
I think it would be preferable to avoid that call, and to move the declarations of tcp_hpts_softclock and hpts_that_need_softclock so that they are in the same cacheline.  Then we'd be hitting just a single line in the common case.  (I've made comments on the review to that effect.)

Also, I wonder if the threshold could be raised by default, so that hpts is never called in this context unless we're to the point where we're scheduling thousands of runs of the hpts thread (and taking all those clock interrupts).

Drew

On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> > Ok, I have created
> >
> > https://reviews.freebsd.org/D44420
> >
> > to address the issue.  I also attach a short version of the patch that
> > Nuno can try and validate that it works.  Drew, you may want to try this
> > and validate that the optimization does kick in, since for now I can
> > only test that it does not on my local box :)
> The patch still causes access to all CPUs' cachelines on each userret.
> It would be much better to inc/check the threshold and only schedule the
> call when it is exceeded.  Then the call can occur in some dedicated
> context, like a per-CPU thread, instead of userret.
>
> >
> > R
> >
> > On 3/18/24 3:42 PM, Drew Gallatin wrote:
> > > No.  The goal is to run on every return to userspace for every thread.
> > >
> > > Drew
> > >
> > > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> > > > On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > > > > I got the idea from
> > > > > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > > > > The gist is that the TCP pacing stuff needs to run frequently, and
> > > > > rather than run it out of a clock interrupt, it's more efficient to
> > > > > run it out of a system call context, at just the point where we
> > > > > return to userspace and the cache is trashed anyway.  The current
> > > > > implementation is fine for our workload, but probably not ideal for
> > > > > a generic system, especially one where something is banging on
> > > > > system calls.
> > > > >
> > > > > ASTs could be the right tool for this, but I'm super unfamiliar
> > > > > with them, and I can't find any docs on them.
> > > > >
> > > > > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent
> > > > > to what's happening here?
> > > > This call would need some AST number added, and then it registers the
> > > > AST to run on the next return to userspace, for the current thread.
> > > >
> > > > Is it enough?
> > > > >
> > > > > Drew
> > > >
> > > > >
> > > > > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > > > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > > > > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> > > > > > >
> > > > > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira wrote:
> > > > > > > >>
> > > > > > > >> Hello all!
> > > > > > > >>
> > > > > > > >> It works just fine!
> > > > > > > >> System performance is OK.
> > > > > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > > > > >>
> > > > > > > >> ---
> > > > > > > >> net.inet.tcp.functions_available:
> > > > > > > >> Stack                           D Alias                PCB count
> > > > > > > >> freebsd freebsd                                        0
> > > > > > > >> rack                          * rack                  38
> > > > > > > >> ---
> > > > > > > >>
> > > > > > > >> It would be so nice if we could have a sysctl tunable for
> > > > > > > >> this patch so we could do more tests without recompiling the
> > > > > > > >> kernel.
> > > > > > > > Thanks for testing!
> > > > > > > >
> > > > > > > > @gallatin: can you come up with a patch that is acceptable
> > > > > > > > for Netflix and allows mitigating the performance regression?
> > > > > > >
> > > > > > > Ideally, tcphpts could enable this automatically when it starts
> > > > > > > to be used (enough?), but a sysctl could select auto/on/off.
> > > > > > There is already a well-known mechanism to request execution of a
> > > > > > specific function on return to userspace, namely AST.  The
> > > > > > difference with the current hack is that the execution is
> > > > > > requested for one callback in the context of a specific thread.
> > > > > >
> > > > > > Still, it might be worth a try to use it; what is the reason to
> > > > > > hit a thread that does not do networking with TCP processing?
> > > > > > >
> > > > > > > Mike
> > > > > > > > Best regards
> > > > > > > > Michael
> > > > > > > >>
> > > > > > > >> Thanks all!
> > > > > > > >> Really happy here :)
> > > > > > > >>
> > > > > > > >> Cheers,
> > > > > > > >>
> > > > > > > >> Nuno Teixeira wrote (Sunday, 17/03/2024 at 20:26):
> > > > > > > >>>
> > > > > > > >>> Hello,
> > > > > > > >>>
> > > > > > > >>>> I don't have the full context, but it seems like the
> > > > > > > >>>> complaint is a performance regression in bonnie++ and
> > > > > > > >>>> perhaps other things when tcp_hpts is loaded, even when it
> > > > > > > >>>> is not used.  Is that correct?
> > > > > > > >>>>
> > > > > > > >>>> If so, I suspect it's because we drive the
> > > > > > > >>>> tcp_hpts_softclock() routine from userret(), in order to
> > > > > > > >>>> avoid tons of timer interrupts and context switches.  To
> > > > > > > >>>> test this theory, you could apply a patch like:
> > > > > > > >>>
> > > > > > > >>> It's affecting overall system performance; bonnie was just
> > > > > > > >>> a way to get some numbers to compare.
> > > > > > > >>>
> > > > > > > >>> Tomorrow I will test the patch.
> > > > > > > >>>
> > > > > > > >>> Thanks!
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> Nuno Teixeira
> > > > > > > >>> FreeBSD Committer (ports)
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Nuno Teixeira
> > > > > > > >> FreeBSD Committer (ports)
>
> > diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> > index 8c4d2d41a3eb..eadbee19f69c 100644
> > --- a/sys/netinet/tcp_hpts.c
> > +++ b/sys/netinet/tcp_hpts.c
> > @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >  	void *ie_cookie;
> >  	uint16_t p_num;		/* The hpts number one per cpu */
> >  	uint16_t p_cpu;		/* The hpts CPU */
> > +	uint8_t hit_callout_thresh;
> >  	/* There is extra space in here */
> >  	/* Cache line 0x100 */
> >  	struct callout co __aligned(CACHE_LINE_SIZE);
> > @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >  	int cpu[MAXCPU];
> >  } hpts_domains[MAXMEMDOM];
> >
> > +counter_u64_t hpts_that_need_softclock;
> > +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> > +    &hpts_that_need_softclock,
> > +    "Number of hpts threads that need softclock");
> > +
> >  counter_u64_t hpts_hopelessly_behind;
> >
> >  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> > @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >      &tcp_hpts_precision, 120,
> >      "Value for PRE() precision of callout");
> >  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> > -    &conn_cnt_thresh, 0,
> > +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >      "How many connections (below) make us use the callout based mechanism");
> >  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >      &hpts_does_tp_logging, 0,
> > @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >  	struct tcp_hpts_entry *hpts;
> >  	int ticks_ran;
> >
> > +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> > +		return;
> > +
> >  	hpts = tcp_choose_hpts_to_run();
> >
> >  	if (hpts->p_hpts_active) {
> > @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >  	ticks_ran = tcp_hptsi(hpts, 1);
> >  	tv.tv_sec = 0;
> >  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> > +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> > +		hpts->hit_callout_thresh = 1;
> > +		counter_u64_add(hpts_that_need_softclock, 1);
> > +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> > +		hpts->hit_callout_thresh = 0;
> > +		counter_u64_add(hpts_that_need_softclock, -1);
> > +	}
> >  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >  		if (hpts->p_direct_wake == 0) {
> >  			/*
> > @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >  	cpu_top = NULL;
> >  #endif
> >  	tcp_pace.rp_num_hptss = ncpus;
> > +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >  	hpts_loops = counter_u64_alloc(M_WAITOK);
> >  	back_tosleep = counter_u64_alloc(M_WAITOK);
> > @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >  	free(tcp_pace.grps, M_TCPHPTS);
> >  #endif
> >
> > +	counter_u64_free(hpts_that_need_softclock);
> >  	counter_u64_free(hpts_hopelessly_behind);
> >  	counter_u64_free(hpts_loops);
> >  	counter_u64_free(back_tosleep);