From nobody Thu Mar 21 00:17:37 2024 X-Original-To: current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4V0Qyk3njdz5FK8q for ; Thu, 21 Mar 2024 00:17:50 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4V0Qyj4fgNz4vSD; Thu, 21 Mar 2024 00:17:49 +0000 (UTC) (envelope-from kib@freebsd.org) Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=freebsd.org (policy=none); spf=softfail (mx1.freebsd.org: 2001:470:d5e7:1::1 is neither permitted nor denied by domain of kib@freebsd.org) smtp.mailfrom=kib@freebsd.org Received: from tom.home (kib@localhost [127.0.0.1] (may be forged)) by kib.kiev.ua (8.18.1/8.18.1) with ESMTP id 42L0HcYZ082994; Thu, 21 Mar 2024 02:17:41 +0200 (EET) (envelope-from kib@freebsd.org) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 42L0HcYZ082994 Received: (from kostik@localhost) by tom.home (8.18.1/8.18.1/Submit) id 42L0Hb4s082987; Thu, 21 Mar 2024 02:17:37 +0200 (EET) (envelope-from kib@freebsd.org) X-Authentication-Warning: tom.home: kostik set sender to kib@freebsd.org using -f Date: Thu, 21 Mar 2024 02:17:37 +0200 From: Konstantin Belousov To: rrs Cc: Drew Gallatin , Mike Karels , tuexen , Nuno Teixeira , garyj@gmx.de, current@freebsd.org, net@freebsd.org, Randall Stewart Subject: Re: Request for Testing: TCP RACK Message-ID: References: <6e795e9c-8de4-4e02-9a96-8fabfaa4e66f@app.fastmail.com> <6047C8EF-B1B0-4286-93FA-AA38F8A18656@karels.net> <8031cd99-ded8-4b06-93b3-11cc729a8b2c@app.fastmail.com> <38c54399-6c96-44d8-a3a2-3cc1bfbe50c2@app.fastmail.com> <27d8144f-0658-46f6-b8f3-35eb60061644@lakerest.net> List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <27d8144f-0658-46f6-b8f3-35eb60061644@lakerest.net> X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=4.0.0 X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-14) on tom.home X-Spamd-Bar: -- X-Spamd-Result: default: False [-2.97 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.97)[-0.965]; DMARC_POLICY_SOFTFAIL(0.10)[freebsd.org : No valid SPF, No valid DKIM,none]; MIME_GOOD(-0.10)[text/plain]; R_SPF_SOFTFAIL(0.00)[~all]; HAS_XAW(0.00)[]; ARC_NA(0.00)[]; FREEFALL_USER(0.00)[kib]; ASN(0.00)[asn:6939, ipnet:2001:470::/32, country:US]; TO_DN_SOME(0.00)[]; MIME_TRACE(0.00)[0:+]; RCPT_COUNT_SEVEN(0.00)[9]; MLMMJ_DEST(0.00)[current@freebsd.org]; RCVD_TLS_LAST(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; FREEMAIL_CC(0.00)[freebsd.org,karels.net,gmx.de]; R_DKIM_NA(0.00)[]; FROM_HAS_DN(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MISSING_XM_UA(0.00)[] X-Rspamd-Queue-Id: 4V0Qyj4fgNz4vSD On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote: > Ok I have created > > https://reviews.freebsd.org/D44420 > > > To address the issue. I also attach a short version of the patch that Nuno > can try and validate > > it works. Drew you may want to try this and validate the optimization does > kick in since I can > > only now test that it does not on my local box :) The patch still causes access to all cpu's cachelines on each userret. It would be much better to inc/check the threshold and only schedule the call when exceeded. Then the call can occur in some dedicated context, like per-CPU thread, instead of userret. > > > R > > > > On 3/18/24 3:42 PM, Drew Gallatin wrote: > > No.  The goal is to run on every return to userspace for every thread. > > > > Drew > > > > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote: > > > On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote: > > > > I got the idea from > > > > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf > > > > The gist is that the TCP pacing stuff needs to run frequently, and > > > > rather than run it out of a clock interrupt, its more efficient to run > > > > it out of a system call context at just the point where we return to > > > > userspace and the cache is trashed anyway. The current implementation > > > > is fine for our workload, but probably not idea for a generic system. > > > > Especially one where something is banging on system calls. > > > > > > > > Ast's could be the right tool for this, but I'm super unfamiliar with > > > > them, and I can't find any docs on them. > > > > > > > > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to > > > > what's happening here? > > > This call would need some AST number added, and then it registers the > > > ast to run on next return to userspace, for the current thread. > > > > > > Is it enough? > > > > > > > > Drew > > > > > > > > > > > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote: > > > > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote: > > > > > > On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote: > > > > > > > > > > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira > > > wrote: > > > > > > >> > > > > > > >> Hello all! > > > > > > >> > > > > > > >> It works just fine! > > > > > > >> System performance is OK. > > > > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty). > > > > > > >> > > > > > > >> --- > > > > > > >> net.inet.tcp.functions_available: > > > > > > >> Stack                           D > > > Alias                            PCB count > > > > > > >> freebsd freebsd                          0 > > > > > > >> rack                            * > > > rack                             38 > > > > > > >> --- > > > > > > >> > > > > > > >> It would be so nice that we can have a sysctl tunnable for > > > this patch > > > > > > >> so we could do more tests without recompiling kernel. > > > > > > > Thanks for testing! > > > > > > > > > > > > > > @gallatin: can you come up with a patch that is acceptable > > > for Netflix > > > > > > > and allows to mitigate the performance regression. > > > > > > > > > > > > Ideally, tcphpts could enable this automatically when it > > > starts to be > > > > > > used (enough?), but a sysctl could select auto/on/off. > > > > > There is already a well-known mechanism to request execution of the > > > > > specific function on return to userspace, namely AST.  The difference > > > > > with the current hack is that the execution is requested for one > > > callback > > > > > in the context of the specific thread. > > > > > > > > > > Still, it might be worth a try to use it; what is the reason to > > > hit a thread > > > > > that does not do networking, with TCP processing? > > > > > > > > > > > > > > > > > Mike > > > > > > > > > > > > > Best regards > > > > > > > Michael > > > > > > >> > > > > > > >> Thanks all! > > > > > > >> Really happy here :) > > > > > > >> > > > > > > >> Cheers, > > > > > > >> > > > > > > >> Nuno Teixeira escreveu (domingo, > > > 17/03/2024 à(s) 20:26): > > > > > > >>> > > > > > > >>> Hello, > > > > > > >>> > > > > > > >>>> I don't have the full context, but it seems like the > > > complaint is a performance regression in bonnie++ and perhaps other > > > things when tcp_hpts is loaded, even when it is not used.  Is that > > > correct? > > > > > > >>>> > > > > > > >>>> If so, I suspect its because we drive the > > > tcp_hpts_softclock() routine from userret(), in order to avoid tons > > > of timer interrupts and context switches.  To test this theory,  you > > > could apply a patch like: > > > > > > >>> > > > > > > >>> It's affecting overall system performance, bonnie was just > > > a way to > > > > > > >>> get some numbers to compare. > > > > > > >>> > > > > > > >>> Tomorrow I will test patch. > > > > > > >>> > > > > > > >>> Thanks! > > > > > > >>> > > > > > > >>> -- > > > > > > >>> Nuno Teixeira > > > > > > >>> FreeBSD Committer (ports) > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> -- > > > > > > >> Nuno Teixeira > > > > > > >> FreeBSD Committer (ports) > > > > > > > > > > > > > > > > > diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c > index 8c4d2d41a3eb..eadbee19f69c 100644 > --- a/sys/netinet/tcp_hpts.c > +++ b/sys/netinet/tcp_hpts.c > @@ -216,6 +216,7 @@ struct tcp_hpts_entry { > void *ie_cookie; > uint16_t p_num; /* The hpts number one per cpu */ > uint16_t p_cpu; /* The hpts CPU */ > + uint8_t hit_callout_thresh; > /* There is extra space in here */ > /* Cache line 0x100 */ > struct callout co __aligned(CACHE_LINE_SIZE); > @@ -269,6 +270,11 @@ static struct hpts_domain_info { > int cpu[MAXCPU]; > } hpts_domains[MAXMEMDOM]; > > +counter_u64_t hpts_that_need_softclock; > +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD, > + &hpts_that_need_softclock, > + "Number of hpts threads that need softclock"); > + > counter_u64_t hpts_hopelessly_behind; > > SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD, > @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW, > &tcp_hpts_precision, 120, > "Value for PRE() precision of callout"); > SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW, > - &conn_cnt_thresh, 0, > + &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD, > "How many connections (below) make us use the callout based mechanism"); > SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW, > &hpts_does_tp_logging, 0, > @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void) > struct tcp_hpts_entry *hpts; > int ticks_ran; > > + if (counter_u64_fetch(hpts_that_need_softclock) == 0) > + return; > + > hpts = tcp_choose_hpts_to_run(); > > if (hpts->p_hpts_active) { > @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx) > ticks_ran = tcp_hptsi(hpts, 1); > tv.tv_sec = 0; > tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT; > + if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) { > + hpts->hit_callout_thresh = 1; > + counter_u64_add(hpts_that_need_softclock, 1); > + } else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) { > + hpts->hit_callout_thresh = 0; > + counter_u64_add(hpts_that_need_softclock, -1); > + } > if (hpts->p_on_queue_cnt >= conn_cnt_thresh) { > if(hpts->p_direct_wake == 0) { > /* > @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void) > cpu_top = NULL; > #endif > tcp_pace.rp_num_hptss = ncpus; > + hpts_that_need_softclock = counter_u64_alloc(M_WAITOK); > hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK); > hpts_loops = counter_u64_alloc(M_WAITOK); > back_tosleep = counter_u64_alloc(M_WAITOK); > @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void) > free(tcp_pace.grps, M_TCPHPTS); > #endif > > + counter_u64_free(hpts_that_need_softclock); > counter_u64_free(hpts_hopelessly_behind); > counter_u64_free(hpts_loops); > counter_u64_free(back_tosleep);