Date: Wed, 1 Jun 2022 10:16:46 -0400
From: Mark Johnston
To: Paul Floyd
Cc: FreeBSD Hackers
Subject: Re: Hang ast / pipelk / piperd
References: <84015bf9-8504-1c3c-0ba5-58d0d7824843@gmail.com>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Mon, May 30, 2022 at 09:35:05PM +0200, Paul Floyd wrote:
>
>
> On 5/30/22 14:15, Mark Johnston wrote:
>
> > "procstat -kk " might help to reveal what's going on,
> > since it sounds like the hang/livelock is happening somewhere in the
> > kernel.
>
> Not knowing much about the kernel, my guess is that this is related to
>
> commit 4808bab7fa6c3ec49b49476b8326d7a0250a03fa
> Author: Alexander Motin
> Date:   Tue Sep 21 18:14:22 2021 -0400
>
>     sched_ule(4): Improve long-term load balancer.
>
> and this bit of ast code
>
> doreti_ast:
>         /*
>          * Check for ASTs atomically with returning.  Disabling CPU
>          * interrupts provides sufficient locking even in the SMP case,
>          * since we will be informed of any new ASTs by an IPI.
>          */
>         cli
>         movq    PCPU(CURTHREAD),%rax
>         testl   $TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
>         je      doreti_exit
>         sti
>         movq    %rsp,%rdi       /* pass a pointer to the trapframe */
>         call    ast
>         jmp     doreti_ast
>
>
> The above commit seems to be migrating loaded threads to another CPU.

How did you infer that?  The long-term load balancer should be running
fairly infrequently.

As a side note, I think we are missing ktrcsw() calls in some places,
e.g., in turnstile_wait().

> My test system is a VirtualBox amd64 FreeBSD 13.1 with one CPU running
> on a 13.0 host.
>
> I just tried restarting the VM with 2 CPUs and the testcase seems to be
> a lot better - it's been running in a loop for 10 minutes whereas
> previously it would hang at least 1 in 5 times.

Hmm.  Could you please show the ktrace output with -H -T passed to
kdump(1), together with fresh "procstat -kk" output?

The fact that the problem apparently only occurs with 1 CPU does indeed
suggest a scheduler bug.
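
For readers following the quoted doreti_ast fragment, here is a minimal
C sketch of its control flow: test the two flags, leave if both are
clear, otherwise call ast and test again.  It is a userland model only,
not kernel code.  The flag values, struct thread_model, the "rearm"
counter and the call limit are all hypothetical, and it assumes that
ast clears both flags.  It illustrates why the loop cannot exit while
something keeps setting a flag again; it does not claim that this is
the cause of the hang discussed above.

```c
/* Placeholder flag values for this sketch, not the kernel's definitions. */
#define TDF_ASTPENDING  0x1u
#define TDF_NEEDRESCHED 0x2u

/* Hypothetical stand-in for the per-thread state the assembly inspects. */
struct thread_model {
	unsigned td_flags;	/* models TD_FLAGS(%rax) */
	int rearm;		/* hypothetical: times NEEDRESCHED is set again */
};

/*
 * Stand-in for ast().  Assumption: it clears both flags.  The "rearm"
 * budget models some other agent requesting another reschedule.
 */
static void
ast_model(struct thread_model *td)
{
	td->td_flags &= ~(TDF_ASTPENDING | TDF_NEEDRESCHED);
	if (td->rearm > 0) {
		td->rearm--;
		td->td_flags |= TDF_NEEDRESCHED;
	}
}

/*
 * Mirrors the quoted loop: exit only once neither flag is set.
 * Returns the number of ast_model() calls made, or -1 if "limit"
 * calls were made and a flag is still pending (the loop would spin on).
 */
static int
doreti_ast_model(struct thread_model *td, int limit)
{
	int calls = 0;

	while ((td->td_flags & (TDF_ASTPENDING | TDF_NEEDRESCHED)) != 0) {
		if (calls == limit)
			return (-1);
		ast_model(td);
		calls++;
	}
	return (calls);
}
```

In this model a thread with one pending flag and rearm set to 0 leaves
after a single call, while a rearm value larger than the limit never
reaches the exit path within the limit.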