From: Andrew Turner <andrew@fubar.geek.nz>
Date: Mon, 7 Mar 2022 16:25:22 +0000
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
To: Mark Johnston
Cc: Ronald Klop, bob prohaska, Mark Millard, freebsd-arm@freebsd.org, freebsd-current
Message-Id: <3374E0F8-D712-4ED0-A62B-B6924FC8A5E2@fubar.geek.nz>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

> On 7 Mar 2022, at 15:13, Mark Johnston <markj@freebsd.org> wrote:
> ...
> A (the?) problem is that the compiler is treating "pc" as an alias
> for x18, but the rmlock code assumes that the pcpu pointer is loaded
> once, as it dereferences "pc" outside of the critical section.  On
> arm64, if a context switch occurs between the store at _rm_rlock+144 and
> the load at +152, and the thread is migrated to another CPU, then we'll
> end up using the wrong CPU ID in the rm->rm_writecpus test.
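
To make the window described above concrete, here is a rough user-space model. It is only a sketch: sim_x18, struct pcpu_model, migrate_to() and the two cpuid_stable_*() functions are made-up names for illustration, not kernel code, and the "migration" is an explicit call rather than a real context switch. The volatile pointer stands in for "pc" being an alias of x18, so every use reloads it.

```c
#include <stdbool.h>

#define NCPU 2

/* Stand-in for the kernel's struct pcpu; only the CPU ID is modelled. */
struct pcpu_model {
	int pc_cpuid;
};

static struct pcpu_model cpus[NCPU] = { { 0 }, { 1 } };

/*
 * Stand-in for x18.  The volatile qualifier forces a fresh load on every
 * use, which mimics "pc" being an alias for the register.
 */
static struct pcpu_model *volatile sim_x18 = &cpus[0];

/* Hypothetical helper: pretend the scheduler moved us to another CPU. */
static void
migrate_to(int cpu)
{
	sim_x18 = &cpus[cpu];
}

/*
 * Aliased form: each "pc" use re-reads the register.  Returns true if the
 * CPU ID seen before the simulated migration matches the one seen after.
 */
static bool
cpuid_stable_aliased(int migrate_cpu)
{
	int before, after;

	before = sim_x18->pc_cpuid;	/* use inside the critical section */
	migrate_to(migrate_cpu);	/* context switch plus migration */
	after = sim_x18->pc_cpuid;	/* use outside the critical section */
	return (before == after);
}

/* Snapshot form: the pointer is loaded once and reused. */
static bool
cpuid_stable_snapshot(int migrate_cpu)
{
	struct pcpu_model *pc = sim_x18;	/* single load */
	int before, after;

	before = pc->pc_cpuid;
	migrate_to(migrate_cpu);
	after = pc->pc_cpuid;
	return (before == after);
}
```

Starting on CPU 0 and migrating to CPU 1, the aliased form reports a mismatch while the snapshot form keeps seeing CPU 0, which is the behaviour the rmlock code appears to rely on.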

> I suspect the problem is unique to arm64 as its get_pcpu()
> implementation is different from the others in that it doesn't use
> volatile-qualified inline assembly.  This has been the case since
> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762.

> I haven't been able to reproduce any crashes running poudriere in an
> arm64 AWS instance, though.  Could you please try the patch below and
> confirm whether it fixes your panics?  I verified that the apparent
> problem described above is gone with the patch.

Alternatively (or additionally) we could do something like the following. There are only a few MI users of get_pcpu(), the main one being the rm lock code.

diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
index 09f6361c651c..59b890e5c2ea 100644
--- a/sys/arm64/include/pcpu.h
+++ b/sys/arm64/include/pcpu.h
@@ -58,7 +58,14 @@ struct pcpu;
 
 register struct pcpu *pcpup __asm ("x18");
 
-#define	get_pcpu()	pcpup
+static inline struct pcpu *
+get_pcpu(void)
+{
+	struct pcpu *pcpu;
+
+	__asm __volatile("mov	%0, x18" : "=&r"(pcpu));
+	return (pcpu);
+}
 
 static inline struct thread *
 get_curthread(void)
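
For anyone who wants to poke at the same pattern outside the kernel, the following is a minimal sketch under some loud assumptions. It reads tpidr_el0 (the user-space thread pointer on Linux and FreeBSD arm64) instead of x18, since x18 has no useful meaning in a user-space test. read_thread_base() and base_is_stable() are hypothetical names. On any other platform it falls back to plain C and exercises no inline assembly at all. With no input operands a plain "=r" constraint is enough here.

```c
/*
 * Hypothetical user-space analogue of the patched get_pcpu(): a
 * volatile-qualified asm copies a special register into an ordinary
 * local, so the caller holds a value rather than a register alias.
 */
static inline void *
read_thread_base(void)
{
	void *base;

#if defined(__aarch64__) && (defined(__linux__) || defined(__FreeBSD__))
	/* Assumption: tpidr_el0 holds this thread's TLS base. */
	__asm__ __volatile__("mrs %0, tpidr_el0" : "=r"(base));
#else
	/* Fallback: the address of a thread-local object, no asm involved. */
	static _Thread_local char anchor;

	base = &anchor;
#endif
	return (base);
}

/* Two reads from the same thread should yield the same base. */
static int
base_is_stable(void)
{
	void *first = read_thread_base();
	void *second = read_thread_base();

	return (first == second);
}
```

The point of the function form is that the result lives in a normal local variable; once it has been returned, later uses cannot turn back into reads of the register.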

Andrew
