Date:      Tue, 13 Sep 2022 13:17:48 +0000
From:      Julio Merino <julio@meroh.net>
To:        Justin Hibbits <jhibbits@FreeBSD.org>
Cc:        "freebsd-ppc@freebsd.org" <freebsd-ppc@freebsd.org>
Subject:   RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13
Message-ID:  <PH0PR20MB3704500C677E13DCC9C69541C0479@PH0PR20MB3704.namprd20.prod.outlook.com>
In-Reply-To: <PH0PR20MB370485AF1ACF74A8D9FBCC6DC0429@PH0PR20MB3704.namprd20.prod.outlook.com>
References:  <PH0PR20MB3704882DD6DC53BB1CF2F5D2C09B9@PH0PR20MB3704.namprd20.prod.outlook.com> <PH0PR20MB37041E9776E86D61EB63FEBFC0439@PH0PR20MB3704.namprd20.prod.outlook.com> <20220909120857.61f65069@ralga-linux> <PH0PR20MB37043177835C8DD8B024A173C0439@PH0PR20MB3704.namprd20.prod.outlook.com> <20220909151238.5da8b63a@ralga-linux> <PH0PR20MB370485AF1ACF74A8D9FBCC6DC0429@PH0PR20MB3704.namprd20.prod.outlook.com>


Alright, I did some more bisecting and narrowed the fan problem down to this range of commits:

g5:/usr/src> git log --oneline f639aeb3fd3e..6f387a563206 sys
6f387a563206 vm_reserv: #include vm_extern.h explicitly, for arm.
bf27b9bc7f5b vm_phys: convert error back to warning
87e6f3d27eba vm_phys: #include vm_extern
c5a5a9dbcf38 vm_extern: use standard address checkers everywhere
f8da86347070 linux(4): Implement __vdso_time
00c933e9254c linux(4): Use saved cpu feature bits

I think we can safely discard the linux(4) commits. Other than that, the build seems broken at each intermediate vm_* step, so it’s hard to pinpoint any single one of them.
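To show why the range cannot be narrowed any further, here is a minimal Python sketch of bisection that skips commits which fail to build (similar in spirit to `git bisect skip`). Only the endpoints come from the log above (f639aeb3fd3e good, 6f387a563206 bad); treating the two linux(4) commits as good and the three intermediate vm_* commits as unbuildable is an assumption for illustration, and the helper names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    GOOD = "good"
    BAD = "bad"
    UNBUILDABLE = "unbuildable"


def bisect_with_skips(commits, outcome):
    """Narrow down the first bad commit, skipping unbuildable ones.

    commits is ordered oldest first; commits[0] must be known good and
    commits[-1] known bad. outcome maps each commit to an Outcome.
    Returns the commits that could still be the culprit.
    """
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Try the midpoint first, then its neighbours, looking for a
        # commit strictly between lo and hi that actually builds.
        candidates = sorted(range(lo + 1, hi), key=lambda i: abs(i - mid))
        probe = next((i for i in candidates
                      if outcome[commits[i]] is not Outcome.UNBUILDABLE), None)
        if probe is None:
            break  # every remaining commit is untestable
        if outcome[commits[probe]] is Outcome.GOOD:
            lo = probe
        else:
            hi = probe
    return commits[lo + 1:hi + 1]


if __name__ == "__main__":
    # Oldest first; the outcomes of the inner commits are assumptions.
    commits = ["f639aeb3fd3e", "00c933e9254c", "f8da86347070",
               "c5a5a9dbcf38", "87e6f3d27eba", "bf27b9bc7f5b",
               "6f387a563206"]
    outcome = {c: Outcome.UNBUILDABLE for c in commits}
    outcome["f639aeb3fd3e"] = Outcome.GOOD
    outcome["00c933e9254c"] = Outcome.GOOD
    outcome["f8da86347070"] = Outcome.GOOD
    outcome["6f387a563206"] = Outcome.BAD
    print(bisect_with_skips(commits, outcome))
```

Under those assumptions the search stops with c5a5a9dbcf38, 87e6f3d27eba, bf27b9bc7f5b and 6f387a563206 as candidates: once every commit between the last good and the first bad one is unbuildable, no further bisection step is possible.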

Does this ring a bell?

Thanks


From: Julio Merino<mailto:julio@meroh.net>
Sent: Friday, September 9, 2022 18:41
To: Justin Hibbits<mailto:jhibbits@FreeBSD.org>
Cc: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
Subject: RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13

I have now compared the dmesg and sysctl output of a good kernel (built at 9171b8068b92 with the workaround applied) and a recent bad kernel (also with the workaround applied).

Here are the main differences in the dmesg output; lines prefixed with a dash come from the good kernel and lines prefixed with a plus come from the bad kernel:

-----
-bus_dmamem_alloc failed to align memory properly.

-firewire0: 2 nodes, maxhop <= 1 cable IRM irm(1)  (me)
+firewire0: 2 nodes, maxhop <= 1 Not IRM capable irm(-1)

+pci1:5:4:0: VPD data does not start with ident (0x8)
+pci1:5:4:0: failed to read VPD data.
+pci1:5:4:0: no valid vpd ident found
+pci1:5:4:1: VPD data does not start with ident (0x8)
+pci1:5:4:1: failed to read VPD data.
+pci1:5:4:1: no valid vpd ident found

+WARNING: Current temperature (CPU A0 DIODE TEMP: 916.0 C) exceeds critical temperature (90.0 C); count=1
-----

Note that the measured temperature is obviously wrong, and this is when the fans spin up like crazy. Soon after, the count grows too high and the machine shuts itself down.
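The shutdown behaviour described above (a WARNING with a running count, then a power-off once the count grows too high) can be modelled with a minimal Python sketch. This is a hypothetical model, not the code in powermac_thermal.c: the limit of 3 readings, the rule that one normal reading resets the count, and the class name are all assumptions. In this model a single sensor reporting a bogus value such as 916.0 C is enough to trigger the shutdown.

```python
class CriticalTempMonitor:
    """Count consecutive over-limit readings for one sensor (hypothetical)."""

    def __init__(self, name, critical_c, max_count=3):
        self.name = name
        self.critical_c = critical_c
        self.max_count = max_count  # assumed limit, not taken from the driver
        self.count = 0

    def update(self, temp_c):
        """Feed one reading; return True if the machine should power off."""
        if temp_c > self.critical_c:
            self.count += 1
            print(f"WARNING: Current temperature ({self.name}: {temp_c:.1f} C) "
                  f"exceeds critical temperature ({self.critical_c:.1f} C); "
                  f"count={self.count}")
        else:
            self.count = 0  # assumption: one normal reading resets the count
        return self.count >= self.max_count


if __name__ == "__main__":
    mon = CriticalTempMonitor("CPU A0 DIODE TEMP", 90.0)
    for reading in (34.2, 916.0, 916.0, 916.0):
        if mon.update(reading):
            print("shutting down")
            break
```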

And here are the differences across all sysctls that mention “temp”:

-----
dev.ds1631.0.%pnpinfo: name=temp-monitor compat=ds1631
-dev.ds1631.0.sensor.mlb_inlet_amb.temp: 27.5C
+dev.ds1631.0.sensor.mlb_inlet_amb.temp: 29.6C
dev.ds1775.0.%pnpinfo: name=temp-monitor compat=ds1775
-dev.ds1775.0.sensor.drive_bay.temp: 26.5C
+dev.ds1775.0.sensor.drive_bay.temp: 29.5C
dev.max6690.0.%pnpinfo: name=temp-monitor compat=max6690
-dev.max6690.0.sensor.backside.temp: 36.1C
-dev.max6690.0.sensor.kodiak_diode.temp: 48.7C
+dev.max6690.0.sensor.backside.temp: 42.2C
+dev.max6690.0.sensor.kodiak_diode.temp: 55.2C
dev.max6690.1.%pnpinfo: name=temp-monitor compat=max6690
-dev.max6690.1.sensor.tunnel.temp: 31.2C
-dev.max6690.1.sensor.tunnel_heatsink.temp: 33.7C
+dev.max6690.1.sensor.tunnel.temp: 34.7C
+dev.max6690.1.sensor.tunnel_heatsink.temp: 39.0C
-dev.smusat.0.cpu_a0_diode_temp: 34.2C
-dev.smusat.0.cpu_a1_diode_temp: 35.0C
kstat.zfs.misc.arcstats.arc_tempreserve: 0
-----

The fact that dev.smusat.* is gone from the “bad” kernel seems suspicious, but smusat0 is detected properly in both kernels according to dmesg…
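The sysctl comparison above can be reproduced with a small Python sketch: it parses two dumps of `name: valueC` lines, lists sensors that exist in only one of them, and computes the per-sensor change (bad minus good). The readings are copied from the diff above; the helper names are hypothetical and the parser only handles lines of that exact shape.

```python
GOOD = """
dev.ds1631.0.sensor.mlb_inlet_amb.temp: 27.5C
dev.ds1775.0.sensor.drive_bay.temp: 26.5C
dev.max6690.0.sensor.backside.temp: 36.1C
dev.max6690.0.sensor.kodiak_diode.temp: 48.7C
dev.max6690.1.sensor.tunnel.temp: 31.2C
dev.max6690.1.sensor.tunnel_heatsink.temp: 33.7C
dev.smusat.0.cpu_a0_diode_temp: 34.2C
dev.smusat.0.cpu_a1_diode_temp: 35.0C
"""

BAD = """
dev.ds1631.0.sensor.mlb_inlet_amb.temp: 29.6C
dev.ds1775.0.sensor.drive_bay.temp: 29.5C
dev.max6690.0.sensor.backside.temp: 42.2C
dev.max6690.0.sensor.kodiak_diode.temp: 55.2C
dev.max6690.1.sensor.tunnel.temp: 34.7C
dev.max6690.1.sensor.tunnel_heatsink.temp: 39.0C
"""


def parse_temps(dump):
    """Map sysctl name -> temperature in C for lines like 'x.temp: 27.5C'."""
    temps = {}
    for line in dump.strip().splitlines():
        name, _, value = line.partition(": ")
        if not value.endswith("C"):
            continue  # e.g. '%pnpinfo' lines or plain counters
        try:
            temps[name] = float(value[:-1])
        except ValueError:
            continue  # ends in 'C' but is not a number
    return temps


def compare(good, bad):
    """Return (only_in_good, only_in_bad, deltas) with deltas = bad - good."""
    only_good = sorted(good.keys() - bad.keys())
    only_bad = sorted(bad.keys() - good.keys())
    deltas = {k: round(bad[k] - good[k], 1)
              for k in sorted(good.keys() & bad.keys())}
    return only_good, only_bad, deltas


if __name__ == "__main__":
    only_good, only_bad, deltas = compare(parse_temps(GOOD), parse_temps(BAD))
    print("missing in bad kernel:", only_good)
    print("new in bad kernel:", only_bad)
    for name, delta in deltas.items():
        print(f"{name}: {delta:+.1f} C")
```

With these numbers, every sensor present in both kernels reads between 2.1 and 6.5 C warmer on the bad kernel, and the only sensors missing are the two smusat CPU diode readings.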

Any thoughts? I can try to bisect this as well, but there are 1500+ changes to sort through, so this will take a while.

Thanks!


From: Justin Hibbits<mailto:jhibbits@FreeBSD.org>
Sent: Friday, September 9, 2022 12:12
To: Julio Merino<mailto:julio@meroh.net>
Cc: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
Subject: Re: PowerMac G5 crashes with "instruction storage interrupt" on recent 13

That seems bizarre.  There haven't been any changes to the controller
thread (powermac_thermal.c) in more than 7 years.  Are there any
problems with sensors?  I tested the change I made back in 2015 on my
dual core G5, with the intent that it would ramp the fans up sooner
(non-linear), and back them down with hysteresis.  So when there's load
that raises the temperature significantly it will ramp the fans up as
quickly as it can, hitting 100% fan speed long before it can reach maximum
temperature.
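The controller behaviour described above (ramp up early and non-linearly, back down with hysteresis) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the logic in powermac_thermal.c: the square-root curve, the 40 to 70 C working range, the 5 C hysteresis band and the names are hypothetical. The curve reaches full speed at 70 C, well below the 90.0 C critical limit seen in dmesg.

```python
import math


def target_speed(temp_c, low_c=40.0, high_c=70.0):
    """Fan duty (0.0 to 1.0) for a temperature.

    A square-root curve is one hypothetical 'non-linear' choice: it rises
    steeply just above low_c, so the fans ramp up early.
    """
    if temp_c <= low_c:
        return 0.0
    if temp_c >= high_c:
        return 1.0
    return math.sqrt((temp_c - low_c) / (high_c - low_c))


class FanController:
    """Ramp up immediately; ramp down only after a clear temperature drop."""

    def __init__(self, hysteresis_c=5.0):
        self.hysteresis_c = hysteresis_c
        self.speed = 0.0
        self.set_at_c = None  # temperature that produced the current speed

    def update(self, temp_c):
        target = target_speed(temp_c)
        if target > self.speed:
            # Going up: jump straight to the new target.
            self.speed, self.set_at_c = target, temp_c
        elif (self.set_at_c is not None
              and temp_c < self.set_at_c - self.hysteresis_c):
            # Going down: only once the temperature has dropped past the band.
            self.speed, self.set_at_c = target, temp_c
        return self.speed
```

The hysteresis band keeps small temperature wobbles from toggling the fan speed; with a sketch like this, a short burst of load pushes the fans up at once and they only come back down after the temperature has clearly fallen.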

- Justin

On Fri, 9 Sep 2022 19:01:06 +0000
Julio Merino <julio@meroh.net> wrote:

> Ah, thanks for the workaround. I applied it on top of 9171b8068b92
> and the kernel was able to boot successfully, and it seems stable so
> far.
>
> However, if I apply the hack on top of stable/13’s HEAD, the fans
> still go crazy at the slightest increase in CPU load, although they
> do drop back down to quiet when the load subsides. (For example, a
> simple “git log” in /usr/src makes the fans spin up within a couple
> of seconds, and they stop soon after that.) Any ideas on where this
> might come from?
>
>
> From: Justin Hibbits<mailto:jhibbits@FreeBSD.org>
> Sent: Friday, September 9, 2022 09:09
> To: Julio Merino<mailto:julio@meroh.net>
> Cc: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
> Subject: Re: PowerMac G5 crashes with "instruction storage interrupt"
> on recent 13
>
> Hi Julio,
>
> 971cb62e0b23 is the likely culprit.  Alfredo has a patch at
> https://reviews.freebsd.org/D36234 that you can use until the problem
> is solved.  The alternative is you could build everything into the
> kernel instead of using modules.
>
> The problem appears to be in either lld or the kernel linker.
>
> - Justin
>
> On Fri, 9 Sep 2022 16:00:33 +0000
> Julio Merino <julio@meroh.net> wrote:
>
> > Armed with a lot of patience, I was able to bisect where the crashes
> > are coming from. They seem to be due to these three consecutive and
> > related commits (because the first one broke the build and required
> > two extra fixes for powerpc’s GENERIC64 to build):
> >
> > 9171b8068b92 cpuset: Fix the KASAN and KMSAN builds
> > 01f281d0ee52 Fix the build after 47a57144
> > 971cb62e0b23 cpuset: Byte swap cpuset for compat32 on big endian
> > architectures
> >
> > Any idea on how to look into these crashes further?
> >
> > Thank you!
> >
> >
> > From: Julio Merino<mailto:julio@meroh.net>
> > Sent: Sunday, July 31, 2022 07:45
> > To: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
> > Subject: PowerMac G5 crashes with "instruction storage interrupt" on
> > recent 13
> >
> > Hi all,
> >
> > I have a PowerMac G5 that’s running an old build of FreeBSD 13
> > stable (from around October of last year) that I’m trying to
> > upgrade to recent stable/13.
> >
> > Booting into a new kernel brings two issues. The first is that the
> > fans spin up to jet engine levels right before control is handed
> > over to userspace. An old local patch that used to mitigate this
> > (which I got from whichever outstanding bug exists for this in the
> > bug tracker) doesn’t seem to work any longer.
> >
> > The second is that the kernel crashes (apparently) as soon as it
> > tries to mount a ZFS pool during the early stages of the boot
> > process, but after control has been handed over to userspace. I’m
> > typing this from a photo of the crash, so I’m omitting details I
> > don’t think are relevant here (like addresses). Here is what I
> > get:
> >
> > ----
> > Setting hostid: …
> > ZFS filesystem version: 5
> > ZFS storage pool version: features support (500)
> >
> > Fatal kernel trap:
> >
> > Exception = 0x400 (instruction storage interrupt)
> > …
> > pid = 64, comm = zpool
> >
> > panic: instruction storage interrupt trap
> > cpuid = 1
> > time = …
> > KDB: stack backtrace:
> > #0 kdb_backtrace
> > #1 vpanic
> > #2 panic
> > #3 trap
> > #4 powerpc_interrupt
> > Uptime: 7s
> > ----
> >
> > Any thoughts about what I could look into? Any “recent” commits that
> > you think may be at fault?
> >
> > Thanks!
> >
>
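For context on 971cb62e0b23 (“cpuset: Byte swap cpuset for compat32 on big endian architectures”), the Python sketch below shows the layout problem that a commit with that title addresses, as far as can be told from the title alone; it is not the commit’s code, and the helper names are hypothetical. A 64-bit kernel stores a CPU set as an array of 64-bit words, while a 32-bit process reads the same bytes as 32-bit words. On a big-endian machine the high half of each 64-bit word comes first in memory, so without swapping the halves a 32-bit process would see CPU 0 as CPU 32. Note that Justin suspects the crash itself comes from lld or the kernel linker rather than from this logic.

```python
import struct


def kernel_words_to_bytes(words, byteorder):
    """Lay out 64-bit bitset words in memory as a 64-bit kernel would."""
    fmt = (">" if byteorder == "big" else "<") + f"{len(words)}Q"
    return struct.pack(fmt, *words)


def cpus_seen_by_32bit(buf, byteorder):
    """Decode the buffer as 32-bit words (the compat32 view); list set CPUs."""
    fmt = (">" if byteorder == "big" else "<") + f"{len(buf) // 4}I"
    cpus = []
    for i, word in enumerate(struct.unpack(fmt, buf)):
        cpus.extend(i * 32 + bit for bit in range(32) if (word >> bit) & 1)
    return cpus


def swap_halves(words):
    """Swap the 32-bit halves of each 64-bit word before handing them out."""
    return [((w & 0xFFFFFFFF) << 32) | (w >> 32) for w in words]


if __name__ == "__main__":
    words = [(1 << 33) | 1]  # CPUs 0 and 33
    print(cpus_seen_by_32bit(kernel_words_to_bytes(words, "little"), "little"))
    print(cpus_seen_by_32bit(kernel_words_to_bytes(words, "big"), "big"))
    print(cpus_seen_by_32bit(
        kernel_words_to_bytes(swap_halves(words), "big"), "big"))
```

On little-endian the 32-bit view is already correct; on big-endian the unswapped view reports CPUs 1 and 32 instead of 0 and 33, and swapping the halves first restores the right answer.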


