From: Julio Merino
To: Justin Hibbits
CC: freebsd-ppc@freebsd.org
Subject: RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13
Date: Tue, 13 Sep 2022 13:17:48 +0000
References: <20220909120857.61f65069@ralga-linux> <20220909151238.5da8b63a@ralga-linux>
List-Id: Porting FreeBSD to the PowerPC
List-Archive: https://lists.freebsd.org/archives/freebsd-ppc

Alright, did some more bisecting and reached this range of commits where the problem with the fans starts:

g5:/usr/src> git log --oneline f639aeb3fd3e..6f387a563206 sys
6f387a563206 vm_reserv: #include vm_extern.h explicitly, for arm.
bf27b9bc7f5b vm_phys: convert error back to warning
87e6f3d27eba vm_phys: #include vm_extern
c5a5a9dbcf38 vm_extern: use standard address checkers everywhere
f8da86347070 linux(4): Implement __vdso_time
00c933e9254c linux(4): Use saved cpu feature bits

I think we can safely discard the linux(4) commits. Other than that, the build seems broken at each intermediate vm_* step so it’s hard now to pinpoint any of those specifically.

Does this ring a bell?
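
One way to cope with intermediate commits that do not build is to have the bisection skip them instead of marking them good or bad. The following is only a minimal sketch: bisect_verdict is a hypothetical helper name, and it assumes f639aeb3fd3e is the last known-good commit and 6f387a563206 the first known-bad one. It maps "build failed" to exit status 125, which `git bisect run` treats as "cannot be tested, skip".

```shell
#!/bin/sh
# bisect_verdict BUILD_CMD TEST_CMD   (hypothetical helper, for illustration)
# Returns the exit status that `git bisect run` expects:
#   125 -> commit cannot be tested (skip), 1 -> bad, 0 -> good.
bisect_verdict() {
    build_cmd=$1
    test_cmd=$2
    if ! sh -c "$build_cmd" >/dev/null 2>&1; then
        return 125    # does not build: ask bisect to skip this commit
    fi
    if sh -c "$test_cmd" >/dev/null 2>&1; then
        return 0      # built and passed the check: good
    fi
    return 1          # built but failed the check: bad
}

# Manual equivalent when the check needs a reboot (as with the fans here):
#   git bisect start 6f387a563206 f639aeb3fd3e -- sys
#   # build the kernel, e.g. make buildkernel KERNCONF=GENERIC64
#   # if the build fails:        git bisect skip
#   # otherwise boot it, then:   git bisect good    (or: git bisect bad)
```

When every commit between the last good and first bad one has to be skipped, git can only report the remaining range of candidates rather than a single commit, which matches the situation described above.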

Thanks

From: Julio Merino
Sent: Friday, September 9, 2022 18:41
To: Justin Hibbits
Cc: freebsd-ppc@freebsd.org
Subject: RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13

I have now tried to compare the dmesgs and sysctl of a good kernel (built at 9171b8068b92 with the workaround applied) and a recent bad kernel with the workaround applied as well.

The main differences comparing dmesg output, where the dash prefix is for the good kernel and the plus prefix is for the bad kernel:

-----
-bus_dmamem_alloc failed to align memory properly.

-firewire0: 2 nodes, maxhop <= 1 cable IRM irm(1) (me)
+firewire0: 2 nodes, maxhop <= 1 Not IRM capable irm(-1)

+pci1:5:4:0: VPD data does not start with ident (0x8)
+pci1:5:4:0: failed to read VPD data.
+pci1:5:4:0: no valid vpd ident found
+pci1:5:4:1: VPD data does not start with ident (0x8)
+pci1:5:4:1: failed to read VPD data.
+pci1:5:4:1: no valid vpd ident found

+WARNING: Current temperature (CPU A0 DIODE TEMP: 916.0 C) exceeds critical temperature (90.0 C); count=1
-----

Note here that the temperature measured seems obviously wrong once the fans spin up like crazy. And soon after this, count grows too high and the machine shuts down by itself.
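
A comparison like the one above can be produced by saving the output under each kernel and diffing the saved files. This is a rough sketch only: capture and changed_lines are hypothetical helper names, and the file names are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; function and file names are illustrative only.

# Run once per kernel, e.g. `capture good`, reboot, then `capture bad`.
capture() {
    label=$1
    dmesg > "dmesg.$label"
    sysctl -a | grep -i temp > "sysctl-temp.$label"
}

# Print only the lines that differ between two captures, prefixed with
# "-" (only in the first file) or "+" (only in the second file).
changed_lines() {
    # Drop the two "---"/"+++" file header lines, keep added/removed lines.
    diff -u "$1" "$2" | sed '1,2d' | grep '^[-+]'
}

# Example:
#   changed_lines dmesg.good dmesg.bad
#   changed_lines sysctl-temp.good sysctl-temp.bad
```

Filtering out the context lines keeps the result short enough to paste into a mail, at the cost of losing the surrounding lines that show where each change sits.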

Looking at differences for all sysctls that mention “temp”:

-----
dev.ds1631.0.%pnpinfo: name=temp-monitor compat=ds1631
-dev.ds1631.0.sensor.mlb_inlet_amb.temp: 27.5C
+dev.ds1631.0.sensor.mlb_inlet_amb.temp: 29.6C
dev.ds1775.0.%pnpinfo: name=temp-monitor compat=ds1775
-dev.ds1775.0.sensor.drive_bay.temp: 26.5C
+dev.ds1775.0.sensor.drive_bay.temp: 29.5C
dev.max6690.0.%pnpinfo: name=temp-monitor compat=max6690
-dev.max6690.0.sensor.backside.temp: 36.1C
-dev.max6690.0.sensor.kodiak_diode.temp: 48.7C
+dev.max6690.0.sensor.backside.temp: 42.2C
+dev.max6690.0.sensor.kodiak_diode.temp: 55.2C
dev.max6690.1.%pnpinfo: name=temp-monitor compat=max6690
-dev.max6690.1.sensor.tunnel.temp: 31.2C
-dev.max6690.1.sensor.tunnel_heatsink.temp: 33.7C
+dev.max6690.1.sensor.tunnel.temp: 34.7C
+dev.max6690.1.sensor.tunnel_heatsink.temp: 39.0C
-dev.smusat.0.cpu_a0_diode_temp: 34.2C
-dev.smusat.0.cpu_a1_diode_temp: 35.0C
kstat.zfs.misc.arcstats.arc_tempreserve: 0
-----

The fact that dev.smusat.* is gone from the “bad” kernel seems suspicious, but smusat0 is detected properly in both kernels according to dmesg…

Any thoughts? I can try to bisect this as well, but there are 1500+ changes to sort through so this will take a while.

Thanks!

From: Justin Hibbits
Sent: Friday, September 9, 2022 12:12
To: Julio Merino
Cc: freebsd-ppc@freebsd.org
Subject: Re: PowerMac G5 crashes with "instruction storage interrupt" on recent 13

That seems bizarre.  There haven't been any changes to the controller
thread (powermac_thermal.c) in more than 7 years.  Are there any
problems with sensors?  I tested the change I made back in 2015 on my
dual core G5, with the intent that it would ramp the fans up sooner
(non-linear), and back them down with hysteresis.  So when there's load
that raises the temperature significantly it will ramp the fans up as
quickly as it can, hitting 100% fan long before it can reach maximum
temperature.

- Justin

On Fri, 9 Sep 2022 19:01:06 +0000
Julio Merino wrote:

> Ah, thanks for the workaround. I applied it on top of 9171b8068b92
> and the kernel was able to boot successfully – and it seems stable so
> far.
>
> However, if I apply the hack on top of stable/13’s HEAD, there is
> still the issue of the fans going crazy at the slightest increase in
> CPU load but they do drop back down to quiet when the load subsides.
> (For example, a simple “git log” in /usr/src makes the fan spin up
> within a couple of seconds and they stop soon after that.) Any ideas
> on where this might come from?
>
>
> From: Justin Hibbits
> Sent: Friday, September 9, 2022 09:09
> To: Julio Merino
> Cc: freebsd-ppc@freebsd.org
> Subject: Re: PowerMac G5 crashes with "instruction storage interrupt"
> on recent 13
>
> Hi Julio,
>
> 971cb62e0b23 is the likely culprit.  Alfredo has a patch at
> https://reviews.freebsd.org/D36234 that you can use until the problem
> is solved.  The alternative is you could build everything into the
> kernel instead of using modules.
>
> The problem appears to be in either lld or the kernel linker.
>
> - Justin
>
> On Fri, 9 Sep 2022 16:00:33 +0000
> Julio Merino wrote:
>
> > Armed with a lot of patience, I was able to bisect where the crashes
> > are coming from. They seem to be due to these three consecutive and
> > related commits (because the first one broke the build and required
> > two extra fixes for powerpc’s GENERIC64 to build):
> >
> > 9171b8068b92 cpuset: Fix the KASAN and KMSAN builds
> > 01f281d0ee52 Fix the build after 47a57144
> > 971cb62e0b23 cpuset: Byte swap cpuset for compat32 on big endian
> > architectures
> >
> > Any idea on how to look into these crashes further?
> >
> > Thank you!
> >
> >
> > From: Julio Merino
> > Sent: Sunday, July 31, 2022 07:45
> > To: freebsd-ppc@freebsd.org
> > Subject: PowerMac G5 crashes with "instruction storage interrupt" on
> > recent 13
> >
> > Hi all,
> >
> > I have a PowerMac G5 that’s running an old build of FreeBSD 13
> > stable (from around October of last year) that I’m trying to
> > upgrade to recent stable/13.
> >
> > Booting into a new kernel brings two issues: the first is that the
> > fans spin up to jet engine levels right before transferring control
> > to userspace. An old patch I have locally to mitigate this (which I
> > got from whichever outstanding bug exists for this in the bug
> > tracker) doesn’t seem to work any longer.
> >
> > The second is that the kernel crashes (apparently) as soon as it
> > tries to mount a ZFS pool during early stages of the boot process,
> > but after successfully transferring control to userspace. Typing
> > this from a photo of the crash so omitting details that I think
> > aren’t going to be relevant here, like addresses, here is what I
> > get:
> >
> > ----
> > Setting hostid: …
> > ZFS filesystem version: 5
> > ZFS storage pool version: features support (500)
> >
> > Fatal kernel trap:
> >
> > Exception = 0x400 (instruction storage interrupt)
> > …
> > pid = 64, comm = zpool
> >
> > panic: instruction storage interrupt trap
> > cpuid = 1
> > time = …
> > KDB: stack backtrace:
> > #0 kdb_backtrace
> > #1 vpanic
> > #2 panic
> > #3 trap
> > #4 powerpc_interrupt
> > Uptime: 7s
> > ----
> >
> > Any thoughts about what I could look into? Any “recent” commits that
> > you think may be at fault?
> >
> > Thanks!
