Date: Tue, 7 May 2019 11:54:01 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Justin Hibbits <chmeeedalf@gmail.com>
Cc: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: head -r347003 on 2-socket/2-cores-each G5 PowerMac11,2's: one type of boot-blocking context found
Message-ID: <C85B1B21-5BF6-4CFC-B928-2F19960B91E2@yahoo.com>
In-Reply-To: <20190507130654.20a269f6@titan.knownspace>
References: <D2CEBBBA-40A5-4924-9817-53A8ED81011E@yahoo.com> <20190507130654.20a269f6@titan.knownspace>
On 2019-May-7, at 11:06, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Mon, 6 May 2019 22:43:36 -0700
> Mark Millard <marklmi at yahoo.com> wrote:
>
>> Every example of boot failure during cpu_mp_unleash,
>> where I've had the tracking in place, has had 1 or more
>> examples of srr0<DMAP_BASE_ADDRESS (EXC_ISE) in
>> handle_kernel_slb_spill before cpu_mp_unleash tries to
>> start its first ap.
>>
>> Every example of boot success, where I've had the tracking
>> in place, has had no examples of srr0<DMAP_BASE_ADDRESS
>> (EXC_ISE) in handle_kernel_slb_spill before the
>> cpu_mp_unleash finished. (Successful boots are rare
>> in my current test context, so there are fewer examples
>> of this.)
>>
>> In other words: the original live-G5 information
>> for the segment was still present throughout that
>> time frame, thus avoiding a slbtrap for such a
>> fetch address over the time frame involved.
>>
>>
>>
>> In the code:
>>
>>         rstvec = rstvec_virtbase + reset;
>>         printf("powermac_smp_start_cpu: about to use *rstvec==4\n");
>>         *rstvec = 4;
>>         powerpc_sync();
>>         (void)(*rstvec);
>>         powerpc_sync();
>>         DELAY(1);
>>         printf("powermac_smp_start_cpu: about to use *rstvec==0\n");
>>         *rstvec = 0;
>>         powerpc_sync();
>>         (void)(*rstvec);
>>         powerpc_sync();
>>         printf("powermac_smp_start_cpu: done using *rstvec==0\n");
>>
>> Every boot failure has had the last line reported by
>> FireWire dcons use as the first of those 3 printf's,
>> for CPU 2 as the target (of 0-3).
>>
>> The above code appears to me to execute with MSR.IR=1
>> on the bsp.
>>
>> But, then, what would *rstvec do if there is no ESID=0
>> V=1 combination active for the live-G5 information at
>> the time? Does that block the exception code that
>> is in what would be ESID=0's address range, effectively
>> preventing slbtrap from being invoked to enable ESID=0?
>>
>> In other words: when MSR.IR=1, does there always
>> need to be an ESID=0 V=1 entry? Is it appropriate
>> to reserve one for ESID=0 V=1 (after invalidating
>> any arbitrarily placed ESID=0 V=1 entry present
>> before the kernel even started)?
>
> Hi Mark,
>
> Thanks for continuing to look into this. In this case you're
> presenting, an ISE shouldn't really matter, because the SLB miss handler
> is written to run entirely from real mode to handle the miss. Can you
> determine what the addresses were that faulted in the failure cases?
> We shouldn't be touching anything below DMAP_BASE at this time, since
> we're not yet in userspace, and all mappings should be either KVA or
> DMAP.

I'll try to get examples of all of them based on my current
code. But in an earlier message I reported several examples
from simply sticking a printf in handle_kernel_slb_spill and
later making it controllable to report at selective time frames.
(The printf's being there led to earlier hang-ups. I was
surprised I got anything.)

Remember that the number of handle_kernel_slb_spill calls for
srr0<DMAP_START and dar<DMAP_START varies from boot to boot,
so the places are not unique overall.
Here is the core of those old reports for reference:

KDB: debugger backends: ddb
KDB: current backend: ddb
handle_kernel_slb_spill: type=0x380 dar=0x3d99348 srr0=0xa869bc
handle_kernel_slb_spill: type=0x380 dar=0x10000000 srr0=0xa869bc

Both seemed to involve the stbx instruction in:

0000000000a869bc <.memset+0x20> stbx    r4,r9,r3
0000000000a869c0 <.memset+0x24> addi    r9,r9,1
0000000000a869c4 <.memset+0x28> bdnz    0000000000a869bc <.memset+0x20>

The above was from the unconditional printf addition and, as I
remember, repeated for:

#ifdef __powerpc64__
        i = 0;
        for (va = virtual_avail; va < virtual_end && i<(n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
                moea64_bootstrap_slb_prefault(va, 0);
#endif
        enable_handle_kernel_slb_spill_reporting= 1;

(Note the (n_slbs-1)/2 that I was experimenting with at the time.)

The below was from instead enabling later:

        enable_handle_kernel_slb_spill_reporting= 1;
        dpcpu_init(dpcpu, curcpu);

got (eliminating an unrelated line that had a truncated address showing):

KDB: debugger backends: ddb
KDB: current backend: ddb
handle_kernel_slb_spill: type=0x380 dar=0x22ef8 srr0=0xa86690
handle_kernel_slb_spill: type=0x480 dar=0x22ef8 srr0=0xa86690

Both seemed to involve the stdu instruction in:

0000000000a8668c <.memcpy+0x140> ldu     r0,-8(r9)
0000000000a86690 <.memcpy+0x144> stdu    r0,-8(r11)
0000000000a86694 <.memcpy+0x148> bdnz    0000000000a8668c <.memcpy+0x140>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
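For anyone wanting to relate the reported dar/srr0 values to segments, the arithmetic can be sketched in a few lines of C. This is only an illustration that runs in userland, not kernel code. It assumes 256 MB segments (a shift of 28), a direct-map base of 0xc000000000000000, and that the address that missed is srr0 for type 0x480 (EXC_ISE) and dar for type 0x380 (EXC_DSE). None of those values are stated in this message, so check them against the kernel headers. The helper names esid_of, faulting_address, below_dmap and report are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumptions, not taken from the message: 256 MB segments and this
 * direct-map base. Verify both against the kernel headers in use. */
#define SEGMENT_SHIFT     28
#define DMAP_BASE_ADDRESS UINT64_C(0xc000000000000000)

/* Trap types as they appear in the reports. */
#define EXC_DSE 0x380   /* data segment miss */
#define EXC_ISE 0x480   /* instruction segment miss */

/* ESID of an effective address: the bits above the segment offset. */
static uint64_t esid_of(uint64_t ea)
{
    return ea >> SEGMENT_SHIFT;
}

/* Assumed rule: an instruction miss faults on srr0, a data miss on dar. */
static uint64_t faulting_address(int type, uint64_t dar, uint64_t srr0)
{
    assert(type == EXC_DSE || type == EXC_ISE);
    return (type == EXC_ISE) ? srr0 : dar;
}

/* Nonzero when the address lies below the assumed direct-map base. */
static int below_dmap(uint64_t ea)
{
    return ea < DMAP_BASE_ADDRESS;
}

/* Print one event in a form similar to the reports, plus the ESID. */
static void report(int type, uint64_t dar, uint64_t srr0)
{
    uint64_t ea = faulting_address(type, dar, srr0);

    printf("type=0x%x ea=0x%" PRIx64 " esid=0x%" PRIx64 " below_dmap=%d\n",
        (unsigned)type, ea, esid_of(ea), below_dmap(ea));
}
```

Under those assumptions, the four reported events work out as follows: dar=0x3d99348 is in ESID 0, dar=0x10000000 is in ESID 1, dar=0x22ef8 is in ESID 0, and the type=0x480 event at srr0=0xa86690 is in ESID 0. All four addresses are below the assumed direct-map base.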