Date: Wed, 1 May 2019 23:21:43 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Justin Hibbits <chmeeedalf@gmail.com>
Cc: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: How many segments does it take to span from VM_MIN_KERNEL_ADDRESS through VM_MAX_SAFE_KERNEL_ADDRESS? 128 in moea64_late_bootstrap
Message-ID: <AEC7FFA4-955B-4F4B-91C0-7B3B054C6BC7@yahoo.com>
In-Reply-To: <1B8116F2-9749-4331-95BD-D528AA52A771@yahoo.com>
References: <3C69CF7C-7F33-4C79-92C0-3493A1294996@yahoo.com>
 <6159F4A6-9431-4B99-AA62-451B8DF08A6E@yahoo.com>
 <20190501094029.542c5f46@titan.knownspace>
 <212E50E5-7EB1-4381-A662-D5EACB1E5D46@yahoo.com>
 <C01CF848-890B-407D-876A-9C48F5F3CD40@yahoo.com>
 <20190501165403.7d8d1f8f@titan.knownspace>
 <1B8116F2-9749-4331-95BD-D528AA52A771@yahoo.com>
[Some results, mixed I'm afraid.]

On 2019-May-1, at 17:22, Mark Millard <marklmi at yahoo.com> wrote:

> On 2019-May-1, at 14:54, Justin Hibbits <chmeeedalf at gmail.com> wrote:
>
>> On Wed, 1 May 2019 14:35:56 -0700
>> Mark Millard <marklmi@yahoo.com> wrote:
>>
>>>>> What happens if you revert all your patches,
>>>>
>>>> Most of the patches in Bugzilla 233863 are not for this
>>>> issue at all and are not tied to starting the non-bsp
>>>> cpus. (The one for improving how close the Time Base
>>>> registers are is tied to starting these cpus.) Only the
>>>> aim/mp_cpudep.c and aim/slb.c changes seem relevant.
>>>>
>>>> Are you worried about some form of interaction that means
>>>> I need to avoid patches for other issues?
>>>>
>>>> Note: for now I'm staying at using head -r345758 as the
>>>> basis for my experiments.
>>>>
>>>>> and change this loop to
>>>>> stop at n_slb? So something more akin to:
>>>>>
>>>>> int i = 0;
>>>>>
>>>>> for (va = virtual_avail; va < virtual_end && i < n_slb - 1; va += SEGMENT_LENGTH, i++);
>>>>> ...
>>>>>
>>>>> If it reliably boots with that, then that's fine. We can prefault
>>>>> as much as we can and leave the rest for on-demand.
>>>>
>>>> I'm happy to experiment with this loop without my hack
>>>> for forcing the slb entry to exist in cpudep_ap_bootstrap.
>>>>
>>>> But, it seems to presume that the pc_curpcb's will
>>>> all always point into the lower address range spanned
>>>> when cpudep_ap_bootstrap is executing on the cpu.
>>>> Does some known property limit the pc_curpcb->
>>>> references to such? Only that would be sure to
>>>> avoid an slb-miss at that stage. Or is this just an
>>>> alternate hack or a means of getting evidence, not a
>>>> proposed solution?
>>>>
>>>> (Again, I'm happy to disable my hack that forces the
>>>> slb entry and to try the loop suggested.)
>> ...
>>> And the patch for the loop looks like:
>>>
>>>      virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
>>>
>>>      /*
>>> -     * Map the entire KVA range into the SLB. We must not fault there.
>>> +     * Map the lower-address part of the KVA range into the SLB. We must not fault there.
>>>       */
>>>  #ifdef __powerpc64__
>>> -    for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
>>> +    i = 0;
>>> +    for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va += SEGMENT_LENGTH, i++)
>>>          moea64_bootstrap_slb_prefault(va, 0);
>>>  #endif
>>>
>>
>> Yep, that's the patch I was going for.
>>
>>>
>>> So I've built, installed, and have tested some: it did not go well
>>> overall.
>>>
>>> Using:
>>>
>>> OK set debug.verbose_sysinit=1
>>>
>>> to show better context about where the hangs occur, shows:
>>> (Typed from a screen picture.)
>>>
>>> subsystem a800000
>>> boot_run_interrupt_driven_config_hooks(0)...
>>> . . . (omitted) . . .
>>> done.
>>> vt_upgrade(&vt_consdev). . .
>>>
>>> The "vt_upgrade(&vt_consdev). . ." never says done when booting
>>> hangs with the above changes.
>>>
>>> Trying to boot a bunch of times did produce one
>>> completed boot, all 4 cpus working. Otherwise I'm
>>> using kernel.old to manage to complete a boot.
>>>
>>> I'll note that "vt_upgrade(&vt_consdev). . ." is where
>>> Dennis Clarke reported for the hangups that he was
>>> seeing, without any of my patches being available back
>>> then: 2019-Feb-14.
>>
>> Maybe try the commit that caused the problem back in July? r334498.
>>
>
> I'd already started down the path of getting materials from:
>
> https://artifact.ci.freebsd.org/snapshot/head/r347003/powerpc/powerpc64/
>
> and putting them on a separate SSD that I sometimes use for artifact.ci
> or snapshot experiments. Also: checking out matching svn sources for
> -r347003 and then doing a buildworld buildkernel with a bootstrap gcc
> 4.2.1 compiler used.
> I'm verifying that I can build it before making
> the source changes for the kernel. The build is of a debug kernel
> (GENERIC64).
>
> The test buildworld is still in process.
>
> Let me know if this is insufficient for your purposes. I could revert
> to:
>
> https://artifact.ci.freebsd.org/snapshot/head/r334594/powerpc/powerpc64/
>
> (There is no head/r334498/ and the first after that with a
> powerpc64/ is head/r334594/ .)
>
> For either head/r347003/ or head/r334594/ :
>
> Use of artifact materials allows using officially built files for
> every file but some specific file(s) that I replace. It also allows
> comparison/contrast of the behavior of the official files vs. when
> adjusted ones are substituted.
>
> Use of artifact-version materials also means that I know I'm using
> a vintage that actually built --and so I hope to avoid other problems
> getting in the way.

I present the results without the patch before the results with
the patch. The end result is mixed, I'm afraid.

First, the results without any patch, using just artifact materials . . .

Note: "Add debug.verbose_sysinit tunable for VERBOSE_SYSINIT"
was not checked in until -r335458 .

Booting just the artifact materials, without any updates or
rebuilds, shows variable stopping points:

(For debug.verbose_sysinit=1 :)

-r347003 stops sometimes at: vt_upgrade(&vt_consdev). . .
-r347003 stops sometimes at: cpu_mp_unleash(0). . .
-r334594 stops after: ada0 lines, VERBOSE_SYSINIT not built in

So I had to build my own -r334594 kernel to see verbose_sysinit
information about the stopping point. Again, no patch here; I just
copied over my build of the /boot/kernel/kernel file:

-r334594 stops sometimes at: vt_upgrade(&vt_consdev). . .
-r334594 stops sometimes at: cpu_mp_unleash(0). . .

Summary thus far: I did not find any obvious difference between
-r347003 and -r334594 in how often each of these stops happens.

So next I checked whether the proposed patch changes the behavior
of -r347003 .
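The segment count in the Subject: line, and the effect of the i<n_slbs-1 bound in the patched loop, can be checked with a small user-space sketch. It is not kernel code: SEGMENT_LENGTH (256 MiB), VM_MIN_KERNEL_ADDRESS (0xe000000000000000), VM_MAX_SAFE_KERNEL_ADDRESS (0xe0000007ffffffff) and n_slbs = 64 are assumptions about the powerpc64 headers and hardware of that era, and prefault_count() is a hypothetical helper that only counts the iterations where the kernel would call moea64_bootstrap_slb_prefault(va, 0). It also starts at VM_MIN_KERNEL_ADDRESS as a worst case; the real loop starts at virtual_avail.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * ASSUMED powerpc64 values; check sys/powerpc/include/vmparam.h and
 * related headers for the revision actually being tested.
 */
#define SEGMENT_LENGTH             0x10000000UL          /* 256 MiB per SLB segment */
#define VM_MIN_KERNEL_ADDRESS      0xe000000000000000UL
#define VM_MAX_SAFE_KERNEL_ADDRESS 0xe0000007ffffffffUL  /* inclusive upper bound */

/*
 * Hypothetical helper mirroring the shape of the prefault loop in
 * moea64_late_bootstrap(); it only counts iterations.
 * max_iters == INT_MAX models the original unbounded loop;
 * max_iters == n_slbs - 1 models the patched loop.
 */
static int
prefault_count(uint64_t virtual_avail, uint64_t virtual_end, int max_iters)
{
	uint64_t va;
	int i = 0;

	assert(virtual_avail <= virtual_end);
	for (va = virtual_avail; va < virtual_end && i < max_iters;
	    va += SEGMENT_LENGTH, i++)
		;	/* kernel: moea64_bootstrap_slb_prefault(va, 0); */
	return (i);
}
```

Under these assumptions the unbounded loop runs 128 times (32 GiB of KVA divided by 256 MiB segments), matching the Subject: line, while a bound of n_slbs - 1 = 63 prefaults 63 segments and leaves the other 65 to be faulted into the SLB on demand.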
Later test of patched -r347003 . . .

The patched kernel is based on:

# svnlite diff /mnt/usr/src/ | more
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c	(revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c	(working copy)
@@ -959,7 +959,8 @@
      * Map the entire KVA range into the SLB. We must not fault there.
      */
 #ifdef __powerpc64__
-    for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+    i = 0;
+    for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va += SEGMENT_LENGTH, i++)
         moea64_bootstrap_slb_prefault(va, 0);
 #endif
 

So far with the patched code:

-r347003 has never stopped at: vt_upgrade(&vt_consdev). . .
-r347003 stops sometimes at: cpu_mp_unleash(0). . . [but differently!]
-r347003 panics at a particular point the rest of the time

The cpu_mp_unleash hangups show (typed from screen pictures):

subsystem f000000
cpu_mp_unleash(0)... Launching APs 1 2
SMP: 4 CPUs found; 4 CPUs usable; 3 CPUs woken

After that, it hangs.

As for when that does not happen . . .

I do not even have /etc/fstab set up and so end up at the mountroot> prompt.
When I enter "ufs:/dev/daa0s3" I get a panic:

panic: mtx_lock of spin mutex WWV @ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c:2812

(it is a debug-kernel build)

For reference, line 2812 is:

    PMAP_LOCK(pm);

The panic is reached via an interesting(?)
call chain. The backtrace (typed from screen pictures):

.__mtx_lock_flags+0xd4
.moea64_sync_icache+0x48
.pmap_sync_icache+0x90
.ppc_instr_emulate+0x1b4
.trap+0x10fc
.powerpc_interrupt+0x2cc
user PGM trap by 0x810053bb4: srr1=0x900000000008d032
    r1=0x3ffffffffffffcc00 cr=0x20002024 xer=0 ctr=0x1 r2=0x81007bdd0 frame=0xe000000070ca9810

It was thread pid 28 tid 100097.

So far these details seem consistent from one panic to the next.

But I will note that openfirmware use (via ofwdump -ap and the like)
has caused system crashes ever since the direct map base was moved
to high memory addresses (-r330610 and later). This is one of the
reasons I want to avoid openfirmware and use the conversion to fdt
instead. (There is a -r330614 artifact to test such crashes with,
or use a later one that otherwise boots.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)