Date: Thu, 2 May 2019 03:45:47 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Justin Hibbits <chmeeedalf@gmail.com>
Cc: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: How many segments does it take to span from VM_MIN_KERNEL_ADDRESS through VM_MAX_SAFE_KERNEL_ADDRESS? 128 in moea64_late_bootstrap
Message-ID: <A0839E16-A654-4391-9A1B-A84D028A6CF7@yahoo.com>
In-Reply-To: <AEC7FFA4-955B-4F4B-91C0-7B3B054C6BC7@yahoo.com>
References: <3C69CF7C-7F33-4C79-92C0-3493A1294996@yahoo.com>
 <6159F4A6-9431-4B99-AA62-451B8DF08A6E@yahoo.com>
 <20190501094029.542c5f46@titan.knownspace>
 <212E50E5-7EB1-4381-A662-D5EACB1E5D46@yahoo.com>
 <C01CF848-890B-407D-876A-9C48F5F3CD40@yahoo.com>
 <20190501165403.7d8d1f8f@titan.knownspace>
 <1B8116F2-9749-4331-95BD-D528AA52A771@yahoo.com>
 <AEC7FFA4-955B-4F4B-91C0-7B3B054C6BC7@yahoo.com>

["vt_upgrade(&vt_consdev). . ." hang-ups with the patch do happen
for -r347003. The patch does not fix the overall hang-up behavior,
although it changes some detailed behavior that is associated. I've
also avoided the panic issue by avoiding cmpb use. This does not fix
the "mtx_lock of spin mutex WWV" but avoids it.]

On 2019-May-1, at 23:21, Mark Millard <marklmi at yahoo.com> wrote:

> [Some results, mixed I'm afraid.]
>
> On 2019-May-1, at 17:22, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2019-May-1, at 14:54, Justin Hibbits <chmeeedalf at gmail.com> wrote:
>>
>>> On Wed, 1 May 2019 14:35:56 -0700
>>> Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>>>> What happens if you revert all your patches,
>>>>>
>>>>> Most of the patches in Bugzilla 233863 are not for this
>>>>> issue at all and are not tied to starting the non-bsp
>>>>> cpus. (The one for improving how close the Time Base
>>>>> registers are is tied to starting these cpus.) Only the
>>>>> aim/mp_cpudep.c and aim/slb.c changes seem relevant.
>>>>>
>>>>> Are you worried about some form of interaction that means
>>>>> I need to avoid patches for other issues?
>>>>>
>>>>> Note: for now I'm staying at using head -r345758 as the
>>>>> basis for my experiments.
>>>>>
>>>>>> and change this loop to
>>>>>> stop at n_slb? So something more akin to:
>>>>>>
>>>>>> int i = 0;
>>>>>>
>>>>>> for (va = virtual_avail; va < virtual_end && i < n_slb -
>>>>>> 1; va += SEGMENT_LENGTH, i++);
>>>>>> ...
>>>>>>
>>>>>> If it reliably boots with that, then that's fine. We can prefault
>>>>>> as much as we can and leave the rest for on-demand.
>>>>>
>>>>> I'm happy to experiment with this loop without my hack
>>>>> for forcing the slb entry to exist in cpudep_ap_bootstrap.
>>>>>
>>>>> But, it seems to presume that the pc_curpcb's will
>>>>> all always point into the lower address range spanned
>>>>> when cpudep_ap_bootstrap is executing on the cpu.
>>>>> Does some known property limit the pc_curpcb->
>>>>> references to such?
>>>>> Only that would be sure to
>>>>> avoid an slb-miss at that stage. Or is this just an
>>>>> alternate hack or a means of getting evidence, not a
>>>>> proposed solution?
>>>>>
>>>>> (Again, I'm happy to disable my hack that forces the
>>>>> slb entry and to try the loop suggested.)
>>> ...
>>>> And the patch for the loop looks like:
>>>>
>>>> virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
>>>>
>>>> /*
>>>> - * Map the entire KVA range into the SLB. We must not fault
>>>> there.
>>>> + * Map the lower-address part of the KVA range into the SLB.
>>>> We must not fault there. */
>>>> #ifdef __powerpc64__
>>>> - for (va = virtual_avail; va < virtual_end; va +=
>>>> SEGMENT_LENGTH)
>>>> + i = 0;
>>>> + for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va
>>>> += SEGMENT_LENGTH, i++) moea64_bootstrap_slb_prefault(va, 0);
>>>> #endif
>>>>
>>>
>>> Yep, that's the patch I was going for.
>>>
>>>>
>>>> So I've built, installed, and have tested some: it did not go well
>>>> overall.
>>>>
>>>> Using:
>>>>
>>>> OK set debug.verbose_sysinit=1
>>>>
>>>> to show better context about where the hangs occur, shows:
>>>> (Typed from a screen picture.)
>>>>
>>>> subsystem a800000
>>>> boot_run_interrupt_driven_config_hooks(0)...
>>>> . . . (omitted) . . .
>>>> done.
>>>> vt_upgrade(&vt_consdev). . .
>>>>
>>>> The "vt_upgrade(&vt_consdev). . ." never says done when booting
>>>> hangs with the above changes.
>>>>
>>>> Trying to boot a bunch of times did produce one
>>>> completed boot, all 4 cpus working. Otherwise I'm
>>>> using kernel.old to manage to complete a boot.
>>>>
>>>> I'll note that "vt_upgrade(&vt_consdev). . ." is where
>>>> Dennis Clarke reported for the hangups that he was
>>>> seeing, without any of my patches being available back
>>>> then: 2019-Feb-14.
>>>
>>> Maybe try the commit that caused the problem back in July? r334498.
>>>
>>
>> I'd already started down the path of getting materials from:
>>
>> https://artifact.ci.freebsd.org/snapshot/head/r347003/powerpc/powerpc64/
>>
>> and putting them on a separate SSD that I sometimes use for artifact.ci
>> or snapshot experiments. Also: checking out matching svn sources for
>> -r347003 and then doing a buildworld buildkernel with a bootstrap gcc
>> 4.2.1 compiler used. I'm verifying that I can build it before making
>> the source changes for the kernel. The build is of a debug kernel
>> (GENERIC64).
>>
>> The test buildworld is still in process.
>>
>> Let me know if this is insufficient for your purposes. I could revert
>> to:
>>
>> https://artifact.ci.freebsd.org/snapshot/head/r334594/powerpc/powerpc64/
>>
>> (There is no head/r334498/ and the first after that with a
>> powerpc64/ is head/r334594/ .)
>>
>> For either head/r347003/ or head/r334594/ :
>>
>> Use of artifact materials allows using officially built files for
>> every file but some specific file(s) that I replace. It also allows
>> comparison/contrast of the behavior of the official files vs. when
>> adjusted ones are substituted.
>>
>> Use of artifact-version materials also means that I know I'm using
>> a vintage that actually built --and so I hope to avoid other problems
>> getting in the way.
>
> I present without-the-patch results before presenting
> with-the-patch results. The end result is mixed, I'm
> afraid.
>
>
>
> As for the results without any patch,
> just artifact materials . . .
>
> Note: "Add debug.verbose_sysinit tunable for VERBOSE_SYSINIT" was
> not checked-in until -r335458 .
>
> Trying to boot without any updates or rebuilds, just artifact
> materials shows variable stopping points:
>
> (For debug.verbose_sysinit=1 :)
> -r347003 stops sometimes at: vt_upgrade(&vt_consdev). . .
> -r347003 stops sometimes at: cpu_mp_unleash(0). . .
>
> -r334594 stops after: ada0 lines, VERBOSE_SYSINIT not built in
>
>
>
> So I had to build my own -r334594 kernel to see verbose_sysinit
> information about the stopping point. Again, no patch here,
> I just copied over my build of the /boot/kernel/kernel file:
>
> -r334594 stops sometimes at: vt_upgrade(&vt_consdev). . .
> -r334594 stops sometimes at: cpu_mp_unleash(0). . .
>
>
> Summary thus far:
>
> I did not find any obvious difference in how often each stops
> in either of the alternatives.
>
> So I'm seeing if the proposed patch changes the behavior of
> -r347003 .
>
>
>
> Later test of patched -r347003 . . .
>
> The patched kernel is based on:
>
> # svnlite diff /mnt/usr/src/ | more
> Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
> ===================================================================
> --- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c (revision 347003)
> +++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c (working copy)
> @@ -959,7 +959,8 @@
>           * Map the entire KVA range into the SLB. We must not fault there.
>           */
>  #ifdef __powerpc64__
> -        for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
> +        i = 0;
> +        for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va += SEGMENT_LENGTH, i++)
>                  moea64_bootstrap_slb_prefault(va, 0);
>  #endif
>
>
> So far with the patched code:
>
> -r347003 has never stopped at: vt_upgrade(&vt_consdev). . .

I have since had hang-ups at "vt_upgrade(&vt_consdev). . .".

> -r347003 stops sometimes at: cpu_mp_unleash(0). . . [but differently!]
> -r347003 panics at a particular point the rest of the time
>
> The cpu_mp_unleash hangups report:
> (typed from screen pictures)
>
> subsystem f000000
> cpu_mp_unleash(0)... Launching APs 1 2 SMP: 4 CPUs found; 4 CPUs usable; 3 CPUs woken
>
> After that it is hung-up.
>
>
> As for when that does not happen . . .
>
> I do not even have /etc/fstab set up and so end up at the mountroot>
> prompt.
> When I enter "ufs:/dev/daa0s3" I get a panic for:
>
> panic: mtx_lock of spin mutex WWV @ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c:2812
> (it is a debug-kernel build)
>
> For reference, line 2812 is: PMAP_LOCK(pm);
>
> panic is reached via an interesting(?) call chain,
> showing the backtrace (typed from screen pictures):
>
> .__mtx_lock_flags+0xd4
> .moea64_sync_icache+0x48
> .pmap_sync_icache+0x90
> .ppc_instr_emulate+0x1b4
> .trap+0x10fc
> .powerpc_interrupt+0x2cc
> user PGM trap by 0x810053bb4: srr1=0x900000000008d032
> r1=0x3ffffffffffffcc00 cr=0x20002024 xer=0 ctr=0x1 r2=0x81007bdd0 frame=0xe000000070ca9810
>
> It was thread pid 28 tid 100097
>
> So far these details seem consistent.
>
> But I will note that openfirmware use via ofwdump -ap
> and the like causes system crashes going back to when
> the direct map base was moved to high memory addresses
> ( -r330610 and later ). This is one of the reasons I
> want to avoid openfirmware and use the conversion to
> fdt instead. (There is a -r330614 artifact to test
> such crashes with --or use a later one that otherwise
> boots.)

I avoided the panics by adjusting
src/lib/libc/powerpc64/string/strcmp.S to not use cmpb
instructions. This does not fix the "mtx_lock of spin mutex WWV"
but avoids it. So now there are two patches:

# svnlite diff /mnt/usr/src/
Index: /mnt/usr/src/lib/libc/powerpc64/string/strcmp.S
===================================================================
--- /mnt/usr/src/lib/libc/powerpc64/string/strcmp.S (revision 347003)
+++ /mnt/usr/src/lib/libc/powerpc64/string/strcmp.S (working copy)
@@ -88,9 +88,16 @@
 .Lstrcmp_compare_by_word:
         ld      %r5,0(%r3)      /* Load double words. */
         ld      %r6,0(%r4)
-        xor     %r8,%r8,%r8     /* %r8 <- Zero. */
+        lis     %r8,32639       /* 0x7f7f */
+        ori     %r8,%r8,32639   /* 0x7f7f7f7f */
+        rldimi  %r8,%r8,32,0    /* 0x7f7f7f7f'7f7f7f7f */
         xor     %r0,%r5,%r6     /* Check if double words are different. */
-        cmpb    %r7,%r5,%r8     /* Check if double words contain zero. */
+        /* Check for zero vs. not bytes: */
+        and     %r9,%r5,%r8     /* 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0 */
+        add     %r9,%r9,%r8     /* ->0x7f, ->0x7f, ->ms-bit-in-byte==1 */
+        nor     %r7,%r9,%r5     /* ->0x80, ->0x00, ->ms-bit-in-byte==0 */
+        andc    %r7,%r7,%r8     /* ->0x80, ->0x00, ->0x00 */
+        /* sort of like cmpb %r7,%r5,%r8 for %r8 being zero */
 
         /*
          * If double words are different or contain zero,
@@ -104,7 +111,12 @@
         ldu     %r5,8(%r3)      /* Load double words. */
         ldu     %r6,8(%r4)
         xor     %r0,%r5,%r6     /* Check if double words are different. */
-        cmpb    %r7,%r5,%r8     /* Check if double words contain zero. */
+        /* Check for zero vs. not bytes: */
+        and     %r9,%r5,%r8     /* 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0 */
+        add     %r9,%r9,%r8     /* ->0x7f, ->0x7f, ->ms-bit-in-byte==1 */
+        nor     %r7,%r9,%r5     /* ->0x80, ->0x00, ->ms-bit-in-byte==0 */
+        andc    %r7,%r7,%r8     /* ->0x80, ->0x00, ->0x00 */
+        /* sort of like cmpb %r7,%r5,%r8 for %r8 being zero */
 
         /*
          * If double words are different or contain zero,
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c (working copy)
@@ -959,7 +959,8 @@
          * Map the entire KVA range into the SLB. We must not fault there.
          */
 #ifdef __powerpc64__
-        for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+        i = 0;
+        for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va += SEGMENT_LENGTH, i++)
                 moea64_bootstrap_slb_prefault(va, 0);
 #endif

With this I do sometimes manage to boot. So, in this modified context,
I've seen all 3 of:

-r347003M stops sometimes at: vt_upgrade(&vt_consdev). . .
-r347003M stops sometimes at: cpu_mp_unleash(0). . . [but with: "SMP: 4 CPUs found; 4 CPUs usable; 3 CPUs woken"]
-r347003M boots and operates sometimes.

(I did not do much with it booted, focusing on more boot attempts
instead.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
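
For readers following the strcmp.S change, the and/add/nor/andc sequence that replaces cmpb can be restated in C. This is a minimal sketch and not code from the thread: the names LOW7_MASK, zero_byte_mask and has_zero_byte are hypothetical, chosen only for illustration. It shows that the sequence leaves 0x80 in each byte position where the loaded doubleword holds 0x00 and 0x00 everywhere else. cmpb would instead leave 0xff in matching bytes, which is presumably why the patch comment says "sort of like".

```c
#include <assert.h>
#include <stdint.h>

/* 0x7f in every byte: the constant the patch builds in %r8 (lis/ori/rldimi). */
#define LOW7_MASK UINT64_C(0x7f7f7f7f7f7f7f7f)

/*
 * Hypothetical C restatement of the patched instruction sequence.
 * Returns 0x80 in each byte position where x holds 0x00 and 0x00 in
 * every other byte position.
 */
static uint64_t
zero_byte_mask(uint64_t x)
{
	uint64_t t;

	t = x & LOW7_MASK;	/* and:  clear the top bit of each byte */
	t = t + LOW7_MASK;	/* add:  top bit set iff low 7 bits nonzero;
				 *       at most 0xfe per byte, so no carry
				 *       crosses into the next byte */
	t = ~(t | x);		/* nor:  top bit set iff the whole byte is 0 */
	t = t & ~LOW7_MASK;	/* andc: keep only the top bit of each byte */
	return (t);
}

/* Hypothetical helper: nonzero when any byte of x is a NUL. */
static int
has_zero_byte(uint64_t x)
{
	/* Sanity check on the mask itself: only top bits may be set. */
	assert((zero_byte_mask(x) & LOW7_MASK) == 0);
	return (zero_byte_mask(x) != 0);
}
```

Because the add never carries out of a byte, a 0x01 or 0x80 byte next to a NUL is not misreported, which matches the three cases listed in the patch comments (0x00, 0x80, other).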
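
The effect of the `i<n_slbs-1` bound in the mmu_oea64.c change can be checked with a small user-space sketch. Everything named ASSUMED_* below is an assumption and not taken from the thread: the address range and segment length are what I believe the powerpc64 headers of that period define, and 64 SLB entries is a guess for the machine in question. Only the figure of 128 segments comes from the Subject line. The function count_prefaulted_segments is hypothetical and just mirrors the loop's control flow without calling moea64_bootstrap_slb_prefault.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed values, not from the thread; check the headers of the tree in use. */
#define ASSUMED_VM_MIN_KERNEL_ADDRESS		UINT64_C(0xe000000000000000)
#define ASSUMED_VM_MAX_SAFE_KERNEL_ADDRESS	UINT64_C(0xe0000007ffffffff)
#define ASSUMED_SEGMENT_LENGTH			UINT64_C(0x10000000)	/* 256 MiB */
#define ASSUMED_N_SLBS				64u

/*
 * Hypothetical helper: count the iterations of the patched prefault loop.
 * Pass UINT_MAX as limit to model the original, unbounded loop, or
 * n_slbs - 1 to model the patched one.
 */
static unsigned
count_prefaulted_segments(uint64_t va_start, uint64_t va_end,
    uint64_t segment_length, unsigned limit)
{
	uint64_t va;
	unsigned i;

	assert(segment_length != 0);
	i = 0;
	for (va = va_start; va < va_end && i < limit;
	    va += segment_length, i++)
		;	/* the kernel would prefault the SLB entry for va here */
	return (i);
}
```

Under those assumptions the unbounded loop visits 128 segments, matching the Subject line, while the bounded loop stops after 63, leaving the higher-address part of the KVA range to be faulted in on demand.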
