Date:      Sun, 8 Feb 2015 01:16:31 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: HEADS UP: powerpc64 kernel format change [booted a PowerMac G5 quad-core][__syncicache is running at the time of the crashes]
Message-ID:  <E527514A-96F6-4794-8F03-504E51EC8CCB@dsl-only.net>
In-Reply-To: <449E0C48-B57D-4873-B2E7-BC217D891897@dsl-only.net>
References:  <335C8DCD-33DF-4430-A0FA-77669C513C61@dsl-only.net> <449E0C48-B57D-4873-B2E7-BC217D891897@dsl-only.net>

I've greatly narrowed down where the crashes happen. That need not be
where the root cause is: the root cause could be earlier.

In the following code [I'm using 10.1-RELEASE-p5 for reference here]

$ svnlite diff sys/boot/ofw/libofw/ppc64_elf_freebsd.c
Index: sys/boot/ofw/libofw/ppc64_elf_freebsd.c
===================================================================
--- sys/boot/ofw/libofw/ppc64_elf_freebsd.c	(revision 277808)
+++ sys/boot/ofw/libofw/ppc64_elf_freebsd.c	(working copy)
@@ -59,7 +59,11 @@
 	 * be done by the kernel after relocation.
 	 */
 	if (!strcmp((*result)->f_type, "elf kernel"))
+{
+printf("ppc64_ofw_elf_loadfile before __syncicache\n");
 		__syncicache((void *) (*result)->f_addr, (*result)->f_size);
+printf("ppc64_ofw_elf_loadfile after __syncicache\n");
+}
 	return (0);
 }

For a directly bootable (no crash) kernel build, both printf's are
displayed. (The above code is part of /boot/loader.)

But for the kernels that I build that fail to boot directly, only the
first of the two printf's is displayed when a direct boot is attempted:
Openfirmware's notice with %SRR0 and %SRR1 shows up after that instead
of the text from the second printf.

Based on that much, it looks like the crash either is in evaluating the
arguments to __syncicache or happens during __syncicache's execution,
not after.



Changing the first printf to something like the sequence:

printf("ppc64_ofw_elf_loadfile before __syncicache\n");
printf("(void*)result: %p\n",(void*)result);
printf("(void*)(*result): %p\n",(void*)(*result));
printf("(void*)(*result)->f_addr: %p\n",(void*)(*result)->f_addr);
printf("(*result)->f_size : 0x%lx\n",(*result)->f_size);

shows that all of the stages print before the crash happens, answering
the question about evaluation of the arguments: there is no problem
evaluating them.

So the crashes are strictly during __syncicache's activity.

Only (*result)->f_size varies between the 3 examples that I've used:

10.1-RELEASE-p5 variant without VERBOSE_SYSINIT (and BOOTVERBOSE,
BOOTHOWTO). (boots fine)
10.1-RELEASE-p5 variant with VERBOSE_SYSINIT (and BOOTVERBOSE,
BOOTHOWTO). (crashes)
10.1-STABLE (-r278028) variant without VERBOSE_SYSINIT (and BOOTVERBOSE,
BOOTHOWTO). (crashes)

What the above printf's reported was:

(void*)result:    0x1c35b48
(void*)(*result):   0x1ebc0
(*result)->f_addr: 0x100000

10.1-RELEASE-p5 variant without VERBOSE_SYSINIT:        (*result)->f_size: 0x1014b80
10.1-RELEASE-p5 variant with VERBOSE_SYSINIT:           (*result)->f_size: 0x1014bb0
10.1-STABLE (-r278028) variant without VERBOSE_SYSINIT: (*result)->f_size: 0x10175d0

(Listed in increasing order.)

As I remember, for the last two the crash report listed %SRR0: 0x1c2785c
for both. The 10.1-RELEASE-p5 "without VERBOSE_SYSINIT" one above does
not crash, and for it the "ppc64_ofw_elf_loadfile after __syncicache"
message shows up as it should.

===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-7, at 09:43 AM, Mark Millard <markmi at dsl-only.net> wrote:

Correction to earlier Email: VERBOSE_SYSINIT with DDB (and GDB) all
enabled (indirectly booted via kernel10.1RE) got 0x1c277ec for the
%SRR0 value, not 0x1c277fc. So it is slightly different from
Kernel10.1S's 0x1c277fc (for this 10.1-STABLE variant). (I looked at the
wrong notes when composing the original Email.)

More comparisons of kernel build options:

VERBOSE_SYSINIT enabled with DDB (and GDB) disabled still has the
booting problem for my 10.1-RELEASE-p5 variant. It also still has
0x1c277ec for the %SRR0 value.


For VERBOSE_SYSINIT disabled (DDB and GDB enabled) directly booted...

Preloaded elf kernel "/boot/kernel/kernel" at 0x1106000.
...
real memory  = 17152118784 (16357 MB)
available KVA = 7222611967 (6888 MB)
Physical memory chunk(s):
0x0000000000024000 - 0x00000000000fffff, 901120 bytes (220 pages)
0x0000000001115000 - 0x00000000017fffff, 7254016 bytes (1771 pages)
0x0000000001814000 - 0x0000000001bfffff, 4112384 bytes (1004 pages)
0x0000000001c3d000 - 0x0000000001c3cfff, 0 bytes (0 pages)
0x0000000004cbd000 - 0x000000000fffffff, 187969536 bytes (45891 pages)
0x0000000020000000 - 0x000000007f5effff, 1600061440 bytes (390640 pages)
0x0000000100000000 - 0x0000000466827fff, 14604730368 bytes (3565608 pages)
0x0000000200000000 - 0x00000001ffffffff, 0 bytes (0 pages)
0x0000000300000000 - 0x00000002ffffffff, 0 bytes (0 pages)
0x0000000400000000 - 0x00000003ffffffff, 0 bytes (0 pages)
avail memory = 16374190080 (15615 MB)

So 0x1c277ec is between the two:

0x0000000001814000 - 0x0000000001bfffff, 4112384 bytes (1004 pages)
0x0000000001c3d000 - 0x0000000001c3cfff, 0 bytes (0 pages)

(But I do not know what most of the regions and holes are supposed to
be.)

VERBOSE_SYSINIT, DDB, and GDB enabled but indirectly booted via
kernel10.1RE (via /boot/loader.conf's kernel="kernel10.1RE"),
stopping, unloading, then doing "boot kernel":

Preloaded elf kernel "/boot/kernel/kernel" at 0x1116000.
...
real memory  = 17152118784 (16357 MB)
available KVA = 7222611967 (6888 MB)
Physical memory chunk(s):
0x0000000000024000 - 0x00000000000fffff, 901120 bytes (220 pages)
0x0000000001105000 - 0x0000000001114fff, 65536 bytes (16 pages)
0x0000000001125000 - 0x00000000017fffff, 7188480 bytes (1755 pages)
0x0000000001814000 - 0x0000000001bfffff, 4112384 bytes (1004 pages)
0x0000000001c3d000 - 0x0000000001c3cfff, 0 bytes (0 pages)
0x0000000004cbd000 - 0x000000000fffffff, 187969536 bytes (45891 pages)
0x0000000020000000 - 0x000000007f5effff, 1600061440 bytes (390640 pages)
0x0000000100000000 - 0x0000000466827fff, 14604730368 bytes (3565608 pages)
0x0000000200000000 - 0x00000001ffffffff, 0 bytes (0 pages)
0x0000000300000000 - 0x00000002ffffffff, 0 bytes (0 pages)
0x0000000400000000 - 0x00000003ffffffff, 0 bytes (0 pages)
avail memory = 16374190080 (15615 MB)


===
Mark Millard
markmi at dsl-only.net

On 2015-Feb-7, at 03:49 AM, Mark Millard <markmi at dsl-only.net> wrote:

Nathan, you wrote the below about my problems with booting my builds
of, say, 10.1-STABLE (kernel="kernel10.1S" in /boot/loader.conf)
without involving the kernel from my build of 10.1-RELEASE-p5
(kernel="kernel10.1RE" or sometimes kernel="kernel" in
/boot/loader.conf), where kernel="kernel10.1RE" in /boot/loader.conf
boots just fine...

> So this has to be some kind of icache issue. If you unload and reload
> the *same* kernel, does it also help?
> -Nathan

(Part of the evidence was: Using kernel="kernel10.1RE" in
/boot/loader.conf, stopping at the 10sec prompt, unloading, and doing
"boot kernel10.1S" lets my 10.1-STABLE builds boot that will not boot
directly.)


Well, I've got a little more information from a different direction: A
way to create the problem when building my 10.1-RELEASE-p5 kernel is to
enable VERBOSE_SYSINIT. More specifically, the comparison/contrast I've
done so far is...



I added the following 3 lines to my GENERIC64vtsc for my 10.1-RELEASE-p5
source tree (no other changes elsewhere at all)

options 	VERBOSE_SYSINIT
options 	BOOTVERBOSE=1
options 	BOOTHOWTO=RB_VERBOSE

and rebuilt the kernel via KERNCONF=GENERIC64vtsc INSTKERNNAME=kernel.
Loading the resulting kernel fails if it is referenced by a
kernel="kernel" line in /boot/loader.conf. The %SRR0 address value
listed is the same as for kernel10.1S: 1c277fc. But booting using
kernel="kernel10.1RE" in /boot/loader.conf, stopping at the 10sec wait,
unloading, and typing "boot kernel" boots fine -- just like
"boot kernel10.1S".

Note: GENERIC64vtsc has option DDB enabled (and GDB too). (This is
associated with my information gathering for early G5 boot
crashes/hangups.)

Note: This is the first time I've ever tried any of those 3 options. My
kernel10.1S build was not based on them.

Then I changed the 3 lines by just commenting out the first of the 3
that I had added

#options 	VERBOSE_SYSINIT
options 	BOOTVERBOSE=1
options 	BOOTHOWTO=RB_VERBOSE

and rebuilt via KERNCONF=GENERIC64vtsc INSTKERNNAME=kernel again.
The resulting /boot/kernel/... boots just fine when kernel="kernel" is
used in /boot/loader.conf: no need for using kernel10.1RE or for
stopping to do anything special.



===
Mark Millard
markmi at dsl-only.net





