Date: Tue, 14 Oct 2014 15:18:55 -0700
From: Mark Millard <markmi@dsl-only.net>
To: Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc: Justin Hibbits <chmeeedalf@gmail.com>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: My PowerMac G5's no longer crash at boot: PowerMac G5 specific ofwcall changes with justifying evidence [important typos fixed]
Message-ID: <3D4A76B3-431A-4C94-8747-70369A8A1764@dsl-only.net>
In-Reply-To: <543D5ACD.20901@freebsd.org>
References: <76F704FD-BB74-4439-8318-DB4C167B420F@dsl-only.net>
 <543B3828.8070806@freebsd.org>
 <9D9B0372-8D8F-4153-85B5-40066206EF67@dsl-only.net>
 <379AA7FC-98C9-48B9-92BB-60E134817AF1@dsl-only.net>
 <C614025F-6455-4929-8468-462E76079274@dsl-only.net>
 <A2AB9066-259B-4B7D-BDDC-D03AE5827E13@dsl-only.net>
 <CAHSQbTCKi_MBhERh6d=kX2y-=%2B2OzqpGM%2BN=ZEShi-kX2r8NPQ@mail.gmail.com>
 <543D5ACD.20901@freebsd.org>

For openfirmware: is %r3 on return any more than a failed vs. not flag with a particular failed-value? Is there any way to validate that %r3 values for non-failure look reasonable vs. not looking reasonable? (For all I know %r3 could also be corrupt.)

I do not have any documentation for the PowerMac G5 openfirmware API that is in use or the associated ABI as far as I remember. I do not know if it strictly followed Darwin's/Mac OS X's ABI on PowerMac G5's vs. if there was some conversion going back and forth (as there is for FreeBSD, at least for powerpc64). For openfirmware I derive properties from what I see in FreeBSD's code (which has to be more explicit than when a compiler's code generation happens to match at least large parts of an ABI directly).

As I vaguely remember, Apple did not use the TOC for Darwin's/Mac OS X's ABI but FreeBSD does. If true I do not know what other differences there might be (even ignoring the 32 bit vs. 64 bit issues for the kernels). But the point would be an existence proof of at least one difference. My understanding is that %r1 was as in FreeBSD.

I vaguely seem to remember that for Darwin/Mac OS X some register was volatile in leaf functions but non-volatile otherwise, or at least when nested functions were involved. And that brings to mind that the condition code sets in cr might have had a mix of volatile and non-volatile status despite being in one register? Did Darwin/Mac OS X have something special for register usage for Thread-Specific Storage? Position Independent Code? Indirect Calls? Frame Pointers?

I may have some Darwin/Mac OS X information around but I doubt that it is complete, especially for the 64-bit ABI or for privileged contexts. For the 32-bit ABI (non-privileged) I likely have the information about the above possible ABI properties.

I assume that openfirmware avoids the FPU and other such --but I do not know. But it is privileged code.

Are there any known sources of at least some of the information for the PowerMac G5 openfirmware ABI(s)? What are good references for the FreeBSD PowerPC ABI(s) (32 bit and 64 bit, privileged vs. not)?

[I cut off some of the older history.]

===
Mark Millard
markmi at dsl-only.net

On Oct 14, 2014, at 10:18 AM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:

r1 *must be* preserved by the standard and for anything to work. It's being corrupted somehow (Mark's comment about r3 is illuminating), and if r1 is being corrupted, you can't rely on anything. I suspect it might be an exception handling issue since it's non-deterministic, but it's hard to tell. It could also be triggered by the way we've set up the OF stack frame. It would be good to check if that makes sense.
-Nathan

On 10/14/14 09:53, Justin Hibbits wrote:
> Interesting. Perhaps, instead of using %r1, and relying purely on the
> stack we use yet another (non-volatile) register to hold the MSR.
> Once we reload the MSR we can get back the saved registers, because
> the stack will be valid again.
>
> Nathan, thoughts?
>
> - Justin
>
> On Tue, Oct 14, 2014 at 9:14 AM, Mark Millard <markmi at dsl-only.net> wrote:
>> Additional notes from additional experiments... (So far from one G5.)
>>
>> I got back trace, show registers, and my openfirmware-history list going for failure reporting based on explicit before vs. after tests of %r1 values. (Explicit breakpoint call for unequal, being careful to save/restore %r3 around the call.) I filled several registers with potentially interesting values that would otherwise have had zero as a value (%r15-%r19, although %r15 is redundant with %r6 currently).
>>
>> An interesting property resulted: every time %r1 had changed from having the before-value (stack pointer value) %r1 instead ended up with a value equal to what openfirmware put in %r3.
>>
>> And more than that: For builds with the same ofwstk position the %r3 value involved was fixed for the failures, for example when 0x30400=ofwstk+0xfe0 (%r1 before) was reported %r3 and %r1 end up as 0xd23450 for the failures. When 0x31400=ofwstk+0xfe0: %r3 and %r1 ended up for failure as 0xd24450 instead. Yep: offset by the same amount as ofwstk.
>>
>> And I got one example where the openfirmware %r1-value-change failure was instead much later in the boot, well after pmap_bootstrapped went true: It was just after the message lines...
>>
>> vgapci0: Boot video device ...
>> pcib1: <IBM CPC9X5 Hypertransport tunnel> ...
>>
>> with back trace (from OF_peer down):
>>
>> .OF_peer+0x8c
>> .cpcht_attach+0x884
>> .device_attach+0x3ac
>> .device_probe_and_attach+0x3c
>> .bus_generic_new_pass+0x12c
>> .bus_generic_new_pass+0x114
>> .bus_generic_new_pass+0x114 (yep: listed twice)
>> .bus_set_pass+0xc0
>> .root_bus_configure+0x14
>> .mi_startup+0x10c
>> btext+0xbc
>>
>> %r1 before: 0xc30400 ofwstk+0xfe0
>> %r1 after: 0xd23450
>> %r3 after: 0xd23450
>> FreeBSD msr to restore: 0x9000000000001032
>> ofmsr[0] to restore: 0x1000000000003030
>>
>> The same after-openfirmware %r1 and %r3 values that had been showing up for the before-copyright examples of ofwcall failures.
>>
>> And note that it again was a peer request. All the ofwcall-tied boot-failures have been for peer requests as far as I remember.
>>
>> I later did some experiments where I had it report but not stop when the after-value was different from the before-value for %r1. When this happened for these types of tests it seemed to be an isolated example: later calls normally have the stack pointer value still in %r1 after openfirmware returns. In more detail: At most one report was made for such a boot, the rest of the boot went fine. (Of course to get that far my hacked ofwcall code avoids using the after-openfirmware %r1 value to extract the 3 saved values to be restored from the bottom of ofwstk.)
>>
>> I was not successful at using "capture on" in DDB for this early-boot context. (It hangs things after the first report.) So I've been limited to one screen's report and only when I have it stop at the end of the report (so it does not scroll away). (No input to DDB available that early.) Otherwise the information just scrolls by rather quickly for reading any detail. Still it was useful to see that other reports were not produced after the first (when there was a first). (I can not claim multiple are impossible. It just appears at least infrequent.)
>>
>> I have not yet investigated making analogous powerpc/GENERIC code and builds.
>>
>> Nor have I dealt with having it report more detail about the peer requests that fail.
>>
>> Nor have I seen examples of what "not failing/%r1-unchanged" looks like overall.
>>
>> I still have no examples of unstable/incomplete initialization(s) or race condition(s) to explain why both ways can and do occur from one attempt to the next --or that different peer requests in the sequence can be where the problem happens.
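
The two numeric observations in the quoted report (the MSR values to restore, and the claim that the bogus %r1/%r3 value moves by the same amount as ofwstk) can be cross-checked mechanically. What follows is only a minimal sketch in Python and is not from the thread: the helper names `decode_msr` and `displacement` are made up for illustration, and the bit names in `MSR_BITS` assume the usual 64-bit PowerPC MSR layout, since the message itself gives only raw hexadecimal values.

```python
# Hypothetical helpers, not from the thread: decode the quoted MSR values
# and compare the %r1 displacements reported for the two builds.

# Assumed MSR bit masks (64-bit PowerPC layout). The message only gives
# the raw values, so these names are an assumption of this sketch.
MSR_BITS = [
    ("SF", 1 << 63),  # 64-bit mode
    ("HV", 1 << 60),  # hypervisor state
    ("EE", 0x8000),   # external interrupts enabled
    ("PR", 0x4000),   # problem (user) state
    ("FP", 0x2000),   # floating point available
    ("ME", 0x1000),   # machine check enabled
    ("IR", 0x0020),   # instruction relocation
    ("DR", 0x0010),   # data relocation
    ("RI", 0x0002),   # recoverable interrupt
    ("LE", 0x0001),   # little-endian mode
]


def decode_msr(value):
    """Return (names of set bits listed in MSR_BITS, leftover unlisted bits)."""
    names = [name for name, mask in MSR_BITS if value & mask]
    known = sum(mask for _, mask in MSR_BITS)
    return names, value & ~known


def displacement(r1_before, r1_after):
    """Distance from the saved stack pointer to the value found in %r1."""
    return r1_after - r1_before


if __name__ == "__main__":
    # MSR values copied from the quoted register report.
    for label, msr in (("FreeBSD msr", 0x9000000000001032),
                       ("ofmsr[0]", 0x1000000000003030)):
        names, unknown = decode_msr(msr)
        print(f"{label}: {' '.join(names)} (unlisted bits: {unknown:#x})")

    # (%r1 before, %r1 after) pairs copied from the two builds described.
    for before, after in ((0x30400, 0xd23450), (0x31400, 0xd24450)):
        delta = displacement(before, after)
        print(f"{before:#x} -> {after:#x}: delta {delta:#x}")
```

If the assumed bit names are right, this gives a quick way to see which MSR bits differ between the FreeBSD value and ofmsr[0], and to confirm that both builds show the same displacement between the saved stack pointer and the value that ended up in %r1.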