Date:      Fri, 7 Jan 2011 14:58:52 -0800
From:      Garrett Cooper <gcooper@FreeBSD.org>
To:        Mark Saad <nonesuch@longcount.org>
Cc:        hackers@freebsd.org
Subject:   Re: With out ddb and kdb set 7.3-RELEASE amd64 does not boot.
Message-ID:  <AANLkTinbdf96n5-1gkALFxzDdUAJpxXWtTw%2BqyiBShsv@mail.gmail.com>
In-Reply-To: <AANLkTi=ZZm0urtT%2BTqrvnQEDW3n-Bmvobi74BzMTKm5r@mail.gmail.com>
References:  <AANLkTikEmdDMsxRp8fUPOw=mXnL4TMNJ8zCkVcdvk7m0@mail.gmail.com> <AANLkTinKew-RjN_026TpO%2BsXjXHt%2BAGNxqjAPyhfOsf8@mail.gmail.com> <AANLkTimHOBRvhpXCKZo-ddYHwaZ1M%2B2T9AAXNJ81LdR0@mail.gmail.com> <AANLkTinoygySJRRM0UpQ1g%2BGUXxsqX5%2BnBnJ6vxZ2VCp@mail.gmail.com> <AANLkTin4Gdb9L1GzNH87fDz1z7jVkK-qGmwzxP%2BbG=yV@mail.gmail.com> <AANLkTi=ZZm0urtT%2BTqrvnQEDW3n-Bmvobi74BzMTKm5r@mail.gmail.com>

On Fri, Jan 7, 2011 at 2:36 PM, Mark Saad <nonesuch@longcount.org> wrote:
> On Fri, Jan 7, 2011 at 5:27 PM, Garrett Cooper <gcooper@freebsd.org> wrote:
>> On Fri, Jan 7, 2011 at 2:26 PM, Garrett Cooper <gcooper@freebsd.org> wrote:
>>> On Fri, Jan 7, 2011 at 2:22 PM, Mark Saad <nonesuch@longcount.org> wrote:
>>>> On Fri, Jan 7, 2011 at 4:56 PM, Garrett Cooper <gcooper@freebsd.org> wrote:
>>>>> On Fri, Jan 7, 2011 at 1:20 PM, Mark Saad <nonesuch@longcount.org> wrote:
>>>>>> Hello hackers@,
>>>>>> I have a good question that I can't find an answer for. I believe I
>>>>>> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
>>>>>> kernels on HP's DL360 G4p. The kernel dies with "Fatal trap 12: page
>>>>>> fault while in kernel mode". The hardware works fine in 7.2-RELEASE
>>>>>> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64.
>>>>>>
>>>>>> In 7.3-RELEASE amd64 I cannot boot from CD or PXE correctly using the
>>>>>> stock 7.3-RELEASE amd64 kernel; however, i386 works fine. To see if this
>>>>>> issue was somehow fixed in 7.3-RELEASE-p4 amd64, I rebuilt a GENERIC
>>>>>> kernel using patched sources and tried to boot, and I got the same
>>>>>> crash.
>>>>>>
>>>>>> Next I rebuilt the kernel with KDB and DDB to see if I could get a
>>>>>> core dump of the system. I also set loader.conf to
>>>>>>
>>>>>> kernel="kernel.DEBUG"
>>>>>> kern.dumpdev="/dev/da0s1b"
>>>>>>
>>>>>> Next I PXE-booted the box and the system does not crash on boot; it
>>>>>> will easily load an NFS root and work fine. So I copied my debug
>>>>>> kernel and loader.conf to the local disk and rebooted, and it boots
>>>>>> fine from the local disk.
>>>>>>
>>>>>> Rebooting the server and running off the local disks and debug kernel,
>>>>>> I can't find any issues.
>>>>>>
>>>>>> Rebooting the box into a GENERIC 7.3-RELEASE-p4 kernel, it crashes
>>>>>> with this error:
>>>>>>
>>>>>> Fatal trap 12: page fault while in kernel mode
>>>>>> cpuid = 0; apic id = 00
>>>>>> fault virtual address   = 0x0
>>>>>> fault code              = supervisor write data, page not present
>>>>>> instruction pointer     = 0x8:0xffffffff800070fa
>>>>>> stack pointer           = 0x10:0xffffffff8153cbe0
>>>>>> frame pointer           = 0x10:0xffffffff8153cc50
>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>>>> current process         = 0 (swapper)
>>>>>> [thread pid 0 tid 100000 ]
>>>>>> Stopped at      bzero+0xa:      repe stosq      %es:(%rdi)
>>>>>>
>>>>>>
>>>>>> What do I do? Has anyone else seen anything like this?
>>>>>
>>>>>    What are the messages before that on the kernel console, and what
>>>>> drivers are loaded on a stable system?
>>>>> Thanks,
>>>>> -Garrett
>>>>>
>>>> Garrett
>>>> The last 4 lines of the verbose boot of the GENERIC kernel are
>>>> all from sio1
>>>
>>>    Is sio1 pointing to a generic UART, or is it something more
>>> special like the HP Lights-Out SOL interface?
>>>    A simple test might be to disable the sio/uart driver in the kernel
>>> and see if things work.
>>
>> Or easier yet, disable the port in the BIOS and comment out all of the
>> sio/uart references in device.hints.
>>
> Garrett
> Interesting: commenting out the sio lines in device.hints fixes it.
> Did device.hints or the sio driver change somehow from 7.2 to 7.3?

    Given that the messages changed, it's probably a driver bug (my
guess is either isa or sio) that was introduced by accident, where a
device fails to probe and doesn't properly release its resources
and/or notify the kernel that the driver couldn't attach.
Re-CCing the list, but you might want to ask imp@ for some input. I
might have time to look at this further tonight, but I would check
where `probe failed test(s):' is being printed, and analyze
the code if possible for missing return codes, or put breakpoints in
ddb there so you can trace the call stack and gather more info.
That will help point you at the culprit.
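
    While the driver is being investigated, a narrower form of the
device.hints workaround may be enough: mark only sio1 as disabled
instead of commenting out every sio line. This is an untested sketch.
The port and IRQ values are the stock GENERIC.hints defaults and are
assumptions for this machine, and whether a disabled hint avoids the
crash on the DL360 G4p has not been confirmed.

```
# /boot/device.hints (sketch; values assumed from stock GENERIC.hints)
# sio0 is left untouched. Only the second serial port, the one named in
# the last verbose boot messages, is marked disabled.
hint.sio.1.at="isa"
hint.sio.1.port="0x2F8"
hint.sio.1.irq="3"
hint.sio.1.disabled="1"
```

If the machine still panics with only sio1 disabled, fall back to
commenting out all of the sio entries as described earlier in the thread.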
Thanks,
-Garrett


