From: Garrett Cooper
To: Mark Saad
Cc: hackers@freebsd.org
Date: Fri, 7 Jan 2011 14:58:52 -0800
Subject: Re: With out ddb and kdb set 7.3-RELEASE amd64 does not boot.
List-Id: Technical Discussions relating to FreeBSD

On Fri, Jan 7, 2011 at 2:36 PM, Mark Saad wrote:
> On Fri, Jan 7, 2011 at 5:27 PM, Garrett Cooper wrote:
>> On Fri, Jan 7, 2011 at 2:26 PM, Garrett Cooper wrote:
>>> On Fri, Jan 7, 2011 at 2:22 PM, Mark Saad wrote:
>>>> On Fri, Jan 7, 2011 at 4:56 PM, Garrett Cooper wrote:
>>>>> On Fri, Jan 7, 2011 at 1:20 PM, Mark Saad wrote:
>>>>>> Hello hackers@,
>>>>>>  I have a good question that I cant find an answer for. I believe
>>>>>> found a kernel bug in 7.3-RELEASE that prevents me from booting 64-bit
>>>>>> kernels on HP's DL360 G4p . The kernel dies with "Fatal trap 12: page
>>>>>> fault while in kernel mode " . The hardware works fine in 7.2-RELEASE
>>>>>> amd64, 7.1-RELEASE amd64, and 6.4-RELEASE amd64 .
>>>>>>
>>>>>> In 7.3-RELEASE amd64 I can not boot from cd or pxe correctly using the
>>>>>> stock 7.3-RELEASE amd64 kernel however i386 works fine. To see if this
>>>>>> issue was some how fixed in 7.3-RELEASE-p4 amd64 I rebuilt a GENERIC
>>>>>> kernel using patches sources and tried to boot and I got the same
>>>>>> crash.
>>>>>>
>>>>>>  Next I rebuilt the kernel with KDB and DDB to see if I could get a
>>>>>> core-dump of the system. I also set loader.conf to
>>>>>>
>>>>>> kernel="kernel.DEBUG"
>>>>>> kern.dumpdev="/dev/da0s1b"
>>>>>>
>>>>>> Next I pxebooted the box and the system does not crash on boot up, it
>>>>>> will easily load a nfs root and work fine. So I copied my debug
>>>>>> kernel, and loader.conf to the local disk and rebooted and it boots
>>>>>> fine from the local disk .
>>>>>>
>>>>>> Rebooting the server and running off the local disks and debug kernel,
>>>>>> I cant find any issues.
>>>>>>
>>>>>> Reboot the box into a GENERIC 7.3-RELEASE-p4 kernel and it crashes
>>>>>>
>>>>>> With this error
>>>>>>
>>>>>> Fatal trap 12: page fault while in kernel mode
>>>>>> cpuid = 0; apic id = 00
>>>>>> fault virtual address   = 0x0
>>>>>> fault code              = supervisor write data, page not present
>>>>>> instruction pointer     = 0x8:0xffffffff800070fa
>>>>>> stack pointer           = 0x10:0xffffffff8153cbe0
>>>>>> frame pointer           = 0x10:0xffffffff8153cc50
>>>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>>>> current process         = 0 (swapper)
>>>>>> [thread pid 0 tid 100000 ]
>>>>>> Stopped at      bzero+0xa:     repe stosq      %es:(%rdi)
>>>>>>
>>>>>>
>>>>>> What do I do , has anyone else seen anything like this ?
>>>>>
>>>>>    What are the messages before that on the kernel console and what
>>>>> are your drivers loaded on a stable system?
>>>>> Thanks,
>>>>> -Garrett
>>>>>
>>>> Garrett
>>>>  The last 4 lines of the verbose boot up of the generic kernel are
>>>> all from sio1
>>>
>>>    Is sio1 pointing to a generic UART, or is it something more
>>> special like the HP lights-out SOL interface?
>>>    Simple test might be to disable the sio/uart driver in the kernel
>>> and see if things worked.
>>
>> Or easier yet, disable the port in the BIOS and comment out all of the
>> sio/uart references in device.hints.
>>
> Garrett
>  Interesting commenting out the sio lines in device.hints fixes it,
> did device.hints
> or the sio driver change some how from 7.2 to 7.3 ?

    Given that the messages changed, it's probably a driver bug (my
guess is either isa or sio) that was introduced by accident: a device
fails to probe, and the driver then doesn't properly release its
resources and/or doesn't report to the kernel that it couldn't attach.
    I'm re-CCing the list, but you might want to ask imp@ for some
input. I might have time to look at this further tonight. In the
meantime, I would find where `prob failed tests(s):' is being printed,
then check that code for missing return codes, or set breakpoints there
in ddb so you can trace the call stack and gather more info. That
should point you at the culprit.
Thanks,
-Garrett
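
For reference, the workaround confirmed earlier in the thread (commenting
out the sio lines in device.hints) might look roughly like the fragment
below. This is only a sketch under assumptions: it assumes the hints file
is /boot/device.hints, that the failing unit is sio1 as the verbose boot
output suggested, and that the port and irq values are the usual COM2
defaults, which may not match this machine. The `disabled' hint is an
untested alternative, not something the thread verified.

```
# /boot/device.hints (fragment; values are assumed defaults, check your own file)

# Workaround reported in the thread: comment out the sio1 hints.
#hint.sio.1.at="isa"
#hint.sio.1.port="0x2F8"
#hint.sio.1.irq="3"

# Untested alternative: keep the hints above and mark the unit disabled.
#hint.sio.1.disabled="1"
```

Either way this only hides the symptom; the probe failure path in the
driver would still need to be tracked down as described above.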