From: Andrew Gallatin
Date: Thu, 24 Oct 2002 15:00:59 -0400 (EDT)
To: Fred Clift
Subject: Re: debugging around machine-checks...

Fred Clift writes:
> by hand I run dumpon -v /dev/da0b (which is my swap partition, twice what
> I have of ram in size)
>
> and then I do my fiddling with XFree86 that gives me the machine-check and
> I end up at the SRM prompt. At this point, I know that just booting will
> fail. I have to power-cycle the box and when it comes back up, savecore
> either doesn't find anything, or isn't being run by the rc scripts.
> Once I get a chance to log in, /var/crash has only minfree in it...
>

That *should* work..

> Should I be doing something else?
>
> I just looked in /var/log/messages and saw no evidence of crashdumps being
> written (ie dumping to.... or dump 254 253 252 251... etc).

If you power-cycle, the message buffer is lost.

When I would crash X on an old Miata, half the time I'd get a 'machine
check in pal mode' -- this doesn't even get caught by the OS. However, if
you're seeing the message below, I do not understand why you're not
getting a crashdump.

In any case, since the problem is probably with the X server (based on
the message below), a crashdump would not help you.

> >
> > Can't you use the program counter from the panic output as a start?
> > If its in the X server, there should be a PC from userspace.
> > (see disclaimer below)
> >
>
> So can you interpret this for me then - honestly I just dont know what all
> the fields represent -- I should probably just go read the source code and
> see :)
>
> Oct 8 06:42:24 liron /kernel: unexpected machine check:
> Oct 8 06:42:24 liron /kernel:
> Oct 8 06:42:24 liron /kernel: mces = 0x1
> Oct 8 06:42:24 liron /kernel: vector = 0x660
> Oct 8 06:42:24 liron /kernel: param = 0xfffffc0000006068
> Oct 8 06:42:24 liron /kernel: pc = 0x1604006ac
> Oct 8 06:42:24 liron /kernel: ra = 0x12006cb10
> Oct 8 06:42:24 liron /kernel: curproc = 0xfffffe0009910200
> Oct 8 06:42:24 liron /kernel: pid = 90765, comm = XFree86
> Oct 8 06:42:24 liron /kernel:
> Oct 8 06:42:24 liron /kernel: panic: machine check
>
>
> The program counter is pc? so I should be able to, with gdb and a
> debug-version of XFree86, figure out what code this is?

Yes, except that this pc is in a shared lib or other dynamically loaded
text, so I don't know how you could map it back to source without a
coredump. The ra (return address) is at least somewhere in the main text
of the program (not in a shared lib).
<...>

> Your explanation is helpful, and perhaps I'll try your suggestion of
> turning userland machine checks into sigbus or something - I'm sure I'm
> just begging for trouble here, but at least this isn't a production
> machine that other people depend on :).
>
> To send a signal to a process from within the kernel, it seems I just call
>
> psignal(pid, signo)
>
> - is this right?
>

More or less (psignal() takes a struct proc pointer, not a pid). I think
trapsignal() may be more correct, since the signal is the result of a
trap taken by the process itself.

> Thanks very much for your information - looks like a little check in
> machine_check() in interrupt.c will do pretty much what I want - perhaps
> I'll make sure that my hack only works on processes who's name starts
> with 'X' or something just to be safe....

Good luck to you!!

Drew