Date: Wed, 23 Oct 2002 13:22:41 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Fred Clift <fclift@verio.net>
Cc: freebsd-alpha@freebsd.org
Subject: Re: debugging around machine-checks...
Message-ID: <15798.56033.844389.549256@grasshopper.cs.duke.edu>
In-Reply-To: <20021023110134.Q98807-100000@vespa.dmz.orem.verio.net>
References: <20021023110134.Q98807-100000@vespa.dmz.orem.verio.net>
Fred Clift writes:
>
> Ok -- I'm not terribly alpha proficient - in fact, the one alpha that I
> run is just a home-server - little more than a toy (mp3 server, print
> server and relatively secure ssh endpoint from the outside world).
>
> Could someone explain exactly what is going on when a machine-check
> happens? Is this done by the machine firmware or something? It seems

Yes. A machine check is the highest priority interrupt. It occurs
when something seriously bad happens, like an uncorrectable memory
parity error, or a rogue application or kernel fondling device memory
that does not belong to it.

In fact, on older alphas, that's how we probe the PCI bus. We tell
the machine check logic to expect a machine check, and to just clear
it, rather than crashing. We then read from a device which may not
exist. If we get a machine check, then the device wasn't there.
(see sys/alpha/alpha/interrupt.c:badaddr_read()).

21264s and newer are more forgiving about device memory -- they are
like a PC, and will throw away writes to devices which don't exist,
and return -1 for reads.

> that FreeBSD is instantaneously interrupted when a machine check happens
> and that I don't get crash-dumps.

Hmm.. I haven't used a machine check generating alpha in a while, but
from the code in interrupt.c, it looks like it *should* give you a
crashdump.

> Some of you may recall that I've been playing around with XFree86 V4 on
> this box - it would be exceptionally helpful if I got usable crash-dumps
> instead of machine checks when things got weird. As it is, debugging the
> X server is pretty much impossible (for me) because of this.
>
> What I've done is build all of the X distribution with debugging symbols
> in and then I start the X server from gdb and put in 10 break points near
> where I think things will be happening. Eventually, I get a machine check
> and if I'm lucky, I remember where the last breakpoint that I hit was so
> that after a reboot, I can kind of start back in that neighborhood.

Can't you use the program counter from the panic output as a start?
If it's in the X server, there should be a PC from userspace.
(see disclaimer below)

> X is hard enough to debug by itself without this inconvenience. It seems
> that whatever is making it machine-check should be things that could be
> fixed in the kernel, at which point, my debugging of the X server could
> then continue. Then when X dumps core I can just restart X rather than wait
> for a reboot/fsck.
>
> Am I way off here? I seem to have read somewhere that there is something
> you can do to fend off machine-checks so that you can get a proper
> crash-dump? What is the mechanism that causes the checks and how bad
> would it be for the system to do something equivalent to masking these
> events out (or whatever you'd do to get them to not happen?).
>

Look at alpha/alpha/interrupt.c:badaddr_read(). If you're feeling
really lucky, you could add code to send the appropriate signal
(sigbus?) if the PC is in a userland app.

The problem with this is that machine checks are somewhat
asynchronous, and I'm not sure the PC at the time of the fault
corresponds to the PC that actually caused the fault. (that's why
there are so many memory barriers all over the pci probing and
badaddr code).

Good luck.

Drew

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message
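The probe sequence described in the message (arm the machine check logic, read from a device that may not exist, then see whether a machine check arrived) can be sketched as a small userspace toy model. This is not the FreeBSD code in interrupt.c: every name here (`mc_state`, `toy_bus`, `machine_check`, `bus_read`, `toy_badaddr_read`) is hypothetical, the "bus" is just an array, and the points where a real kernel would need memory barriers are only marked with comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 4

/* Hypothetical state shared by the probe and the machine check handler. */
struct mc_state {
    volatile bool expected;   /* probe has armed the handler */
    volatile bool received;   /* a machine check arrived while armed */
    bool panicked;            /* stand-in for the kernel panicking */
};

/* Toy bus: only slots marked present respond to reads. */
struct toy_bus {
    struct mc_state mc;
    bool present[TOY_SLOTS];
    uint32_t regs[TOY_SLOTS];
};

/* Toy handler: clear an expected machine check, "panic" on any other. */
void machine_check(struct mc_state *mc)
{
    if (mc->expected)
        mc->received = true;
    else
        mc->panicked = true;
}

/* Toy read: touching an absent slot raises a machine check. */
uint32_t bus_read(struct toy_bus *bus, size_t slot)
{
    assert(bus != NULL);
    if (slot >= TOY_SLOTS || !bus->present[slot]) {
        machine_check(&bus->mc);
        return UINT32_MAX;    /* value is meaningless after a fault */
    }
    return bus->regs[slot];
}

/* Returns true if the slot is bad (no device answered), false otherwise. */
bool toy_badaddr_read(struct toy_bus *bus, size_t slot, uint32_t *out)
{
    uint32_t value;

    bus->mc.received = false;
    bus->mc.expected = true;
    /* A real kernel would need a memory barrier here ... */
    value = bus_read(bus, slot);
    /* ... and here, so the check is attributed to this read. */
    bus->mc.expected = false;

    if (bus->mc.received) {
        bus->mc.received = false;
        return true;
    }
    if (out != NULL)
        *out = value;
    return false;
}
```

In this model, probing a present slot returns false and fills in the value, probing an absent slot returns true without "panicking", and an unarmed read of an absent slot sets `panicked`. The model is fully synchronous, so it cannot show the asynchrony problem raised in the message, where the PC at the time of the check may not be the PC that caused it.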
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?15798.56033.844389.549256>