Date: Mon, 21 Sep 1998 00:54:28 -0700
From: Mike Smith <mike@smith.net.au>
To: Brett Glass <brett@lariat.org>
Cc: hackers@FreeBSD.ORG
Subject: Re: Remember those spontaneous crashes I was getting?
Message-ID: <199809210754.AAA21394@word.smith.net.au>
In-Reply-To: Your message of "Mon, 21 Sep 1998 00:48:03 MDT."
             <199809210650.AAA00276@lariat.lariat.org>
> Well, we still get one every day or two, at odd times. But I can ALWAYS
> make them happen by piping dump through gzip to ftp to a disk on a remote
> machine -- our usual backup procedure.
>
> Anyway, when I first reported this crash, I was asked what message
> appeared. Unfortunately, it flew by so fast that I couldn't tell what it
> said! So, tonight, seeing that it was a slow night and no users were on, I
> swapped the kernel for one with the debugger enabled and started the backup
> procedure.
>
> Sure enough, a crash. The screen said:
>
> Fatal trap 9: general protection fault while in kernel mode
>
> Instruction pointer = 0x8:0xf0176fb5
> Stack pointer       = 0x10:0xf0199000

Are you 100% sure about these numbers? The kernel stack pointer shouldn't be
higher than the instruction pointer. This looks like either corrupt code
eating %esp or a CPU fault.

> Frame pointer       = 0x10:0x0
> Code segment        = base 0x0, limit 0xfffff, type 0x1b
>                     = DPL 0, pres 1, def32 1, gran 1
>
> Processor eflags    = interrupt enabled, resume, IOPL = 0
>
> Current process     = Idle
>
> Interrupt mask      =
>
> kernel: type 9 trap, code = 0
>
> Stopped at idle_loop_0x3d: jmp idle_loop

There's nothing illegal about this at all; this really looks like a memory
read error (bad memory, CPU, cache or motherboard). You might have received
the GPF because the stack pointer is pointing into the kernel text segment
(which it probably can't write to).

Corrupting the stack pointer (as opposed to corrupting the contents of the
stack) is pretty difficult. It's also very difficult to track down. 8(

> As I began to play with the debugger (I really didn't know the commands), I
> saw:
> wd0: interrupt timeout
> wd0: status 50<rdy,seekdone> error 0
>
> ...which may not have meant anything, but then again....

It just means that you were in the middle of a disk operation, which
subsequently timed out (because the debugger was running).

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com
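The point about the register values in the trap message can be checked with a little arithmetic. The Python sketch below is an illustration added here, not anything posted in the thread: it splits the 0x8 and 0x10 selectors into their x86 fields (descriptor index, table indicator, requested privilege level), expands the code segment's 20-bit limit using the granularity bit, and compares the two offsets. It assumes the stack segment is flat with base 0, like the code segment whose base is shown as 0x0; the names `Selector`, `decode_selector` and `effective_limit` are hypothetical. Confirming that %esp really lies inside the kernel text would also need the kernel's actual text bounds, which the trap message does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selector:
    index: int  # descriptor-table index (bits 15..3)
    ti: int     # table indicator (bit 2): 0 = GDT, 1 = LDT
    rpl: int    # requested privilege level (bits 1..0)


def decode_selector(sel: int) -> Selector:
    """Split a 16-bit x86 segment selector into its fields."""
    return Selector(index=sel >> 3, ti=(sel >> 2) & 1, rpl=sel & 3)


def effective_limit(limit20: int, gran: int) -> int:
    """Byte limit of a segment; with gran=1 the 20-bit limit counts 4 KiB pages."""
    return ((limit20 << 12) | 0xFFF) if gran else limit20


# Values copied from the trap message quoted in the thread.
CS, EIP = 0x8, 0xF0176FB5
SS, ESP = 0x10, 0xF0199000

for name, sel in (("cs", CS), ("ss", SS)):
    s = decode_selector(sel)
    table = "LDT" if s.ti else "GDT"
    print(f"{name}=0x{sel:x}: index {s.index}, {table}, RPL {s.rpl}")

# "Code segment = base 0x0, limit 0xfffff ... gran 1" describes a flat segment.
print(f"code segment byte limit: 0x{effective_limit(0xFFFFF, 1):08x}")

# Assumption: the stack segment is also flat (base 0), so the two offsets
# can be compared directly as linear addresses.
print(f"esp is {'above' if ESP > EIP else 'not above'} eip by 0x{abs(ESP - EIP):x} bytes")
```

With the granularity bit set, a limit of 0xfffff spans the whole 4 GiB address space, so under the flat-segment assumption the raw offsets are directly comparable, and the stack pointer sits roughly 136 KiB above the instruction pointer.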