From owner-freebsd-hackers  Mon Sep 21 00:49:20 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Message-Id: <199809210754.AAA21394@word.smith.net.au>
X-Mailer: exmh version 2.0.2 2/24/98
To: Brett Glass <brett@lariat.org>
cc: hackers@FreeBSD.ORG
Subject: Re: Remember those spontaneous crashes I was getting? 
In-reply-to: Your message of "Mon, 21 Sep 1998 00:48:03 MDT."
             <199809210650.AAA00276@lariat.lariat.org> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 21 Sep 1998 00:54:28 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> Well, we still get one every day or two, at odd times. But I can ALWAYS
> make them happen by piping dump through gzip to ftp to a disk on a remote
> machine -- our usual backup procedure.
> 
> Anyway, when I first reported this crash, I was asked what message
> appeared. Unfortunately, it flew by so fast that I couldn't tell what it
> said! So, tonight, seeing that it was a slow night and no users were on, I
> swapped the kernel for one with the debugger enabled and started the backup
> procedure.
> 
> Sure enough, a crash. The screen said:
> 
> Fatal trap 9: general protection fault while in kernel mode
> 
> Instruction pointer = 0x8:0xf0176fb5
> Stack pointer = 0x10:0xf0199000

Are you 100% sure about these numbers?  The kernel stack pointer 
shouldn't be higher than the instruction pointer.  This looks like 
either corrupt code eating %esp or a CPU fault.
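
As a quick sanity check on those two numbers, here is a minimal sketch that
compares the reported addresses. The two register values are copied from the
trap report above; ASSUMED_KERNBASE and PAGE_SIZE are my assumptions for an
i386 kernel of this vintage, not something the report states, so adjust them
for your own build.

```python
# Values copied from the quoted trap report.
INSTRUCTION_POINTER = 0xF0176FB5
STACK_POINTER = 0xF0199000

# Assumptions (not taken from the report): kernel virtual base and page size.
ASSUMED_KERNBASE = 0xF0000000
PAGE_SIZE = 4096


def describe(ip, sp, kernbase=ASSUMED_KERNBASE, page_size=PAGE_SIZE):
    """Summarise how the stack pointer sits relative to the instruction pointer."""
    return {
        "sp_above_ip": sp > ip,
        "distance": sp - ip,              # bytes between the two addresses
        "ip_offset": ip - kernbase,       # offset of the code address in the kernel image
        "sp_offset": sp - kernbase,       # offset of the stack address in the kernel image
        "sp_page_aligned": sp % page_size == 0,
    }


report = describe(INSTRUCTION_POINTER, STACK_POINTER)
for name, value in report.items():
    shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
    print(f"{name}: {shown}")
```

The page-aligned stack pointer is worth noting: a stack pointer sitting
exactly on a page boundary is an unusual value to see at an arbitrary moment,
which is one more reason to double-check the transcription.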

> Frame pointer = 0x10:0x0
> Code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, def32 1, gran 1
> 
> Processor eflags = interrupt enabled, resume, IOPL = 0
> 
> Current process = Idle
> 
> Interrupt mask = 
> 
> kernel: type 9 trap, code = 0
> 
> Stopped at idle_loop_0x3d: jmp idle_loop

There's nothing illegal about that jmp at all, so this really looks like a 
memory read error (bad memory, CPU, cache or motherboard).  You might 
have received the GPF because the stack pointer is pointing into the 
kernel text segment (which it probably can't write to).
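
For what it's worth, the segment fields in the report decode to a perfectly
ordinary flat ring-0 code segment, which fits with the instruction itself
being legal. The sketch below decodes them using the standard x86 selector
layout (index, table indicator, RPL). It assumes the "type" value is printed
as a 5-bit field with the S bit on top, which is my reading of the dump rather
than something the report says. The helper names are hypothetical.

```python
def decode_selector(selector):
    """Split an x86 segment selector into index, table and requested privilege level."""
    return {
        "index": selector >> 3,
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,
    }


def decode_code_type(type_field):
    """Decode a 5-bit descriptor type (assumed layout: S, executable, conforming, readable, accessed)."""
    return {
        "code_or_data": bool(type_field & 0x10),  # S bit: 1 = code/data, 0 = system
        "executable": bool(type_field & 0x08),
        "conforming": bool(type_field & 0x04),
        "readable": bool(type_field & 0x02),
        "accessed": bool(type_field & 0x01),
    }


def segment_size(limit, gran):
    """Size in bytes covered by a segment limit; gran=1 means 4 KiB units."""
    return (limit + 1) * (4096 if gran else 1)


code_sel = decode_selector(0x8)     # selector part of the instruction pointer
stack_sel = decode_selector(0x10)   # selector part of the stack pointer
code_type = decode_code_type(0x1B)  # "type 0x1b" from the report
size = segment_size(0xFFFFF, 1)     # "limit 0xfffff, gran 1"

print(code_sel, stack_sel)
print(code_type)
print(hex(size))
```

Both selectors are GDT entries with RPL 0, and the limit with 4 KiB
granularity covers the full 4 GiB, so nothing in the segment setup itself
looks damaged.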

Corrupting the stack pointer (as opposed to corrupting the contents of 
the stack) is pretty difficult.  It's also very difficult to track 
down. 8(

> As I began to play with the debugger (I really didn't know the commands), I
> saw:
> wd0: interrupt timeout
> wd0: status 50<rdy,seekdone> error 0
> 
> ...which may not have meant anything, but then again....

It just means that you were in the middle of a disk operation, which 
subsequently timed out (because the debugger was running).

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com
