From owner-freebsd-stable  Sat Apr 13  1:12:42 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from server.rucus.ru.ac.za (server.rucus.ru.ac.za [146.231.115.1])
	by hub.freebsd.org (Postfix) with SMTP id D4F2C37B404
	for <freebsd-stable@freebsd.org>; Sat, 13 Apr 2002 01:12:32 -0700 (PDT)
Received: (qmail 30507 invoked from network); 13 Apr 2002 08:12:28 -0000
Received: from shell-fxp1.rucus.ru.ac.za (HELO shell.rucus.ru.ac.za) (10.0.0.1)
  by server.rucus.ru.ac.za with SMTP; 13 Apr 2002 08:12:28 -0000
Received: (qmail 50741 invoked by uid 1040); 13 Apr 2002 08:12:28 -0000
Date: Sat, 13 Apr 2002 10:12:28 +0200
From: David Siebörger <drs@rucus.ru.ac.za>
To: freebsd-stable@freebsd.org
Subject: Re: Understanding a crash dump
Message-ID: <20020413101228.A48671@rucus.ru.ac.za>
References: <20020412224856.A20583@rucus.ru.ac.za> <20020413130311.J47408@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20020413130311.J47408@wantadilla.lemis.com>; from grog@FreeBSD.org on Sat, Apr 13, 2002 at 01:03:11PM +0930
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

On Sat 2002-04-13 (13:03), Greg 'groggy' Lehey wrote:
> On Friday, 12 April 2002 at 22:48:56 +0200, David Siebörger wrote:
> > Can anyone shed any light on this?  Is there anything useful still to
> > be gleaned from this dump, or is there anything that I should do to
> > get better information in future?
> 
> Possibly you could get something more out of it, but it would be heavy
> work.  I have some macros in /usr/src/sys/modules/vinum which are
> really intended for debugging Vinum (thus the location), but which
> might help here.  See http://www.vinumvm.org/vinum/how-to-debug.html
> for a general description of how to use them, though you won't need
> the Vinum-specific parts here.  In particular, though, you should find
> a ps command which may show you what was running at the time.  I say
> "may", because you appear to have memory corruption here.

It seems to me that there was no process running at the time of the
first panic.  The first panic report said:

current process         = Idle
interrupt mask          = 

I loaded up the Vinum gdb macros, and ran 'ps'.  Every process listed
had stat = 3 (SSLEEP).

[snip]
> > #6  0xc01cb1a4 in acquire_lock (lk=0xc02a6e3c)
> >     at /usr/build/src/sys/ufs/ffs/ffs_softdep.c:266
> 
> This is the second trap.  It might give you some idea about what
> caused the first trap.

Digging a little deeper here, I'm again drawn to conclude that there
was no process running.

(kgdb) up 6
#6  0xc01cb1a4 in acquire_lock (lk=0xc02a6e3c) at /usr/build/src/sys/ufs/ffs/ffs_softdep.c:266
266             lk->lkt_held = CURPROC->p_pid;
(kgdb) p curproc
$1 = 0x0

CURPROC is #defined as curproc in ffs_softdep.c.

So what caused the second panic was that this function tried to
dereference a null pointer.  It assumes (probably correctly :) ) that
there should be a running process.

[snip]
> > #15 0xc0245753 in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi = 0,
> >       tf_esi = -1040807936, tf_ebp = -1071042748, tf_isp = -1071042780,
> >       tf_ebx = -1036232128, tf_edx = -6, tf_ecx = 10, tf_eax = -6, tf_trapno = 12,
> >       tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66055, tf_esp = -1040807936,
> >       tf_ss = 40}) at /usr/build/src/sys/i386/i386/trap.c:458
> > #16 0x0 in ?? ()
> 
> Somehow you have ended up trying to execute code at address 0.  This
> smells of a smashed stack.  I don't think that it would be an indirect
> function call, since otherwise I'd expect the backtrace to continue.
> You could find out what the current process is (ps will show it) and
> use the btp macro to show a backtrace which may show more.  Usage is
> 'btp pid', where pid is the numeric PID of the process.
> 
> Greg
> --
> See complete headers for address and phone numbers

Clearly memory has somehow been corrupted, but it's not apparent why.
This machine doesn't have ECC memory, so it is possible that it was a
random hardware glitch.

I guess I'll just have to see whether the crash repeats itself in
future.  Thank you to Greg and Morten Rodal for your replies.


-- 
David Siebörger
drs@rucus.ru.ac.za
