From owner-freebsd-current  Wed Feb 23  7:53:47 2000
Delivered-To: freebsd-current@freebsd.org
Received: from quack.kfu.com (quack.kfu.com [170.1.70.2])
	by hub.freebsd.org (Postfix) with ESMTP id 5E04037B857
	for <freebsd-current@freebsd.org>; Wed, 23 Feb 2000 07:53:43 -0800 (PST)
	(envelope-from nsayer@quack.kfu.com)
Received: from icarus.kfu.com (icarus.kfu.com [170.1.70.17])
	by quack.kfu.com (8.9.2/8.9.3) with ESMTP id HAA16102;
	Wed, 23 Feb 2000 07:53:37 -0800 (PST)
	(envelope-from nsayer@quack.kfu.com)
Received: from quack.kfu.com by icarus.kfu.com  with ESMTP
        (8.9.3//ident-1.0) id HAA03693; Wed, 23 Feb 2000 07:53:06 -0800 (PST) 
Message-ID: <38B40262.FC4EEDD0@quack.kfu.com>
Date: Wed, 23 Feb 2000 15:53:06 +0000
From: Nick Sayer <nsayer@quack.kfu.com>
X-Mailer: Mozilla 4.7 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: David Gilbert <dgilbert@velocet.ca>, freebsd-current@freebsd.org
Subject: Re: Wierd AMD panics caused by VMWare?
References: <14515.59795.632514.748870@trooper.velocet.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

The only thing I would add is that by AMD I didn't mean
Advanced Micro Devices. I meant /usr/sbin/amd. In my case
this behavior has been observed on a Pentium III and on a
K7, so it's CPU-independent.

David Gilbert wrote:
> 
> I had reported this earlier, but the similarities are striking:
> 
> I too have seen strange AMD panics where stack variables inexplicably
> go to zero.  My systems are K6/2-400's, and I have often witnessed the
> following fault (only happens on a *really* busy web server)

The common denominator seems to be that the machine has to be very
active. VMware stresses the vm system quite a bit (64M of shared
memory with multiple processes digging around, etc). A very busy
web server is going to do a lot of context switching (I think?).
In that situation, it appears that the stack is being smashed.

I tried wrapping the code where my machines go nuts in
splhigh() / splx(), but it didn't help.

Is your machine running the automounter?

> 
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:285
> #1  0xc014aad1 in panic (fmt=0xc023878a "page fault")
>     at ../../kern/kern_shutdown.c:446
> #2  0xc02098ce in trap_fatal (frame=0xcc74eecc, eva=134812896)
>     at ../../i386/i386/trap.c:942
> #3  0xc0209587 in trap_pfault (frame=0xcc74eecc, usermode=0, eva=134812896)
>     at ../../i386/i386/trap.c:835
> #4  0xc02091ba in trap (frame={tf_es = -887750640, tf_ds = -1036058608,
>       tf_edi = -1050208512, tf_esi = -1043943040, tf_ebp = -864751828,
>       tf_isp = -864751884, tf_ebx = 2287, tf_edx = -1036043576, tf_ecx = 0,
>       tf_eax = 134812884, tf_trapno = 12, tf_err = 2, tf_eip = -1072417321,
>       tf_cs = 8, tf_eflags = 66054, tf_esp = -1041509376, tf_ss = -1036024832})
>     at ../../i386/i386/trap.c:437
> #5  0xc01435d7 in fdcopy (p=0xcc5796e0) at ../../kern/kern_descrip.c:954
> #6  0xc014587b in fork1 (p1=0xcc5796e0, flags=-2147483596)
>     at ../../kern/kern_fork.c:379
> #7  0xc014533b in vfork (p=0xcc5796e0, uap=0xcc74ef94)
>     at ../../kern/kern_fork.c:109
> #8  0xc0209b17 in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 236237520,
>       tf_esi = 236231856, tf_ebp = -1077952324, tf_isp = -864751644,
>       tf_ebx = 673171048, tf_edx = 163766316, tf_ecx = 672877149, tf_eax = 66,
>       tf_trapno = 7, tf_err = 2, tf_eip = 672936705, tf_cs = 31,
>       tf_eflags = 514, tf_esp = -1077952368, tf_ss = 39})
>     at ../../i386/i386/trap.c:1100
> #9  0xc01feedc in Xint0x80_syscall ()
> 
> Now the interesting code here is at stack frame #5:
> 
> (kgdb) list
> 948             fpp = newfdp->fd_ofiles;
> 949             for (i = newfdp->fd_lastfile; i-- >= 0; fpp++)
> 950                     if (*fpp != NULL)
> 951                             (*fpp)->f_count++;
> 
> (kgdb) p newfdp->fd_ofiles
> $1 = (struct file **) 0xc23f2000
> (kgdb) p fpp
> $2 = (struct file **) 0x0
> 
> Now... the only operation on fpp is fpp++.  It should take a _long_
> time for fpp to get around to 0 and you'd think that *fpp would be
> zero long before that (or cause a page fault at some other
> non-existent location).
> 
> So... the similarity here is that deep in the kernel, we have an
> automatic (possibly register) local variable that's getting zeroed.
> 
> I have half-a-dozen crash dumps of this nature.  For me, it always
> happens in fdcopy().  This may be due to the fact that the machine is
> running a large apache config --- so fork() is something it's doing
> often.
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-current" in the body of the message
