Date: Wed, 23 Feb 2000 15:53:06 +0000
From: Nick Sayer <nsayer@quack.kfu.com>
To: David Gilbert <dgilbert@velocet.ca>, freebsd-current@freebsd.org
Subject: Re: Wierd AMD panics caused by VMWare?
Message-ID: <38B40262.FC4EEDD0@quack.kfu.com>
References: <14515.59795.632514.748870@trooper.velocet.net>

The only thing I would add is that by AMD I didn't mean
Advanced Micro Devices. I meant /usr/sbin/amd. In my case
this behavior has been observed on a Pentium III and on a
K7, so it's CPU independent.

David Gilbert wrote:
>
> I had reported this earlier, but the similarities are striking:
>
> I too have seen strange AMD panics where stack variables inexplicably
> go to zero. My systems are K6/2-400's, and I have often witnessed the
> following fault (only happens on a *really* busy web server):

The common denominator seems to be that the machine has to be very
active. VMware stresses the vm system quite a bit (64M of shared
memory with multiple processes digging around, etc). A very busy
web server is going to do a lot of context switching (I think?).
In that situation, it appears that the stack is being smashed.

I tried wrapping the code where my machines go nuts inside
splhigh() / splx(), but it didn't help.

Is your machine running the automounter?

>
> #0 boot (howto=256) at ../../kern/kern_shutdown.c:285
> #1 0xc014aad1 in panic (fmt=0xc023878a "page fault")
> at ../../kern/kern_shutdown.c:446
> #2 0xc02098ce in trap_fatal (frame=0xcc74eecc, eva=134812896)
> at ../../i386/i386/trap.c:942
> #3 0xc0209587 in trap_pfault (frame=0xcc74eecc, usermode=0, eva=134812896)
> at ../../i386/i386/trap.c:835
> #4 0xc02091ba in trap (frame={tf_es = -887750640, tf_ds = -1036058608,
> tf_edi = -1050208512, tf_esi = -1043943040, tf_ebp = -864751828,
> tf_isp = -864751884, tf_ebx = 2287, tf_edx = -1036043576, tf_ecx = 0,
> tf_eax = 134812884, tf_trapno = 12, tf_err = 2, tf_eip = -1072417321,
> tf_cs = 8, tf_eflags = 66054, tf_esp = -1041509376, tf_ss = -1036024832})
> at ../../i386/i386/trap.c:437
> #5 0xc01435d7 in fdcopy (p=0xcc5796e0) at ../../kern/kern_descrip.c:954
> #6 0xc014587b in fork1 (p1=0xcc5796e0, flags=-2147483596)
> at ../../kern/kern_fork.c:379
> #7 0xc014533b in vfork (p=0xcc5796e0, uap=0xcc74ef94)
> at ../../kern/kern_fork.c:109
> #8 0xc0209b17 in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 236237520,
> tf_esi = 236231856, tf_ebp = -1077952324, tf_isp = -864751644,
> tf_ebx = 673171048, tf_edx = 163766316, tf_ecx = 672877149, tf_eax = 66,
> tf_trapno = 7, tf_err = 2, tf_eip = 672936705, tf_cs = 31,
> tf_eflags = 514, tf_esp = -1077952368, tf_ss = 39})
> at ../../i386/i386/trap.c:1100
> #9 0xc01feedc in Xint0x80_syscall ()
>
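A note for reading the trap frame in #4: kgdb prints the registers as signed decimals, which hides the addresses. Below is a minimal sketch in C of one way to decode them. The helper names (as_u32, pf_not_present, pf_is_write, pf_from_user) are hypothetical and are not kernel functions; the error-code bit layout is the standard x86 page-fault one.

```c
#include <stdint.h>

/* Reinterpret a register that kgdb printed as a signed 32-bit decimal
 * as the unsigned 32-bit address it really is. */
uint32_t as_u32(int32_t v)
{
    return (uint32_t)v;   /* conversion is modulo 2^32, well defined in C */
}

/* x86 page-fault error code bits (architectural, not FreeBSD-specific). */
#define PF_PRESENT 0x1u   /* set: protection violation; clear: page not present */
#define PF_WRITE   0x2u   /* set: fault was caused by a write */
#define PF_USER    0x4u   /* set: fault happened in user mode */

int pf_not_present(uint32_t err) { return (err & PF_PRESENT) == 0; }
int pf_is_write(uint32_t err)    { return (err & PF_WRITE) != 0; }
int pf_from_user(uint32_t err)   { return (err & PF_USER) != 0; }
```

With the values from frame #4, as_u32(-1072417321) is 0xc01435d7, the same address frame #5 shows for fdcopy(), and tf_err = 2 decodes as a kernel-mode write to a page that is not present. The fault address (eva=134812896) sits 12 bytes above tf_eax, which would fit a write through a bogus *fpp at a small structure offset, though that reading is only a guess.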
> Now the interesting code here is at stack frame #5:
>
> (kgdb) list
> 948 fpp = newfdp->fd_ofiles;
> 949 for (i = newfdp->fd_lastfile; i-- >= 0; fpp++)
> 950 if (*fpp != NULL)
> 951 (*fpp)->f_count++;
>
> (kgdb) p newfdp->fd_ofiles
> $1 = (struct file **) 0xc23f2000
> (kgdb) p fpp
> $2 = (struct file **) 0x0
>
> Now... the only operation on fpp is fpp++. It should take a _long_
> time for fpp to get around to 0 and you'd think that *fpp would be
> zero long before that (or cause a page fault at some other
> non-existent location).
>
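To put a rough number on "a _long_ time": the sketch below counts how many fpp++ steps a 32-bit pointer needs before it wraps around to 0. It assumes 4-byte pointers, as on i386; steps_to_wrap is a hypothetical helper written for this illustration, not anything from the kernel.

```c
#include <stdint.h>

/* Number of increments of size elem_size before a 32-bit pointer
 * that starts at `start` wraps around to address 0. */
uint32_t steps_to_wrap(uint32_t start, uint32_t elem_size)
{
    /* Unsigned negation yields 2^32 - start (mod 2^32): the distance
     * in bytes from `start` to the wrap-around point. */
    uint32_t bytes_left = (uint32_t)(0u - start);

    return bytes_left / elem_size;
}
```

Starting from the fd_ofiles value shown by kgdb, steps_to_wrap(0xc23f2000u, 4) comes to 259012608, i.e. roughly 259 million increments. So an fpp of 0x0 cannot plausibly have come from the loop itself.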
> So... the similarity here is that deep in the kernel, we have an
> automatic (possibly register) local variable that's getting zeroed.
>
> I have half-a-dozen crash dumps of this nature. For me, it always
> happens in fdcopy(). This may be due to the fact that the machine is
> running a large apache config --- so fork() is something it's doing
> often.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-current" in the body of the message
