From owner-freebsd-current Wed Feb 23 7:53:47 2000
Delivered-To: freebsd-current@freebsd.org
Received: from quack.kfu.com (quack.kfu.com [170.1.70.2])
	by hub.freebsd.org (Postfix) with ESMTP id 5E04037B857
	for ; Wed, 23 Feb 2000 07:53:43 -0800 (PST)
	(envelope-from nsayer@quack.kfu.com)
Received: from icarus.kfu.com (icarus.kfu.com [170.1.70.17])
	by quack.kfu.com (8.9.2/8.9.3) with ESMTP id HAA16102;
	Wed, 23 Feb 2000 07:53:37 -0800 (PST)
	(envelope-from nsayer@quack.kfu.com)
Received: from quack.kfu.com by icarus.kfu.com
	with ESMTP (8.9.3//ident-1.0) id HAA03693;
	Wed, 23 Feb 2000 07:53:06 -0800 (PST)
Message-ID: <38B40262.FC4EEDD0@quack.kfu.com>
Date: Wed, 23 Feb 2000 15:53:06 +0000
From: Nick Sayer
X-Mailer: Mozilla 4.7 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: David Gilbert , freebsd-current@freebsd.org
Subject: Re: Wierd AMD panics caused by VMWare?
References: <14515.59795.632514.748870@trooper.velocet.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

The only thing I would add is that by AMD I didn't mean Advanced Micro
Devices. I meant /usr/sbin/amd. In my case this behavior has been
observed on a Pentium III and on a K7, so it's CPU independent.

David Gilbert wrote:
>
> I had reported this earlier, but the similarities are striking:
>
> I too have seen strange AMD panics where stack variables inexplicably
> go to zero.  My systems are K6/2-400's, and I have often witnessed the
> following fault (only happens on a *really* busy web server)

The common denominator seems to be that the machine has to be very
active. VMware stresses the vm system quite a bit (64M of shared memory
with multiple processes digging around, etc). A very busy web server is
going to do a lot of context switching (I think?). In that situation,
it appears that the stack is being smashed.
I tried insulating the code where my machines go nuts inside of
splhigh() / splx(), but it didn't help.

Is your machine running the automounter?

>
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:285
> #1  0xc014aad1 in panic (fmt=0xc023878a "page fault")
>     at ../../kern/kern_shutdown.c:446
> #2  0xc02098ce in trap_fatal (frame=0xcc74eecc, eva=134812896)
>     at ../../i386/i386/trap.c:942
> #3  0xc0209587 in trap_pfault (frame=0xcc74eecc, usermode=0, eva=134812896)
>     at ../../i386/i386/trap.c:835
> #4  0xc02091ba in trap (frame={tf_es = -887750640, tf_ds = -1036058608,
>       tf_edi = -1050208512, tf_esi = -1043943040, tf_ebp = -864751828,
>       tf_isp = -864751884, tf_ebx = 2287, tf_edx = -1036043576, tf_ecx = 0,
>       tf_eax = 134812884, tf_trapno = 12, tf_err = 2, tf_eip = -1072417321,
>       tf_cs = 8, tf_eflags = 66054, tf_esp = -1041509376, tf_ss = -1036024832})
>     at ../../i386/i386/trap.c:437
> #5  0xc01435d7 in fdcopy (p=0xcc5796e0) at ../../kern/kern_descrip.c:954
> #6  0xc014587b in fork1 (p1=0xcc5796e0, flags=-2147483596)
>     at ../../kern/kern_fork.c:379
> #7  0xc014533b in vfork (p=0xcc5796e0, uap=0xcc74ef94)
>     at ../../kern/kern_fork.c:109
> #8  0xc0209b17 in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 236237520,
>       tf_esi = 236231856, tf_ebp = -1077952324, tf_isp = -864751644,
>       tf_ebx = 673171048, tf_edx = 163766316, tf_ecx = 672877149, tf_eax = 66,
>       tf_trapno = 7, tf_err = 2, tf_eip = 672936705, tf_cs = 31,
>       tf_eflags = 514, tf_esp = -1077952368, tf_ss = 39})
>     at ../../i386/i386/trap.c:1100
> #9  0xc01feedc in Xint0x80_syscall ()
>
> Now the interesting code here is at stack from #5:
>
> (kgdb) list
> 948             fpp = newfdp->fd_ofiles;
> 949             for (i = newfdp->fd_lastfile; i-- >= 0; fpp++)
> 950                     if (*fpp != NULL)
> 951                             (*fpp)->f_count++;
>
> (kgdb) p newfdp->fd_ofiles
> $1 = (struct file **) 0xc23f2000
> (kgdb) p fpp
> $2 = (struct file **) 0x0
>
> Now... the only operation on fpp is fpp++.
> It should take a _long_ time for fpp to get around to 0, and you'd
> think that *fpp would be zero long before that (or cause a page fault
> at some other nonexistent location).
>
> So... the similarity here is that deep in the kernel, we have an
> automatic (possibly register) local variable that's getting zeroed.
>
> I have half-a-dozen crash dumps of this nature.  For me, it always
> happens in fdcopy().  This may be due to the fact that the machine is
> running a large apache config --- so fork() is something it's doing
> often.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-current" in the body of the message