From owner-freebsd-smp  Thu Sep  4 16:18:23 1997
Return-Path: <owner-freebsd-smp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id QAA29102
          for smp-outgoing; Thu, 4 Sep 1997 16:18:23 -0700 (PDT)
Received: from mail.cdsnet.net (mail.cdsnet.net [204.118.244.5])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id QAA29097
          for <smp@FreeBSD.ORG>; Thu, 4 Sep 1997 16:18:13 -0700 (PDT)
Received: from mail.cdsnet.net (mail.cdsnet.net [204.118.244.5])
	by mail.cdsnet.net (8.8.6/8.8.6) with SMTP id QAA09584;
	Thu, 4 Sep 1997 16:17:54 -0700 (PDT)
Date: Thu, 4 Sep 1997 16:17:53 -0700 (PDT)
From: Jaye Mathisen  <mrcpu@cdsnet.net>
To: Steve Passe <smp@csn.net>
cc: "John S. Dyson" <toor@dyson.iquest.net>, smp@FreeBSD.ORG
Subject: Re: 3.0/SMP panic 
In-Reply-To: <199709042021.OAA10238@Ilsa.StevesCafe.com>
Message-ID: <Pine.NEB.3.95.970904161708.7803Q-100000@mail.cdsnet.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


Recent kernel changes (maybe the tcp stuff Garrett did relating to
software interrupts? I have a DPT controller) seem to have fixed it.
The same load that was crashing it yesterday is running fine after
supping, building, and booting a new kernel.

On Thu, 4 Sep 1997, Steve Passe wrote:

> Hi,
> 
> We have gotten most SMP systems running now.  One recent hurdle was LKMs
> that had gotten out of sync with the kernel proper.  The symptom was a
> panic during boot, or possibly when a screensaver LKM activated.  ipfw_mod
> was also shown to be a problem.
> The solution is to sup current sources for the LKMs, then rebuild and
> install them.
> 
> We still have at least one fundamental bug affecting a small number of
> systems: Fatal trap 12 during boot with -current.
> 
> This bug has so far only been seen under SMP (is this true?).  It appears to
> be very dependent on the specific system configuration.  The following is
> a roundup of reports from various users.  Unless you're working on this
> problem, you probably don't want to read further.
> 
> ---
> Kenneth Merry <ken@plutotech.com>:
> 
> >	By any chance do you have more than 64MB in your machine and
> >options MAXMEM=... in your kernel config file?  
> >
> >	I did, and I had panics very much like that (in pmap_enter)
> >immediately on boot.  When I took the MAXMEM line out (I've got 128MB),
> >things worked just fine...  I'm still not sure why, though.
> > ...
> > 	I found the problem.  At first I suspected the sound driver, but
> > the problem really turned out to be:
> > 
> > options        "MAXMEM=(128*1024)"
> 
> 
> ---
> Jaye Mathisen <mrcpu@schizo.cdsnet.net>:
> 
> I was using M$ Inetload 2.0 to simulate a bunch of mail users.  It was
> running fine for a few minutes, then died horribly with:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1
> lapic.id = 33554432
> 
> current process = Idle
> mp_lock = 01000003
> 
> interrupt mask = net tty bio <- SMP: XXX
> 
> Stopped at _pmap_enter+0xa7:
> 
> 
> and some other stuff.
> 
> The traceback is not too long, but I don't have any good way to type it
> all in.
> 
> It goes like:
> 
> _pmap_enter
> _vm_fault
> _trap_pfault
> _trap
> _zalloc
> _pmap_insert_entry
> _pmap_enter
> _kmem_alloc
> _in_pcballoc
> _tcp_attach
> _tcp_usr_attach
> _sonewconn
> _tcp_input
> _ip_input
> _ipintr
> swi_net_next
> 
> 
> It is trivially reproducible at least on my hardware.
> 
> 
> ---
> Akira Watanabe <akira@myaw.ei.meisei-u.ac.jp>:
> 
> The kernel (supped yesterday) causes a panic.
> 
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 0
> lapic.id = 16777216
> instruction pointer     = 0x8:0xf01bc794
> stack pointer           = 0x10:0xf4cabc84
> frame pointer           = 0x10:0xf4cabcd0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 236 (ftpd)
> mp_lock                 = 00000003
> interrupt mask          =  <- SMP: XXX
> trap number             = 18
> panic: integer divide fault
>  cpuid 0
> boot() called on cpu#0
> 
> syncing disks... 11 11 8 2 done
> 
> Here is a stack trace.
> 
> # gdb -k kernel /var/crash/vmcore.0
> GDB is free software and you are welcome to distribute copies of it
>  under certain conditions; type "show copying" to see the conditions.
> There is absolutely no warranty for GDB; type "show warranty" for details.
> GDB 4.16 (i386-unknown-freebsd), 
> Copyright 1996 Free Software Foundation, Inc...
> IdlePTD 24d000
> current pcb at 1f9608
> panic: integer divide fault
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:289
> 289                                     dumppcb.pcb_cr3 = rcr3();
> (kgdb) where
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:289
> #1  0xf0118e36 in panic (fmt=0xf01ccbea "integer divide fault")
>     at ../../kern/kern_shutdown.c:416
> #2  0xf01cd86f in trap_fatal (frame=0xf4cabc48) at ../../i386/i386/trap.c:806
> #3  0xf01cd072 in trap (frame={tf_es = -256049136, tf_ds = 131088, 
>       tf_edi = -256677376, tf_esi = 0, tf_ebp = -188039984, 
>       tf_isp = -188040080, tf_ebx = 0, tf_edx = 0, tf_ecx = 4096, 
>       tf_eax = 4096, tf_trapno = 18, tf_err = 0, tf_eip = -266614892, 
>       tf_cs = 8, tf_eflags = 66118, tf_esp = 0, tf_ss = 3})
>     at ../../i386/i386/trap.c:487
> #4  0xf01bc794 in vnode_pager_haspage (object=0xf0bdb800, pindex=0, 
>     before=0xf4cabd34, after=0xf4cabd30) at ../../vm/vnode_pager.c:231
> #5  0xf01bbcff in vm_pager_has_page (object=0xf0bdb800, offset=0, 
>     before=0xf4cabd34, after=0xf4cabd30) at ../../vm/vm_pager.c:205
> #6  0xf01b2e05 in vm_fault_additional_pages (m=0xf04ef7c4, rbehind=3, 
>     rahead=4, marray=0xf4cabdd0, reqpage=0xf4cabda4)
>     at ../../vm/vm_fault.c:1100
> #7  0xf01b21c0 in vm_fault (map=0xf0bd9300, vaddr=134385664, 
>     fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:414
> #8  0xf01cd23a in trap_pfault (frame=0xf4cabe50, usermode=0)
>     at ../../i386/i386/trap.c:681
> #9  0xf01ccf47 in trap (frame={tf_es = 134348816, tf_ds = 134348816, 
>       tf_edi = -259133440, tf_esi = 134385664, tf_ebp = -188039496, 
>       tf_isp = -188039560, tf_ebx = 2048, tf_edx = 134387712, tf_ecx = 512, 
>       tf_eax = -188047360, tf_trapno = 12, tf_err = 0, tf_eip = -266551371, 
>       tf_cs = 8, tf_eflags = 66054, tf_esp = -188039368, tf_ss = -188039376})
>     at ../../i386/i386/trap.c:339
> #10 0xf01cbfb5 in generic_copyin ()
> #11 0xf012cf6f in sosend (so=0xf0bddc00, addr=0x0, uio=0xf4cabf38, top=0x0, 
>     control=0x0, flags=0, p=0xf0bb9600) at ../../kern/uipc_socket.c:449
> #12 0xf0122ec8 in soo_write (fp=0xf0bdcd40, uio=0xf4cabf38, cred=0xf0bd8b00)
>     at ../../kern/sys_socket.c:78
> #13 0xf0120884 in write (p=0xf0bb9600, uap=0xf4cabf94, retval=0xf4cabf84)
>     at ../../kern/sys_generic.c:268
> #14 0xf01cdacb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134385664, 
>       tf_esi = 2256, tf_ebp = -272642012, tf_isp = -188039196, tf_ebx = 6, 
>       tf_edx = 5, tf_ecx = 1, tf_eax = 4, tf_trapno = 22, tf_err = 7, 
>       tf_eip = 135028641, tf_cs = 31, tf_eflags = 531, tf_esp = -272642064, 
>       tf_ss = 39}) at ../../i386/i386/trap.c:953
> #15 0x80c5fa1 in ?? ()
> #16 0x3658 in ?? ()
> #17 0x86fb in ?? ()
> #18 0x2045 in ?? ()
> #19 0x1096 in ?? ()
> (kgdb) 
> 
> 
> ---
> Hajimu UMEMOTO <ume@calm.imasy.or.jp>:
> 
> Sept 1:
> > Yes, ipfw_mod and linux_mod were loaded.  Following your
> > suggestion, I disabled loading ipfw_mod and rebooted.  The kernel
> > then booted without any problem. :-)
> 
> Sept 4:
> > I built the LKMs during `make world'.  I would like to try that method,
> > but...  I've tried kernels cvsupped on Sep 3 and Sep 4.  Even with no
> > LKM loaded, the kernel panics frequently when accessing the network.
> > The UP kernel seems to have no problem.  I'm using the vx driver for a
> > 3C905.
> 
> 
> ---
> Tom Bartol <bartol@salk.edu>:
> 
> Over the last several days, starting with the world/kernel of 8/28 and
> continuing through the world/kernel as of last night (9/2), I have been
> getting crashes that haven't left me with any useful info.  All the crashes
> have occurred while composing e-mail from within pine.  My /var/mail is an
> NFSv3-mounted fs served by an Auspex NS-7000 over 100/BT (nice!).  The
> system in question is a Dell XPS-P133c (i.e. P5/133) with 128MB, an Adaptec
> 2940U, and a 3Com 3C595 100/BT.  I've been running the same world/kernel on
> my home system with no trouble (but no NFS or network card either).
> Curiously, I composed this e-mail on the unstable system with no trouble.
> All the crashes have consistently occurred while composing mail, within a
> few minutes of logging in.
> 
> 
> ---
> From: randyd <randyd@nconnect.net>
> To: smp@csn.net
> Subject: SMP / LKM update
> 
> Greetings,
> 
> Just a quick update...
> 
> I cvsupped new code at about 7:30 CST yesterday and did "make
> cleandepend && make world".  This AM I built a fresh SMP kernel and
> rebooted the machine.  I didn't start an X session; instead I waited
> for the 'daemon' screen saver to "kick in".  When it did, I got a
> screen full of...
> Oops I'm on cpu#1, I need to be on Cpu#0
> 
> 
> --
> Steve Passe	| powered by 
> smp@csn.net	|            Symmetric MultiProcessor FreeBSD
> 
>