From owner-freebsd-smp  Thu Sep  4 13:21:26 1997
Return-Path: <owner-freebsd-smp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id NAA21040
          for smp-outgoing; Thu, 4 Sep 1997 13:21:26 -0700 (PDT)
Received: from Ilsa.StevesCafe.com (Ilsa.StevesCafe.com [205.168.119.129])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id NAA21028
          for <smp@FreeBSD.ORG>; Thu, 4 Sep 1997 13:21:19 -0700 (PDT)
Received: from Ilsa.StevesCafe.com (localhost [127.0.0.1])
	by Ilsa.StevesCafe.com (8.8.7/8.8.5) with ESMTP id OAA10238;
	Thu, 4 Sep 1997 14:21:05 -0600 (MDT)
Message-Id: <199709042021.OAA10238@Ilsa.StevesCafe.com>
X-Mailer: exmh version 2.0gamma 1/27/96
From: Steve Passe <smp@csn.net>
To: "John S. Dyson" <toor@dyson.iquest.net>
cc: smp@FreeBSD.ORG
Subject: Re: 3.0/SMP panic 
In-reply-to: Your message of "Thu, 04 Sep 1997 12:31:52 CDT."
             <199709041731.MAA01880@dyson.iquest.net> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 04 Sep 1997 14:21:05 -0600
Sender: owner-freebsd-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Hi,

We have gotten most SMP systems running now.  One recent hurdle was LKMs
that got out of sync with the kernel proper.  The symptom was a panic during
boot, or possibly when a screensaver LKM activated.  ipfw_mod was also shown
to be a problem.
The solution is to sup current source for the LKMs, then rebuild and install
them.

We still have at least one fundamental bug affecting a small number of
systems: Fatal trap 12 during boot with -current.

This bug has so far only been seen under SMP (is this true?).  It appears to
be very dependent on the specific system configuration.  The following is
a roundup of reports from various users.  Unless you're working on this
problem, you probably don't want to read further.

---
Kenneth Merry <ken@plutotech.com>:

>	By any chance do you have more than 64MB in your machine and
>options MAXMEM=... in your kernel config file?  
>
>	I did, and I had panics very much like that (in pmap_enter)
>immediately on boot.  When I took the MAXMEM line out (I've got 128MB),
>things worked just fine...  I'm still not sure why, though.
> ...
> 	I found the problem.  At first I suspected the sound driver, but
> the problem really turned out to be:
> 
> options        "MAXMEM=(128*1024)"


---
Jaye Mathisen <mrcpu@schizo.cdsnet.net>:

I was using M$ Inetload 2.0 to simulate a bunch of mail users.  It was
running fine for a few minutes, then died horribly with:

Fatal trap 12: page fault while in kernel mode
cpuid = 1
lapic.id = 33554432

current process = Idle
mp_lock = 01000003

interrupt mask = net tty bio <- SMP: XXX

Stopped at _pmap_enter+0xa7:


and some other stuff.

The traceback is not too long, but I don't have any good way to type it
all in.

It goes like:

_pmap_enter
_vm_fault
_trap_pfault
_trap
_zalloc
_pmap_insert_entry
_pmap_enter
_kmem_alloc
_in_pcballoc
_tcp_attach
_tcp_usr_attach
_sonewconn
_tcp_input
_ip_input
_ipintr
swi_net_next


It is trivially reproducible at least on my hardware.


---
Akira Watanabe <akira@myaw.ei.meisei-u.ac.jp>:

The kernel (suped yesterday) causes a panic.

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0
lapic.id = 16777216
instruction pointer     = 0x8:0xf01bc794
stack pointer           = 0x10:0xf4cabc84
frame pointer           = 0x10:0xf4cabcd0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 236 (ftpd)
mp_lock                 = 00000003
interrupt mask          =  <- SMP: XXX
trap number             = 18
panic: integer divide fault
 cpuid 0
boot() called on cpu#0

syncing disks... 11 11 8 2 done

Here is a stack trace.

# gdb -k kernel /var/crash/vmcore.0
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd), 
Copyright 1996 Free Software Foundation, Inc...
IdlePTD 24d000
current pcb at 1f9608
panic: integer divide fault
#0  boot (howto=256) at ../../kern/kern_shutdown.c:289
289                                     dumppcb.pcb_cr3 = rcr3();
(kgdb) where
#0  boot (howto=256) at ../../kern/kern_shutdown.c:289
#1  0xf0118e36 in panic (fmt=0xf01ccbea "integer divide fault")
    at ../../kern/kern_shutdown.c:416
#2  0xf01cd86f in trap_fatal (frame=0xf4cabc48) at ../../i386/i386/trap.c:806
#3  0xf01cd072 in trap (frame={tf_es = -256049136, tf_ds = 131088, 
      tf_edi = -256677376, tf_esi = 0, tf_ebp = -188039984, 
      tf_isp = -188040080, tf_ebx = 0, tf_edx = 0, tf_ecx = 4096, 
      tf_eax = 4096, tf_trapno = 18, tf_err = 0, tf_eip = -266614892, 
      tf_cs = 8, tf_eflags = 66118, tf_esp = 0, tf_ss = 3})
    at ../../i386/i386/trap.c:487
#4  0xf01bc794 in vnode_pager_haspage (object=0xf0bdb800, pindex=0, 
    before=0xf4cabd34, after=0xf4cabd30) at ../../vm/vnode_pager.c:231
#5  0xf01bbcff in vm_pager_has_page (object=0xf0bdb800, offset=0, 
    before=0xf4cabd34, after=0xf4cabd30) at ../../vm/vm_pager.c:205
#6  0xf01b2e05 in vm_fault_additional_pages (m=0xf04ef7c4, rbehind=3, 
    rahead=4, marray=0xf4cabdd0, reqpage=0xf4cabda4)
    at ../../vm/vm_fault.c:1100
#7  0xf01b21c0 in vm_fault (map=0xf0bd9300, vaddr=134385664, 
    fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:414
#8  0xf01cd23a in trap_pfault (frame=0xf4cabe50, usermode=0)
    at ../../i386/i386/trap.c:681
#9  0xf01ccf47 in trap (frame={tf_es = 134348816, tf_ds = 134348816, 
      tf_edi = -259133440, tf_esi = 134385664, tf_ebp = -188039496, 
      tf_isp = -188039560, tf_ebx = 2048, tf_edx = 134387712, tf_ecx = 512, 
      tf_eax = -188047360, tf_trapno = 12, tf_err = 0, tf_eip = -266551371, 
      tf_cs = 8, tf_eflags = 66054, tf_esp = -188039368, tf_ss = -188039376})
    at ../../i386/i386/trap.c:339
#10 0xf01cbfb5 in generic_copyin ()
#11 0xf012cf6f in sosend (so=0xf0bddc00, addr=0x0, uio=0xf4cabf38, top=0x0, 
    control=0x0, flags=0, p=0xf0bb9600) at ../../kern/uipc_socket.c:449
#12 0xf0122ec8 in soo_write (fp=0xf0bdcd40, uio=0xf4cabf38, cred=0xf0bd8b00)
    at ../../kern/sys_socket.c:78
#13 0xf0120884 in write (p=0xf0bb9600, uap=0xf4cabf94, retval=0xf4cabf84)
    at ../../kern/sys_generic.c:268
#14 0xf01cdacb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134385664, 
      tf_esi = 2256, tf_ebp = -272642012, tf_isp = -188039196, tf_ebx = 6, 
      tf_edx = 5, tf_ecx = 1, tf_eax = 4, tf_trapno = 22, tf_err = 7, 
      tf_eip = 135028641, tf_cs = 31, tf_eflags = 531, tf_esp = -272642064, 
      tf_ss = 39}) at ../../i386/i386/trap.c:953
#15 0x80c5fa1 in ?? ()
#16 0x3658 in ?? ()
#17 0x86fb in ?? ()
#18 0x2045 in ?? ()
#19 0x1096 in ?? ()
(kgdb) 


---
Hajimu UMEMOTO <ume@calm.imasy.or.jp>:

Sept 1:
> Yes, ipfw_mod and linux_mod were loaded.  Following your
> suggestion, I disabled loading ipfw_mod and rebooted.  Then the kernel
> booted without any problem. :-)

Sept 4:
> I built the LKMs during `make world'.  I wish to try that method, but...
> I've tried with the kernel cvsupped on Sep 3 and Sep 4.  Although no
> LKM is loaded, the kernel panics frequently when accessing the
> network.  The UP kernel seems to have no problem.  I'm using the vx
> driver for a 3C905.


---
Tom Bartol <bartol@salk.edu>:

Over the last several days, starting from the world/kernel of 8/28 and
continuing with the world/kernel as of last night (9/2), I have been getting
crashes that haven't left me with any useful info.  All the crashes have
occurred while composing e-mail from within pine.  My /var/mail is an NFSv3
mounted fs served by an Auspex NS-7000 over 100/BT (nice!).  The system in
question is a Dell XPS-P133c (i.e. P5/133) with 128MB, an Adaptec 2940U, and
a 3Com 3C595 100/BT.  I've been running the same world/kernel on my home
system with no trouble (but no NFS or network card either).  Curiously, I
composed this e-mail on the unstable system with no trouble.  All the crashes
occurred consistently while composing mail, within a few minutes of logging
in.


---
From: randyd <randyd@nconnect.net>
To: smp@csn.net
Subject: SMP / LKM update

Greetings,

Just a quick update...

I cvsupped new code at about 7:30 CST yesterday and did "make cleandepend
&& make world".  This AM I built a fresh SMP kernel and rebooted the
machine.  I didn't start an X session, though; I waited for the 'daemon'
screen saver to "kick in".  When it did, I got a screen full of...
Oops I'm on cpu#1, I need to be on Cpu#0


--
Steve Passe	| powered by 
smp@csn.net	|            Symmetric MultiProcessor FreeBSD