From owner-freebsd-smp  Thu Dec  5 19:07:08 1996
Return-Path: <owner-smp>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.4/8.8.4) id TAA22837
          for smp-outgoing; Thu, 5 Dec 1996 19:07:08 -0800 (PST)
Received: from clem.systemsix.com (clem.systemsix.com [198.99.86.131])
          by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id TAA22803
          for <smp@FreeBSD.ORG>; Thu, 5 Dec 1996 19:07:03 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by clem.systemsix.com (8.6.12/8.6.12) with SMTP id UAA16913; Thu, 5 Dec 1996 20:06:51 -0700
Message-Id: <199612060306.UAA16913@clem.systemsix.com>
X-Authentication-Warning: clem.systemsix.com: Host localhost didn't use HELO protocol
X-Mailer: exmh version 1.6.5 12/11/95
From: Steve Passe <smp@csn.net>
To: smp@FreeBSD.ORG
cc: Tor.Egge@idt.ntnu.no (Tor Egge), "J.M. Chuang" <smp@bluenose.na.tuns.ca>,
        Janick.Taillandier@ratp.fr (Janick TAILLANDIER),
        Peter Wemm <peter@spinner.dialix.com>
Subject: last major problem
In-reply-to: Your message of "Thu, 05 Dec 1996 21:52:00 -0400."
             <199612060152.VAA23427@bluenose.na.tuns.ca> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 05 Dec 1996 20:06:51 -0700
Sender: owner-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Hi,

 It appears we have one last serious problem to fix: three users report
crashes shortly after starting the 2nd CPU.

---
>I tried a few hours old kernel on an ASUS P/I-P65UP5, with APIC_IO
>enabled. When compiling a new kernel with two active CPUs, I got 
>error messages from gcc, and the compile failed. Restarting the
>kernel compiling caused a trap 12, and a kernel dump.
>
>When looking at the kernel dump, I get
>
>#0  boot (howto=256) at ../../kern/kern_shutdown.c:267
>#1  0xe0112d29 in panic (fmt=0xe01bcbcf "page fault")
>    at ../../kern/kern_shutdown.c:395
>#2  0xe01bd8b5 in trap_fatal (frame=0xdfbffe58) at ../../i386/i386/trap.c:747
>#3  0xe01bd2e8 in trap_pfault (frame=0xdfbffe58, usermode=0)
>    at ../../i386/i386/trap.c:654
>#4  0xe01bcf1b in trap (frame={tf_es = -270335984, tf_ds = 16, 
>      tf_edi = -270395292, tf_esi = -541077504, tf_ebp = -541065552, 
>      tf_isp = -541065600, tf_ebx = 296243200, tf_edx = -4194304, 
>      tf_ecx = -528396, tf_eax = -528396, tf_trapno = 12, tf_err = 0, 
>      tf_eip = -535058833, tf_cs = 8, tf_eflags = 66178, tf_esp = -530083069, 
>      tf_ss = -270296448}) at ../../i386/i386/trap.c:313
>#5  0xe01ba66f in pmap_enter (pmap=0xefe21864, va=3753889792, pa=296243200, 
>    prot=7 '\a', wired=0) at ../../i386/i386/pmap.c:2014
>#6  0xe01a4153 in vm_fault (map=0xefe21800, vaddr=3753889792, 
>    fault_type=3 '\003', change_wiring=0) at ../../vm/vm_fault.c:773
>#7  0xe01bd240 in trap_pfault (frame=0xdfbfffbc, usermode=1)
>    at ../../i386/i386/trap.c:634
>#8  0xe01bcdc3 in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = 352256, 
>      tf_esi = 330220, tf_ebp = -541074428, tf_isp = -541065244, tf_ebx = 0, 
>      tf_edx = 1, tf_ecx = 330220, tf_eax = 0, tf_trapno = 12, tf_err = 7, 
>      tf_eip = 45296, tf_cs = 31, tf_eflags = 66050, tf_esp = -541074452, 
>      tf_ss = 39}) at ../../i386/i386/trap.c:241
>(kgdb) up 5
>#5  0xe01ba66f in pmap_enter (pmap=0xefe21864, va=3753889792, pa=296243200, 
>    prot=7 '\a', wired=0) at ../../i386/i386/pmap.c:2014
>2014            origpte = *(vm_offset_t *)pte;
>(kgdb) print/x pte
>$2 = 0xfff7eff4
>
>This indicates an attempt to dereference address 0xfff7eff4, which seems
>bogus :-(
>
>- Tor Egge
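
A note for reading the dump above: kgdb prints the trap frame fields as
signed decimals, which hides the addresses. Below is a minimal sketch in
Python (not part of the original report; the helper name to_u32 is made up
for illustration) that reinterprets them as unsigned 32-bit values. With the
numbers quoted above, tf_eip comes out as the pmap_enter return address shown
in frame #5, and tf_ecx/tf_eax match the pte value that kgdb printed.

```python
def to_u32(value: int) -> int:
    """Reinterpret a (possibly negative) 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from frame #4 and frame #5 of the kgdb backtrace above.
fields = {
    "tf_eip": -535058833,
    "tf_ecx": -528396,
    "tf_eax": -528396,
    "va": 3753889792,
}

for name, value in fields.items():
    print(f"{name:7s} = {to_u32(value):#010x}")

# tf_eip  = 0xe01ba66f   (same address as frame #5, pmap_enter)
# tf_ecx  = 0xfff7eff4   (same value as "print/x pte")
# tf_eax  = 0xfff7eff4
# va      = 0xdfbfd000
```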

---
> I turned on DDB; here is its report when the system reboots shortly after
> the second CPU is launched:
> 
> cpunumber =0
> fault virtual address = 0xfffbeff4
> fault code            = supervisor read, page not present
> instruction pointer   = 0x8:0xf01be11b
> stack  pointer	      = 0x8:0xefbffe94
> frame pointer	      = 0x10:0xefbffeb0
> code segment	      = base 0x0, limit 0xffff, type 0x1b
>  		      = DPL 0, pres 1, def32 1, gran 1
> processor eflags      = interrupt enabled, resume, IOPL=0
> current process       = 270 (sh)
> interrupt mask        =
> kernel: type 12 trap, code=0
> stopped at _pmap_enter+0x8f:   movl   0(%ecx),%ecx
> db> trace
> _pmap_enter(f0cbc464,efbfd000,da5000,7,0) at _pmap_enter+0x8f
> _vm_fault(f0cbc400,efbfd000,3,0,0) at _vm_fault+0xd0b
> _trap_pfault(efbfffbc,1) at _trap_pfault+0xd4
> _trap(27,27,56000,509ec,efbfdb34) at _trap+0x146
> calltrap() at calltrap+0x1a
> --- trap 12, eip = 0xb0f0, ebp = 0xefbfdb34 ---
> --- curproc = 0xf0be3c00, pid = 270 ---
> 
> 
> Jim

---
>I have installed the latest patches and I am still having trap 12
>a few seconds after starting the second CPU. Everything seems ok 
>with only one CPU.
>The machine is a Dell Optiplex GXpro with 2 P6 200 MHz, 32MB,
>SCSI disk on Adaptec 2940UW.
> ...
>Stopped at _pmap_enter+0x8f:          movl   0(%ecx),%ecx
>
>trace gives:
>
>_pmap_enter(f13ad064,efbfd000,17c8000,7,0) at  _pmap_enter+0x8f
>_vm_fault(f13ad000,efbfd000,3,0,0) at _vm_fault+xxd`b
>_trap_pfault(efbfffbc,1) at _trap_pfault+0xd4
>_trap(27,27,50274,1,efbfd5f4) at _trap+0x14b
>calltrap() at calltrap+0x1a
>--- trap 12, eip = 0xe9fc, ebp =0xefbfd5f4 ---
>--- curproc = 0xf13e9a00, pid = 1561 ---
>
>I have a similar crash without APIC_IO or SMP_INVLTLB.
>
>Janick

---

Tor's machine:
 P6, Asus P/I-P65UP5/C-P6ND, entry 5 in mptable database

Jim's machine:
 P6, Titan Pro, entry 17 in mptable database (someone else, same MB)
 (I'm assuming it's the Titan Pro; the Tomcat II/EIDE was previously
 reported working)

Janick's machine:
 P6, Dell Optiplex GXpro, entry 6 in mptable database.

They're all P6, ...

any theories/clues???
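
One thing that can be checked from the numbers alone: the two reported fault
addresses (0xfff7eff4 and 0xfffbeff4) differ, but so do the va arguments to
pmap_enter. The sketch below is only arithmetic, not a diagnosis. It assumes
the usual i386 layout (4 KB pages, 4-byte PTEs, 10/10/12 address split) and
assumes that the faulting pointer is "window base + (va >> 12) * 4"; neither
is stated in the reports. It also assumes the second argument in Jim's DDB
trace is va, as in Tor's kgdb output. The function names are made up for
illustration.

```python
PAGE_SHIFT = 12   # assumed: 4 KB pages
PTE_SIZE = 4      # assumed: 4-byte page table entries


def split_va(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> PAGE_SHIFT) & 0x3FF, addr & 0xFFF


def implied_window_base(fault_addr: int, va: int) -> int:
    """Base address that would place va's PTE at fault_addr."""
    return fault_addr - (va >> PAGE_SHIFT) * PTE_SIZE


# (fault address, va passed to pmap_enter) as quoted in the two reports.
reports = {
    "Tor": (0xFFF7EFF4, 0xDFBFD000),
    "Jim": (0xFFFBEFF4, 0xEFBFD000),
}

for who, (fault, va) in reports.items():
    base = implied_window_base(fault, va)
    print(f"{who}: base={base:#010x} split={split_va(fault)}")

# Tor: base=0xffc00000 split=(1023, 894, 4084)
# Jim: base=0xffc00000 split=(1023, 958, 4084)
```

Under those assumptions both reports give the same base, 0xffc00000, in the
last page directory slot. That suggests the two crashes are the same access
failing in the same place rather than random corruption, but it does not say
why the access faults.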

--
Steve Passe	| powered by
smp@csn.net	|            FreeBSD