From owner-freebsd-smp  Sun Dec 22 09:36:48 1996
Return-Path: <owner-smp>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.4/8.8.4) id JAA13569
          for smp-outgoing; Sun, 22 Dec 1996 09:36:48 -0800 (PST)
Received: from uruk.org (root@faustus.dev.com [198.145.95.253])
          by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA13557
          for <smp@freebsd.org>; Sun, 22 Dec 1996 09:36:43 -0800 (PST)
Received: from uruk.org [127.0.0.1] (erich)
	by uruk.org with esmtp (Exim 0.53 #1)
	id E0vbsml-00080n-00; Sun, 22 Dec 1996 10:38:07 -0800
To: smp@freebsd.org
Subject: (long) P6 and ??? TLB shootdown ???
Date: Sun, 22 Dec 1996 10:38:07 -0800
From: Erich Boleyn <erich@uruk.org>
Message-Id: <E0vbsml-00080n-00@uruk.org>
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


Hi all.  I spent the last few days doing debugging exercises with
FreeBSD-SMP on my P6 SMP test box.  The results were interesting.

First of all, I dug around in the debugger some more.  I always get an
error message and stack traceback that look like the following
(modulo some differences in the "fault virtual address" and the
"current process" stuff):

---------------------------(start DDB stuff)--------------------------------
Fatal trap 12: page fault while in kernel mode
cpunumber = 0
fault virtual address	= 0xffc00034
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xf01d158f
stack pointer		= 0x10:0xefbffe94
frame pointer		= 0x10:0xefbffeb0
...
current process		= 419 (cc)
interrupt mask		=
kernel: type 12 trap, code=0
Stopped at	_pmap_enter+0x8f:	movl	0(%ecx),%ecx
db> trace
_pmap_enter(f2336a64,d000,1d34000,7,0) at _pmap_enter+0x8f
_vm_fault(f2336a00,d000,3,0,0) at _vm_fault+0xd0b
_trap_pfault(efbfffbc,1) at _trap_pfault+0xd4
_trap(27,27,0,efbfdbac,efbfdba4) at _trap+0x14b
calltrap() at calltrap+0x1a
--- trap 12, eip = 0x1048, ebp = 0xefbfdba4 ---
--- curproc = 0xf22f6e00, pid = 419 ---
---------------------------(end DDB stuff)--------------------------------

I don't think this is just a TLB shootdown issue, for two reasons:

 (1) I tried using the "examine" command for the virtual address listed
     in the error, and it gave me another "page fault in kernel mode"
     error, and
 (2) I implemented a wait for all other CPUs after the TLB shootdown messages
     were sent, plus placing a *long* wait afterward for paranoia.  This
     gave exactly the same results.
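
For readers who want to picture the wait described in (2), here is a
minimal sketch of that kind of rendezvous.  It is not the FreeBSD-SMP
code: the struct and function names are hypothetical, the IPI send and
the TLB flush are left as comments, and it assumes at most 32 CPUs so
that one bit per CPU fits in an unsigned int.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rendezvous state: bit n set means CPU n has not yet
 * acknowledged the current shootdown request. */
struct shootdown {
    atomic_uint pending;
};

/* Initiating CPU: mark every other CPU as pending.  A real kernel
 * would send the shootdown IPI to those CPUs right after this. */
static void shootdown_start(struct shootdown *s, unsigned self, unsigned ncpus)
{
    assert(ncpus >= 1 && ncpus <= 32 && self < ncpus);
    unsigned all = (ncpus == 32) ? ~0u : ((1u << ncpus) - 1u);
    atomic_store(&s->pending, all & ~(1u << self));
}

/* Target CPU: called from its IPI handler once it has flushed its TLB. */
static void shootdown_ack(struct shootdown *s, unsigned cpu)
{
    atomic_fetch_and(&s->pending, ~(1u << cpu));
}

/* True once every target CPU has acknowledged. */
static bool shootdown_done(struct shootdown *s)
{
    return atomic_load(&s->pending) == 0;
}

/* Initiating CPU: spin until all acknowledgements have arrived. */
static void shootdown_wait(struct shootdown *s)
{
    while (!shootdown_done(s))
        ;   /* busy-wait; a real kernel might add a pause or a timeout */
}
```

With a wait of this shape the sending CPU cannot proceed until every
other CPU has flushed, which is the property being tested in (2).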

These two points lead me to believe that:

 (a) it is not a TLB shootdown issue, at least not in the sense that a
     better rendezvous procedure (one that makes sure the TLB shootdowns
     all happen before the sending CPU proceeds) would solve it, and
 (b) it looks like it might really be a problem in the code which sets
     up the kernel pmaps which point to the user-level pmaps, since I'm
     getting consistent page faults when accessing the page tables of
     the user-level process.
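
The numbers in the trap message are consistent with a page-table
access.  The sketch below decodes the fault address under assumptions
that are not stated in this message: 4 KB pages, 4-byte page table
entries, and a window at 0xffc00000 through which the kernel sees the
target process's page tables as a flat array.  The macro and function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not taken from the message): 4 KB pages, 4-byte
 * PTEs, page-table window based at 0xffc00000. */
#define PAGE_SHIFT   12
#define PTE_SIZE     4u
#define PT_WINDOW    0xffc00000u

/* Virtual address whose PTE sits at pte_addr inside the window. */
static uint32_t pte_addr_to_va(uint32_t pte_addr)
{
    uint32_t index = (pte_addr - PT_WINDOW) / PTE_SIZE;
    return index << PAGE_SHIFT;
}

/* Inverse: address inside the window of the PTE that maps va. */
static uint32_t va_to_pte_addr(uint32_t va)
{
    return PT_WINDOW + (va >> PAGE_SHIFT) * PTE_SIZE;
}

/* Check against the values in the trap message and the trace. */
static void check_trace_values(void)
{
    /* fault virtual address 0xffc00034 <-> pmap_enter va 0xd000 */
    assert(pte_addr_to_va(0xffc00034u) == 0xd000u);
    assert(va_to_pte_addr(0xd000u) == 0xffc00034u);
}
```

Under those assumptions the faulting address 0xffc00034 is the slot
holding the PTE for 0xd000, which is the virtual address passed as the
second argument to _pmap_enter in the trace.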

I'll continue to look into it later today...

--
  Erich Stefan Boleyn                 \_ E-mail (preferred):  <erich@uruk.org>
Mad Genius wanna-be, CyberMuffin        \__      (finger me for other stats)
Web:  http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"