From owner-freebsd-smp Sun Dec 22 09:36:48 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id JAA13569 for smp-outgoing; Sun, 22 Dec 1996 09:36:48 -0800 (PST)
Received: from uruk.org (root@faustus.dev.com [198.145.95.253]) by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id JAA13557 for ; Sun, 22 Dec 1996 09:36:43 -0800 (PST)
Received: from uruk.org [127.0.0.1] (erich) by uruk.org with esmtp (Exim 0.53 #1) id E0vbsml-00080n-00; Sun, 22 Dec 1996 10:38:07 -0800
To: smp@freebsd.org
Subject: (long) P6 and ??? TLB shootdown ???
Date: Sun, 22 Dec 1996 10:38:07 -0800
From: Erich Boleyn
Message-Id:
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hi all.

I spent the last few days doing debugging exercises with FreeBSD-SMP on
my P6 SMP test box. The results were interesting.

First of all, I dug around in the debugger more. I always get an error
message and stack traceback that look like the following (modulo some
differences in the "fault virtual address" and the "current process"
stuff):

---------------------------(start DDB stuff)--------------------------------
Fatal trap 12: page fault while in kernel mode
cpunumber = 0
fault virtual address   = 0xffc00034
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xf01d158f
stack pointer           = 0x10:0xefbffe94
frame pointer           = 0x10:0xefbffeb0
...
current process         = 419 (cc)
interrupt mask          =
kernel: type 12 trap, code=0
Stopped at      _pmap_enter+0x8f:       movl    0(%ecx),%ecx
db> trace
_pmap_enter(f2336a64,d000,1d34000,7,0) at _pmap_enter+0x8f
_vm_fault(f2336a00,d000,3,0,0) at _vm_fault+0xd0b
_trap_pfault(efbfffbc,1) at _trap_pfault+0xd4
_trap(27,27,0,efbfdbac,efbfdba4) at _trap+0x14b
calltrap() at calltrap+0x1a
--- trap 12, eip = 0x1048, ebp = 0xefbfdba4 ---
--- curproc = 0xf22f6e00, pid = 419 ---
---------------------------(end DDB stuff)--------------------------------

I think it is not just TLB shootdown issues, for 2 reasons:

  (1) I tried using the "examine" command for the virtual address listed
      in the error, and it gave me another "page fault in kernel mode"
      error, and

  (2) I implemented a wait for all other CPUs after the TLB shootdown
      messages were sent, plus placing a *long* wait afterward for
      paranoia. This gave exactly the same results.

The two above points lead me to believe that:

  (a) it is not a TLB shootdown issue in the sense that simply having a
      better rendezvous procedure to make sure the TLB shootdowns all
      happen before the sending CPU proceeds would solve it, and

  (b) it looks like it might really be a problem in the code which sets
      up the kernel pmaps which point to the user-level pmaps, since I'm
      getting consistent page faults when accessing the page tables of
      the user-level process.

I'll continue to look into it later today...

--
  Erich Stefan Boleyn                   \_ E-mail (preferred):
Mad Genius wanna-be, CyberMuffin        \__      (finger me for other stats)
Web:  http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"
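
As a cross-check on point (b), the fault address in the trace can be decoded by hand. The sketch below assumes things the message does not state: i386 paging without PAE (4-byte page-table entries, 4 KB pages) and a linear page-table window based at 0xffc00000. The function and macro names are hypothetical, made up for illustration. Under those assumptions, 0xffc00034 would be the page-table entry for virtual address 0xd000, which is the address passed to _pmap_enter and _vm_fault in the traceback.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumptions (not stated in the message): 4-byte PTEs, 4 KB pages, and
 * a linear page-table window starting at 0xffc00000. All names here are
 * hypothetical and exist only for this illustration.
 */
#define ASSUMED_PTE_SIZE     4u
#define ASSUMED_PAGE_SHIFT   12u
#define ASSUMED_WINDOW_BASE  0xffc00000u

/*
 * Given an address inside the page-table window, return the virtual
 * address whose page-table entry would live at that address.
 */
static uint32_t
window_addr_to_va(uint32_t addr, uint32_t window_base)
{
    assert(addr >= window_base);   /* must lie inside the window */

    /* Byte offset into the window, divided by the entry size. */
    uint32_t pte_index = (addr - window_base) / ASSUMED_PTE_SIZE;

    /* Each entry maps one page, so entry N covers address N << 12. */
    return pte_index << ASSUMED_PAGE_SHIFT;
}
```

Calling window_addr_to_va(0xffc00034u, ASSUMED_WINDOW_BASE) gives offset 0x34, entry 13, and therefore 0xd000. This does not explain why the mapping is missing; it only suggests that the faulting load was a read of that process's page-table entry, which fits the observation in (b).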