From owner-freebsd-hackers Wed Mar 10 10:19:14 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from psv.oss.uswest.net (psv.oss.uswest.net [204.147.85.6])
	by hub.freebsd.org (Postfix) with ESMTP id F38E314EE8
	for ; Wed, 10 Mar 1999 10:19:12 -0800 (PST)
	(envelope-from greg@psv.oss.uswest.net)
Received: (from greg@localhost)
	by psv.oss.uswest.net (8.9.2/8.9.2) id MAA54775
	for freebsd-hackers@FreeBSD.ORG; Wed, 10 Mar 1999 12:18:51 -0600 (CST)
	(envelope-from greg)
Message-ID:
X-Mailer: XFMail 1.3 [p0] on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Date: Wed, 10 Mar 1999 12:18:51 -0600 (CST)
Reply-To: greg@uswest.net
Organization: US WEST !NTERACT
From: Greg Rowe
To: freebsd-hackers@FreeBSD.ORG
Subject: SMP Woes
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I'm still struggling to get this last batch of SMP systems working
correctly. Originally I was getting Fatal Trap errors under high I/O
load on my Tyan dual-CPU systems, but somewhere close to the 3.1 Release
level the problems cleared up, and those systems now seem to be running
fine. Unfortunately, at around the same build level my Quad Xeon box
started producing the Fatal Traps. It's starting to drive me crazy!

As far as I can tell, 3.0 Release worked fine on all the SMP systems
here. Something changed between 3.0 Release and 3.1 Release that is
causing these failures. I went up to 4.0-current yesterday to see if
Matt's recent changes had any effect, but they didn't.

I only get the failures when running the system in SMP mode. Single CPU
works fine. The failure is easily reproducible by running a couple of
cpio's or even a make world (I have to go to single CPU to upgrade).

The Xeon is an SC450NX with a SCSI backplane. I've tried both the
on-board NCR and Adaptec controllers. The DDB output is always the same,
except that the CPU ID changes.
Again, it works at 3.0 Release, so I don't think it's a hardware failure.

Attached is the DDB output and trace from a terminal server window. I'm
more than willing to try any suggestions and can provide access to crash
dumps or the system. Thanks.

Fatal trap 12: page fault while in kernel mode
mp_lock = 03000002; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xf020ec9f
stack pointer           = 0x10:0xfe5a3c34
frame pointer           = 0x10:0xfe5a3c58
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 245 (cpio)
interrupt mask          = net tty bio cam  <- SMP: XXX
kernel: type 12 trap, code=0
Stopped at      generic_bzero+0xf:      repe stosl      %es:(%edi)
db> trace
generic_bzero(f3283f80,0,f4801900,fe5a3c90,fe5a3c98) at generic_bzero+0xf
zalloci(f3283f80,f488bb00,f4801900,6f802,fe541180) at zalloci+0x29
getnewvnode(1,f33f0200,f3266200,fe5a3cfc,100) at getnewvnode+0x2f8
ffs_vget(f33f0200,6f802,fe5a3d7c,ff779d00,fe5a3edc) at ffs_vget+0xa5
ufs_lookup(fe5a3dd4,fe5a3de8,f016f6d4,fe5a3dd4,fe553c1d) at ufs_lookup+0x936
ufs_vnoperate(fe5a3dd4,fe553c1d,ff779d00,fe5a3edc,0) at ufs_vnoperate+0x15
vfs_cache_lookup(fe5a3e30,fe5a3e40,f0171ae9,fe5a3e30,fe53b640) at vfs_cache_lookup+0x248
ufs_vnoperate(fe5a3e30,fe53b640,fe5a3edc,fe5a3eb8,0) at ufs_vnoperate+0x15
lookup(fe5a3eb8,fe541180,f0250848,fe541180,1) at lookup+0x2c1
namei(fe5a3eb8,fe541180,f0250848,0,8057000) at namei+0x133
lstat(fe541180,fe5a3f94,8057000,ffffffff,3) at lstat+0x44
syscall(2f,efbf002f,3,ffffffff,efbfdc70) at syscall+0x187
Xint0x80_syscall() at Xint0x80_syscall+0x4c
db>

Greg Rowe
US WEST - Internet Service Operations

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message