From: Ian J Hart
To: freebsd-stable@freebsd.org
Date: Tue, 07 Jul 2009 10:51:03 +0100
Subject: Re: trap 12
Message-ID: <20090707105103.946813hdks2mra80@10.248.192.16>
References: <20090703100627.197838cphjnil82s@10.248.192.16> <20090706200115.1411150frxepkbuo@webmail.private.lan>
In-Reply-To: <20090706200115.1411150frxepkbuo@webmail.private.lan>

Quoting Ian J Hart:

> Quoting Ian J Hart:
>
>> Is this likely to be hardware? Details will follow if not.
>>
>> [copied from a screen dump]
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 01
>> fault virtual address = 0x0
>> fault code = supervisor write data, page not present
>> instruction pointer = 0x8:0xffffffff807c6c12
>> stack pointer = 0x10:0xffffffff510e7890
>> frame pointer = 0x10:0xffffff00054a6c90
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1 def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 75372 (printf)
>> trap number = 12
>> panic: page fault
>> cpuid = 1
>> uptime: 8m2s
>> Cannot dump. No dump device defined.
>>
>>
>
> [First attempt apparently went into a blackhole. Apologies if you
> get this twice.]
>
> Some suggestions (off list) that it may not be hardware, so here's
> the follow up.
>
> supermicro 5015b-mt (super X7SBi mobo)
> Intel Q6600
> 8GB ECC DDR2
> 4x Seagate 320GB, two gmirror, two idle.
>
> issues so far
>
> 1 OK) 7.x doesn't boot without hw.ata.atapi_dma=0. Not recently tested.
> 2 OK) disks enumerate differently 6.x to 7.x. Painful if you
> hardwired the provider into your mirror.
> 3) 6.3 and 7.2 remote dump over ssh fails with 'Disconnecting:
> Corrupted MAC on input.'
> 4) On 7.2 (AFAICT from logs) random reboots under load, e.g. the
> above, generated by a portupgrade run.
>
> I had dumpdev=none as I hadn't set up rc.early to allow savecore to work.
>
> In the interests of full disclosure I should say that this box was
> migrated from older hardware and then source upgraded from i386 to
> amd64 (6.3). Only one issue with that: the format of the accounting
> file. Upgraded to 7.2 and a rebuild or two since then.
>
> This box is our email server and there's no load. An identical box
> running as a gateway/firewall backup dumps okay and doesn't reboot.
> That box does drop network connections when running a cvsup server
> (treelist write), but when configured to pass through these
> connections (using balance) runs okay. But that's a story for
> another day as it's still on 6.x.
>
> Anyway, I put the two gmirror disks in another chassis and the
> remote dumps are now completing. This at least does seem to be
> hardware.
>
> Before I moved the two gmirror disks I synced a third disk. I can
> now test (most of) the original hardware and software.
>
> I was unable to make this single disk system crash, so I added two
> new disks and synced them. Now a 3 disk mirror, one disk idle.
>
> I've disabled sendmail and the email server so as not to clash.
>
> A portupgrade run caused a crash. I've set up coredumps so I can now
> test. Remote backup dumps do fail.
>
> xmail# kldstat
> Id Refs Address            Size     Name
>  1    2 0xffffffff80100000 bd23e0   kernel
>  2    1 0xffffffff80cd3000 20608    geom_mirror.ko
>
> I did have ipfw module loaded, but I got the crash without it so
> I've removed it (firewall_type=OPEN).
>
> Ran crashinfo, now have much more info than I need ;)
>
> Starting another portupgrade run now to see how reproducible this is.
>
> Later BIOS waiting in USB floppy.
> [snip dmesg]

It took 2 runs of portupgrade -af. There is some corruption in the
dbs, so I may have to pkg_delete -a.

FreeBSD * 7.2-RELEASE-p1 FreeBSD 7.2-RELEASE-p1 #0: Tue Jun 16 18:03:10 BST 2009 *@*:/usr/obj/usr/src/sys/GENERIC amd64

panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xfffffffff5555570
fault code = supervisor write data, page not present
instruction pointer = 0x8:0xffffffff807c429b
stack pointer = 0x10:0xffffffff511e4710
frame pointer = 0x10:0x20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 69996 (mkdir)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 19h16m41s
Physical memory: 8177 MB
Dumping 730 MB: 715 699 683 667 651 635 619 603 587 571 555 539 523 507 491 475 459 443 427 411 395 379 363 347 331 315 299 283 267 251 235 219 203 187 171 155 139 123 107 91 75 59 43 27 11

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
 in pcpu.h
(kgdb) #0 doadump () at pcpu.h:195
#1 0x0000000000000004 in ?? ()
#2 0xffffffff8050df19 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3 0xffffffff8050e322 in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:574
#4 0xffffffff807d21f3 in trap_fatal (frame=0xffffff0005f94a50, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:757
#5 0xffffffff807d25c5 in trap_pfault (frame=0xffffffff511e4660, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:673
#6 0xffffffff807d2f04 in trap (frame=0xffffffff511e4660) at /usr/src/sys/amd64/amd64/trap.c:444
#7 0xffffffff807b706e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209
#8 0xffffffff807c429b in free_pv_entry (pmap=0xffffffff80b66c80, pv=Variable "pv" is not available.
) at /usr/src/sys/amd64/amd64/pmap.c:1905
#9 0xffffffff807c4403 in pmap_remove_entry (pmap=Variable "pmap" is not available.
) at /usr/src/sys/amd64/amd64/pmap.c:2131
#10 0xffffffff807c6447 in pmap_remove_pte (pmap=0xffffffff80b66c80, ptq=0xaaaaaaa8, va=18446744070506639360, ptepde=23601251, free=0xffffffff511e4790) at /usr/src/sys/amd64/amd64/pmap.c:2366
#11 0xffffffff807cab87 in pmap_remove (pmap=0xffffffff80b66c80, sva=18446744070506639360, eva=18446744070506909696) at /usr/src/sys/amd64/amd64/pmap.c:2510
#12 0xffffffff8073bf80 in vm_map_delete (map=0xffffff00016830f8, start=18446744070506639360, end=18446744070506909696) at /usr/src/sys/vm/vm_map.c:2400
#13 0xffffffff80739905 in kmem_free_wakeup (map=0xffffff00016830f8, addr=18446744070506639360, size=267264) at /usr/src/sys/vm/vm_kern.c:462
#14 0xffffffff804e648d in exec_free_args (args=0xffffffff511e4b00) at /usr/src/sys/kern/kern_exec.c:1098
#15 0xffffffff804e784a in kern_execve (td=0xffffff0005f94a50, args=0xffffffff511e4b00, mac_p=Variable "mac_p" is not available.
) at /usr/src/sys/kern/kern_exec.c:836
#16 0xffffffff804e7fd7 in execve (td=0xffffff0005f94a50, uap=Variable "uap" is not available.
) at /usr/src/sys/kern/kern_exec.c:202
#17 0xffffffff807d2847 in syscall (frame=0xffffffff511e4c80) at /usr/src/sys/amd64/amd64/trap.c:900
#18 0xffffffff807b727b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:330
#19 0x00000008005044b0 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

-- 
ian j hart

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
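
For anyone reading the backtrace above: kgdb prints the addresses in frames #10 to #13 in decimal, which makes them hard to compare with the hex values elsewhere in the dump. A minimal sketch in Python that converts them, assuming a 4096-byte page size; the helper names to_hex and round_page are made up for illustration and are not part of kgdb or crashinfo:

```python
# Values copied from frames #11 and #13 of the backtrace above.
SVA = 18446744070506639360   # pmap_remove sva / kmem_free_wakeup addr
EVA = 18446744070506909696   # pmap_remove eva
SIZE = 267264                # kmem_free_wakeup size
PAGE_SIZE = 4096             # assumed amd64 base page size


def to_hex(addr: int) -> str:
    """Format a value as a 64-bit hex address, like kgdb's pointer output."""
    return f"0x{addr & 0xFFFFFFFFFFFFFFFF:016x}"


def round_page(n: int) -> int:
    """Round n up to the next multiple of PAGE_SIZE."""
    return (n + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE


if __name__ == "__main__":
    print("sva   =", to_hex(SVA))
    print("eva   =", to_hex(EVA))
    print("range =", EVA - SVA, "bytes")
    print("size rounded up to pages =", round_page(SIZE), "bytes")
```

The range passed to pmap_remove (eva minus sva) comes out equal to the 267264-byte size from kmem_free_wakeup rounded up to whole pages, so the arguments in those frames are at least consistent with each other.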