From owner-freebsd-amd64@FreeBSD.ORG Tue Jul 22 15:30:02 2008
Resent-Date: Tue, 22 Jul 2008 15:30:01 GMT
Resent-Message-Id: <200807221530.m6MFU1qk016496@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-amd64@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Sean Cody
Message-Id: <200807221529.m6MFTShQ069251@www.freebsd.org>
Date: Tue, 22 Jul 2008 15:29:28 GMT
From: Sean Cody
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
X-Mailman-Approved-At: Tue, 22 Jul 2008 16:01:35 +0000
Cc:
Subject: amd64/125873: Repeated kernel panics, trap 12 page fault while in kernel mode (always with smbd).
X-BeenThere: freebsd-amd64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the AMD64 platform
X-List-Received-Date: Tue, 22 Jul 2008 15:30:03 -0000

>Number: 125873
>Category: amd64
>Synopsis: Repeated kernel panics, trap 12 page fault while in kernel mode (always with smbd).
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: freebsd-amd64
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jul 22 15:30:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Sean Cody
>Release: 7.0-RELEASE
>Organization:
Frantic Films VFX Services Inc.
>Environment:
FreeBSD deadline-la.franticfilms.com 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008 root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
Machine panics almost daily under heavy Samba usage.

We have a machine which we recently converted to a FreeBSD 7 box whose sole purpose is to serve a product that communicates over a disk-based queue shared with clients via CIFS. This machine is heavily loaded with these requests, and shortly after we put it into production it began crashing daily (sometimes more than once a day), always with the same characteristics. We've swapped the drives into another machine with the same results.

The panic drops a number of cores, and each one's backtrace shows a corrupted stack. We've not tried booting without ACPI support, though I have ruled out memory and temperature issues by swapping hardware around and keeping an eye on the environment.

Here is a simple back trace from the panic's core.

deadline-la# cat info.9
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 170438656B (162 MB)
  Blocksize: 512
  Dumptime: Fri Jul 18 17:25:32 2008
  Hostname: deadline-la.franticfilms.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008
    root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
  Panic String: page fault
  Dump Parity: 113022818
  Bounds: 9
  Dump Status: good

deadline-la# kgdb /boot/kernel/kernel /var/crash/vmcore.9
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff80468dff
stack pointer           = 0x10:0xffffffffa2c339c0
frame pointer           = 0x10:0xffffff0013e4f8d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 34088 (smbd)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 8h13m17s
Physical memory: 1011 MB
Dumping 162 MB: 147 131 115 99 83 67 51 35 19 3

#0  doadump () at pcpu.h:194
194     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in ?? ()
#2  0xffffffff80477699 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff80477a9d in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:563
#4  0xffffffff8072ec94 in trap_fatal (frame=0xffffff00152b9000, eva=18446742974531694592) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff8072f90f in trap (frame=0xffffffffa2c33910) at /usr/src/sys/amd64/amd64/trap.c:251
#6  0xffffffff8071560e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#7  0xffffffff80468dff in lf_advlock (ap=Variable "ap" is not available.
) at /usr/src/sys/kern/kern_lockf.c:294
#8  0xffffffff8044ec5b in kern_fcntl (td=0xffffff00152b9000, fd=Variable "fd" is not available.
) at vnode_if.h:1036
#9  0xffffffff8044f01f in fcntl (td=0xffffff00152b9000, uap=0xffffffffa2c33be0) at /usr/src/sys/kern/kern_descrip.c:336
#10 0xffffffff8072f2e7 in syscall (frame=0xffffffffa2c33c70) at /usr/src/sys/amd64/amd64/trap.c:852
#11 0xffffffff8071581b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#12 0x0000000801af707c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

dmesg of the machine:

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008
    root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.20-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff
  Features2=0x641d
  AMD Features=0x20000800
usable memory = 1060724736 (1011 MB)
avail memory  = 1022083072 (974 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 6
ioapic0: Changing APIC ID to 7
ioapic1: Changing APIC ID to 8
ioapic2: Changing APIC ID to 9
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 32-55 on motherboard
ioapic2 irqs 64-87 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 10:34:18)
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
cpu0: on acpi0
p4tcc0: on cpu0
cpu1: on acpi0
p4tcc1: on cpu1
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 2.0 on pci0
pci1: on pcib1
pcib2: at device 0.0 on pci1
pci2: on pcib2
pcib3: at device 0.2 on pci1
pci3: on pcib3
mpt0: port 0xec00-0xecff mem 0xdfdf0000-0xdfdfffff,0xdfde0000-0xdfdeffff irq 34 at device 5.0 on pci3
mpt0: [ITHREAD]
mpt0: MPI Version=1.2.12.0
pcib4: at device 4.0 on pci0
pci4: on pcib4
pcib5: at device 5.0 on pci0
pci5: on pcib5
pcib6: at device 0.0 on pci5
pci6: on pcib6
em0: port 0xdcc0-0xdcff mem 0xdfae0000-0xdfafffff irq 64 at device 7.0 on pci6
em0: Ethernet address: 00:11:43:d7:00:ac
em0: [FILTER]
pcib7: at device 0.2 on pci5
pci7: on pcib7
em1: port 0xccc0-0xccff mem 0xdf8e0000-0xdf8fffff irq 65 at device 8.0 on pci7
em1: Ethernet address: 00:11:43:d7:00:ad
em1: [FILTER]
pcib8: at device 6.0 on pci0
pci8: on pcib8
uhci0: port 0xace0-0xacff irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xacc0-0xacdf irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xaca0-0xacbf irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: on usb2
uhub2: 2 ports with 2 removable, self powered
ehci0: mem 0xdff00000-0xdff003ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: on ehci0
usb3: USB revision 2.0
uhub3: on usb3
uhub3: 6 ports with 6 removable, self powered
uhub4: on uhub3
uhub4: multiple transaction translators
uhub4: 2 ports with 2 removable, self powered
pcib9: at device 30.0 on pci0
pci9: on pcib9
vgapci0: port 0xbc00-0xbcff mem 0xd0000000-0xd7ffffff,0xdf5f0000-0xdf5fffff irq 18 at device 13.0 on pci9
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse Explorer, device ID 4
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
orm0: at iomem 0xc0000-0xcafff,0xcb000-0xcbfff,0xcc000-0xcffff,0xd0000-0xd0fff,0xec000-0xeffff on isa0
ppc0: cannot reserve I/O port range
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
hptrr: no controller detected.
Waiting 5 seconds for SCSI devices to settle
acd0: CDROM at ata0-master UDMA33
ses0 at mpt0 bus 0 target 6 lun 0
ses0: Fixed Processor SCSI-2 device
ses0: 3.300MB/s transfers
ses0: SAF-TE Compliant Device
SMP: AP CPU #1 Launched!
da0 at mpt0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz DT, offset 127, 16bit)
da0: Command Queueing Enabled
da0: 70007MB (143374650 512 byte sectors: 255H 63S/T 8924C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
em0: link state changed to UP
rtfree: 0xffffff00017b14b0 has 1 refs
rtfree: 0xffffff000c3520f0 has 1 refs

I've got 10 of these cores and all show very similar back traces and I'm not sure what to try next (save for disabling ACPI which may not be the problem).
>How-To-Repeat:
Just let the machine run and serve up a heavy load of SMB traffic.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
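
The back trace enters the kernel through fcntl() and faults in lf_advlock(), which is the byte-range (advisory) locking path. That suggests a possible way to narrow the repeat case without Samba. What follows is only a sketch of a stand-alone lock stress test: it assumes, without any confirmation from this report, that contended fcntl locks on a local file are enough to reach the faulting code. The names hammer_locks and run_stress, the worker count, the lock span, and the 4096-byte file size are all made up for illustration.

```python
import fcntl
import os
import random
import tempfile


def hammer_locks(path, iterations, span=64, file_size=4096, seed=0):
    """Take and release exclusive byte-range locks on `path`.

    Returns the number of lock/unlock cycles completed.
    """
    rng = random.Random(seed)
    done = 0
    with open(path, "r+b") as f:
        for _ in range(iterations):
            start = rng.randrange(0, file_size - span)
            # Blocking exclusive lock on [start, start + span); this goes
            # through the fcntl() system call.
            fcntl.lockf(f, fcntl.LOCK_EX, span, start, os.SEEK_SET)
            fcntl.lockf(f, fcntl.LOCK_UN, span, start, os.SEEK_SET)
            done += 1
    return done


def run_stress(workers=4, iterations=1000, file_size=4096):
    """Fork `workers` processes that contend for locks on one temp file.

    Returns the list of wait statuses (0 means the worker exited cleanly).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, file_size)
        os.close(fd)
        pids = []
        for i in range(workers):
            pid = os.fork()
            if pid == 0:
                # Child: each worker holds at most one lock at a time,
                # so the workers cannot deadlock each other.
                code = 1
                try:
                    hammer_locks(path, iterations, file_size=file_size, seed=i)
                    code = 0
                finally:
                    os._exit(code)
            pids.append(pid)
        return [os.waitpid(pid, 0)[1] for pid in pids]
    finally:
        os.unlink(path)
```

Calling run_stress() with larger worker and iteration counts on a scratch machine would show whether lock contention by itself is enough; if the box stays up, the trigger more likely depends on something specific to how smbd uses the locks.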