From: Tamas TEVESZ
To: freebsd-hackers@freebsd.org
Date: Tue, 15 Jun 2004 22:34:57 +0200 (CEST)
Subject: 4.10-RELEASE and -STABLE crashing regularly under load

hi folks,

[i posted the following message to -bugs@ a while ago, but was then directed here by a freebsder friend. while reposting, i also corrected some minor facts i missed in the previous post]

i have a dell poweredge 2600 (4G ram, 2x2.8ghz xeon cpus, some disk, full dmesg below) running a heavily loaded website (apache13, php, cgi, pure-ftpd). this is a brand new 4.10-release install (later brought in sync with -stable; both exhibit the exact same problem), which crashes badly every once in a while. 4.10-R did that every ~2.5 days, 4.10-S did it for the first time after one day.
(before that, the system was running 4.9-stable on a poweredge 4600 with one xeon cpu, no ht, no smp, nothing like that, and was very stable.)

i cannot entirely rule out bad hardware as this is a brand new system, but we haven't had many problems with dell stuff before.

everything i think is related is included below; if anything else is needed, please just say so.

thanks in advance.

misc related information:
==================================

# sysctl machdep.hlt_logical_cpus
machdep.hlt_logical_cpus: 1

# kldstat
Id Refs Address    Size     Name
 1    2 0xc0100000 1e5214   kernel
 2    1 0xc02e6000 21c4     accf_http.ko
#

information from gdb:
==================================

# gdb -k kernel.debug.1 vmcore.1
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf

SMP 4 cpus
IdlePTD at physical address 0x00309000
initial pcb at physical address 0x0027b3c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 02000002; cpuid = 2; lapic.id = 06000000
fault virtual address   = 0xbfc00000
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0213fd9
stack pointer           = 0x10:0xfc749e04
frame pointer           = 0x10:0xfc749e10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 73408 (grep)
interrupt mask          = none <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 02000002; cpuid = 2; lapic.id = 06000000
boot() called on cpu#2

syncing disks... 45 7
done
Uptime: 1d0h14m54s
amr0: flushing cache...done
amr1: flushing cache...done

dumping to dev #amrd/0x20001, offset 1048960
dump 3583 3582 3581 3580 3579 3578 3577 3576 3575 3574 3573 3572 3571 [...]
#0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
487 if (dumping++) {
(kgdb) bt
#0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
#1 0xc01664df in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:316
#2 0xc0166938 in poweroff_wait (junk=0xc024ff79, howto=-1071318481) at /usr/src/sys/kern/kern_shutdown.c:595
#3 0xc0217e98 in trap_fatal (frame=0xfc749dc4, eva=3217031168) at /usr/src/sys/i386/i386/trap.c:974
#4 0xc0217b29 in trap_pfault (frame=0xfc749dc4, usermode=0, eva=3217031168) at /usr/src/sys/i386/i386/trap.c:867
#5 0xc02176c7 in trap (frame={tf_fs = 24, tf_es = -68485104, tf_ds = 134610960, tf_edi = -99396280, tf_esi = 0, tf_ebp = -59466224, tf_isp = -59466256, tf_ebx = 3, tf_edx = -1043777528, tf_ecx = 0, tf_eax = 1245573123, tf_trapno = 12, tf_err = 2, tf_eip = -1071562791, tf_cs = 8, tf_eflags = 66054, tf_esp = 134660096, tf_ss = 134660096}) at /usr/src/sys/i386/i386/trap.c:466
#6 0xc0213fd9 in pmap_qenter (va=0, m=0xfa135548, count=4) at /usr/src/sys/i386/i386/pmap.c:848
#7 0xc017711a in pipe_build_write_buffer (wpipe=0xfa135520, uio=0xfc749ed0) at /usr/src/sys/kern/sys_pipe.c:594
#8 0xc01772e0 in pipe_direct_write (wpipe=0xfa135520, uio=0xfc749ed0) at /usr/src/sys/kern/sys_pipe.c:709
#9 0xc0177682 in pipe_write (fp=0xce43cec0, uio=0xfc749ed0, cred=0xcd0c6080, flags=0, p=0xfbeb3ee0) at /usr/src/sys/kern/sys_pipe.c:827
#10 0xc0175a05 in dofilewrite (p=0xfbeb3ee0, fp=0xce43cec0, fd=1, buf=0x8068000, nbyte=16384, offset=-1, flags=0) at /usr/src/sys/sys/file.h:163
#11 0xc01758be in write (p=0xfbeb3ee0, uap=0xfc749f80) at /usr/src/sys/kern/sys_generic.c:329
#12 0xc02181c9 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134643712, tf_esi = 672187864, tf_ebp = -1077937456, tf_isp = -59465772, tf_ebx = 672188332, tf_edx = 672187864, tf_ecx = 0, tf_eax = 4, tf_trapno = 12, tf_err = 2, tf_eip = 672141560, tf_cs = 31, tf_eflags = 663, tf_esp = -1077937500, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1175
#13 0xc02056fb in Xint0x80_syscall ()
#14 0x280fedd2 in ?? ()
#15 0x280fed41 in ?? ()
#16 0x280fbc26 in ?? ()
#17 0x280a50d5 in ?? ()
#18 0x804ec04 in ?? ()
#19 0x804edc6 in ?? ()
#20 0x804eec5 in ?? ()
#21 0x804f0c7 in ?? ()
#22 0x804f3a4 in ?? ()
#23 0x80500f3 in ?? ()
#24 0x8049046 in ?? ()
(kgdb) list *0xc0213fd9
0xc0213fd9 is in pmap_qenter (/usr/src/sys/i386/i386/pmap.c:848).
843     void
844     pmap_qenter(vm_offset_t va, vm_page_t *m, int count)
845     {
846             while (count-- > 0) {
847                     pt_entry_t *pte = vtopte(va);
848                     *pte = VM_PAGE_TO_PHYS(*m) | PG_RW | PG_V | pgeflag;
849     #ifdef SMP
850                     cpu_invlpg((void *)va);
851     #else
852                     invltlb_1pg(va);
(kgdb)

dmesg:
==================================

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.10-STABLE #3: Mon Jun 14 12:39:29 CEST 2004
    root@mammut.swi.hu:/usr/obj/usr/src/sys/MAMMUT
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2791.01-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory = 3757899776 (3669824K bytes)
avail memory = 3660173312 (3574388K bytes)
Changing APIC ID for IO APIC #0 from 0 to 8 on chip
Changing APIC ID for IO APIC #1 from 0 to 9 on chip
Changing APIC ID for IO APIC #2 from 0 to 10 on chip
Changing APIC ID for IO APIC #3 from 0 to 11 on chip
Changing APIC ID for IO APIC #4 from 0 to 12 on chip
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 24 pins in IOAPIC #1
Programming 24 pins in IOAPIC #2
Programming 24 pins in IOAPIC #3
Programming 24 pins in IOAPIC #4
FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
 cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
 cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
 cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
 cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id: 8, version: 0x00178020, at 0xfec00000
 io1 (APIC): apic id: 9, version: 0x00178020, at 0xfec80000
 io2 (APIC): apic id: 10, version: 0x00178020, at 0xfec81000
 io3 (APIC): apic id: 11, version: 0x00178020, at 0xfec82000
 io4 (APIC): apic id: 12, version: 0x00178020, at 0xfec82800
Preloaded elf kernel "kernel" at 0xc02ea000.
Preloaded elf module "accf_http.ko" at 0xc02ea09c.
Warning: Pentium 4 CPU: PSE disabled
module_register_init: MOD_LOAD (accf_http, c0180dc0, 0xc02e7a60) error 17
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 12 entries at 0xc00fc160
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
IOAPIC #0 intpin 16 -> irq 2
pci0: on pcib0
pcib1: at device 2.0 on pci0
pci1: on pcib1
pci1: (vendor=0x8086, dev=0x1461) at 28.0
pcib2: at device 29.0 on pci1
pci2: on pcib2
pci1: (vendor=0x8086, dev=0x1461) at 30.0
pcib3: at device 31.0 on pci1
IOAPIC #1 intpin 4 -> irq 5
pci3: on pcib3
em0: port 0xece0-0xecff mem 0xfdcc0000-0xfdcdffff,0xfdce0000-0xfdcfffff irq 5 at device 1.0 on pci3
em0: Speed:N/A Duplex:N/A
pcib4: at device 3.0 on pci0
pci4: on pcib4
pci4: (vendor=0x8086, dev=0x1461) at 28.0
pcib5: at device 29.0 on pci4
pci5: on pcib5
pci4: (vendor=0x8086, dev=0x1461) at 30.0
pcib6: at device 31.0 on pci4
pci6: on pcib6
pcib7: at device 4.0 on pci0
pci7: on pcib7
pci7: (vendor=0x8086, dev=0x1461) at 28.0
pcib8: at device 29.0 on pci7
IOAPIC #3 intpin 0 -> irq 7
pci8: on pcib8
amr0: mem 0xf7ff0000-0xf7ffffff irq 7 at device 8.0 on pci8
amr0: Firmware 2.48, BIOS 1.06, 128MB RAM
pci7: (vendor=0x8086, dev=0x1461) at 30.0
pcib9: at device 31.0 on pci7
pci10: on pcib9
pcib10: at device 6.0 on pci10
IOAPIC #4 intpin 1 -> irq 11
pci11: on pcib10
pcib11: at device 0.0 on pci11
IOAPIC #4 intpin 0 -> irq 13
pci12: on pcib11
amr1: mem 0xe8000000-0xefffffff irq 13 at device 0.0 on pci12
amr1: Firmware 1.80, BIOS 3.29, 128MB RAM
pci11: (vendor=0x1077, dev=0x1216) at 1.0 irq 11
pci0: at 29.0 irq 2
pcib12: at device 30.0 on pci0
pci13: on pcib12
pci13: at 4.0
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 irq 2 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
orm0:
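
[note on reading the numbers in the gdb output: the panic message prints addresses in hex, while the backtrace prints eva and the trap frame fields as decimal integers, some of them negative because they are shown as signed 32-bit values. the sketch below is only a convenience for cross-checking the two; the helper name to_hex32 is made up for illustration and is not part of gdb or freebsd. it assumes 32-bit i386 values, as in this dump.]

```python
# Convert the decimal values gdb prints in the backtrace into the
# 32-bit hex form used by the panic message, so the two can be compared.
# to_hex32 is a hypothetical helper name, not a gdb or FreeBSD function.

def to_hex32(value: int) -> str:
    """Return value as an unsigned 32-bit hex string (two's complement)."""
    return f"0x{value & 0xFFFFFFFF:08x}"

# eva from frames #3 and #4 (trap_fatal / trap_pfault)
fault_address = to_hex32(3217031168)

# tf_eip from the trap frame in frame #5, printed by gdb as a signed int
instruction_pointer = to_hex32(-1071562791)

print("eva    =", fault_address)
print("tf_eip =", instruction_pointer)
```

[with these inputs, eva comes out as the "fault virtual address" from the panic message and tf_eip as its "instruction pointer", which is also the address of frame #6 (pmap_qenter).]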