From owner-freebsd-questions@FreeBSD.ORG Fri May 5 15:14:12 2006
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id E9F5D16A414
    for ; Fri, 5 May 2006 15:14:12 +0000 (UTC)
    (envelope-from nwood@prohosting.com)
Received: from mail-da-6.dns-solutions.net (mail-da-6.dns-solutions.net [69.12.124.4])
    by mx1.FreeBSD.org (Postfix) with SMTP id 88E2E43D62
    for ; Fri, 5 May 2006 15:14:12 +0000 (GMT)
    (envelope-from nwood@prohosting.com)
Received: (qmail 24232 invoked from network); 5 May 2006 15:14:09 -0000
Received: from unknown (HELO NL.prohosting.com) (nwood@prohosting.com@166.70.238.202)
    by mail-da-6.dns-solutions.net - 166.70.238.202 with SMTP; 5 May 2006 15:14:09 -0000
Message-Id: <7.0.1.0.0.20060505081022.0405e710@prohosting.com>
X-Mailer: QUALCOMM Windows Eudora Version 7.0.1.0
Date: Fri, 05 May 2006 09:14:04 -0600
To: freebsd-questions@freebsd.org
From: Nick Wood
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Subject: panic: page fault - 6.0-RELEASE-p7
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions
X-List-Received-Date: Fri, 05 May 2006 15:14:22 -0000

Hello,

We have a group of web and mail servers that run under a moderate load.
We recently upgraded them from 4/5.x to 6.0. We thought we had done
enough testing, but apparently we hadn't, and we are now experiencing
panics on a number of the servers. Some of our more heavily loaded
servers have been fine for days, while others crash every 6 to 36 hours.

Below are some pieces of information that may be helpful. Should I be
posting this to another list as well? I know I can decrease NMBCLUSTERS
dramatically and give more memory to the kernel if that would help.
I've read a number of similar cases where this panic was related to a
hardware failure, and while I can't rule that out completely, it does
seem unusual that several servers are apparently having the same
problem. Could it be that hardware problems existed before the upgrade,
but are now brought out by the increased load caused by the new OS
version and other installed software? We have IPMI cards in some of the
crashing servers and they all report normal temperatures, fan speeds,
and voltages. Nothing unusual in the event logs. I'm willing to dig
deeper and do more testing if anyone has suggestions.

Differences from GENERIC:
----------------------------------------------
#cpu            I486_CPU
#cpu            I586_CPU
cpu             I686_CPU
ident           PAYMAIL

options         SUIDDIR
options         QUOTA
options         IPFIREWALL
options         IPFIREWALL_VERBOSE
options         IPFIREWALL_VERBOSE_LIMIT=10
options         NMBCLUSTERS=65536
options         KVA_PAGES="640"
options         VM_KMEM_SIZE_MAX=(512*1048576)
options         VM_KMEM_SIZE_SCALE=2
options         ASR_COMPAT
options         SHMMAXPGS=131072
options         SEMMNI=128
options         SEMMNS=512
options         SEMUME=100
options         SEMMNU=256
----------------------------------------------

----------------------------------------------
mail-da-2# kgdb /boot/kernel/kernel.debug vmcore.2
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ber = 12
panic: page fault
Uptime: 1d6h4m36s
Dumping 2047 MB (3 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 2046MB (523773 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15 ... ok
  chunk 2: 1MB (128 pages)

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0x606384aa in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0x60638740 in panic (fmt=0x6085598b "%s")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3  0x6080ebf8 in trap_fatal (frame=0x9c497ad8, eva=172)
    at /usr/src/sys/i386/i386/trap.c:831
#4  0x6080e963 in trap_pfault (frame=0x9c497ad8, usermode=0, eva=172)
    at /usr/src/sys/i386/i386/trap.c:742
#5  0x6080e5c1 in trap (frame=
      {tf_fs = 1692663816, tf_es = 1680080936, tf_ds = 40, tf_edi = 55,
       tf_esi = 0, tf_ebp = -1672905932, tf_isp = -1672905980,
       tf_ebx = -1672905584, tf_edx = 1677080448, tf_ecx = 0, tf_eax = 4,
       tf_trapno = 12, tf_err = 2, tf_eip = 1617791092, tf_cs = 32,
       tf_eflags = 66182, tf_esp = 1773435648, tf_ss = 0})
    at /usr/src/sys/i386/i386/trap.c:432
#6  0x607fe6aa in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0x606d8874 in ip_ctloutput (so=0x4, sopt=0x9c497c90) at atomic.h:146
#8  0x606e88ef in tcp_ctloutput (so=0x64e419bc, sopt=0x9c497c90)
    at /usr/src/sys/netinet/tcp_usrreq.c:1036
#9  0x60671c00 in sosetopt (so=0x64e419bc, sopt=0x9c497c90)
    at /usr/src/sys/kern/uipc_socket.c:1553
#10 0x60676e5d in kern_setsockopt (td=0x63f63780, s=0, level=4, name=4,
    val=0x63f63780, valseg=UIO_USERSPACE, valsize=0)
    at /usr/src/sys/kern/uipc_syscalls.c:1331
#11 0x60676d8e in setsockopt (td=0x63f63780, uap=0x4)
    at /usr/src/sys/kern/uipc_syscalls.c:1287
#12 0x6080ef0f in syscall (frame=
      {tf_fs = 1606352955, tf_es = 59, tf_ds = 1606352955,
       tf_edi = 1606413432, tf_esi = 3, tf_ebp = 1606413224,
       tf_isp = -1672905372, tf_ebx = 0, tf_edx = 2, tf_ecx = 134545464,
       tf_eax = 105, tf_trapno = 12, tf_err = 2, tf_eip = 671862739,
       tf_cs = 51, tf_eflags = 514, tf_esp = 1606413180, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:976
#13 0x607fe6ff in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:200
#14 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
----------------------------------------------

----------------------------------------------
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.0-RELEASE-p7 #0: Wed May 3 22:45:52 MDT 2006
    root@mail-da-2...:/usr/obj/usr/src/sys/LOCAL
MPTable: < Kings Canyon>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2399.33-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf27 Stepping = 7
  Features=0xbfebfbff
  Features2=0x4400>
  Hyperthreading: 2 logical CPUs
real memory = 2146959360 (2047 MB)
avail memory = 2094030848 (1997 MB)
ioapic0: Assuming intbase of 0
ioapic1: Assuming intbase of 24
ioapic2: Assuming intbase of 48
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
cpu0 on motherboard
pcib0: pcibus 0 on motherboard
pci0: on pcib0
pci0: at device 0.1 (no driver attached)
pcib1: at device 2.0 on pci0
pci1: on pcib1
pci1: at device 28.0 (no driver attached)
pcib2: at device 29.0 on pci1
pci2: on pcib2
em0: port 0x3000-0x303f mem 0xf8200000-0xf821ffff irq 54 at device 3.0 on pci2
em0: Ethernet address: 00:30:48:27:61:76
em0: Speed:N/A Duplex:N/A
em1: port 0x3040-0x307f mem 0xf8220000-0xf823ffff irq 55 at device 3.1 on pci2
em1: Ethernet address: 00:30:48:27:61:77
em1: Speed:N/A Duplex:N/A
pci1: at device 30.0 (no driver attached)
pcib3: at device 31.0 on pci1
pci3: on pcib3
asr0: mem 0xf8300000-0xf83fffff,0xfb000000-0xfbffffff,0xfc000000-0xfdffffff irq 30 at device 3.0 on pci3
asr0: [GIANT-LOCKED]
asr0: ADAPTEC 2015S FW Rev. 3B05, 2 channel, 256 CCBs, Protocol I2O
uhci0: port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0x2040-0x205f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pcib4: at device 30.0 on pci0
pci4: on pcib4
pci4: at device 1.0 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2060-0x206f at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
pci0: at device 31.3 (no driver attached)
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xcefff,0xdc000-0xdffff,0xe0000-0xe3fff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (memory)
unknown: can't assign resources (memory)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
unknown: can't assign resources (port)
Timecounter "TSC" frequency 2399331312 Hz quality 800
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging limited to 10 packets/entry by default
acd0: CDROM at ata1-master UDMA33
ses0 at asr0 bus 0 target 6 lun 0
ses0: Fixed Processor SCSI-2 device
ses0: SAF-TE Compliant Device
da0 at asr0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: Tagged Queueing Enabled
da0: 34732MB (71132942 512 byte sectors: 255H 63S/T 4427C)
da1 at asr0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: Tagged Queueing Enabled
da1: 140014MB (286748672 512 byte sectors: 255H 63S/T 17849C)
----------------------------------------------

Thanks,
Nick Wood
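
The raw numbers in the kernel options and the trap frame above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it assumes 2048 bytes per mbuf cluster and 4 MB per KVA page on a non-PAE i386 kernel, neither of which is stated in the post, and the helper names are hypothetical. It converts the decimal tf_eip that kgdb prints into hex, decodes the low bits of tf_err, and sizes the NMBCLUSTERS and KVA_PAGES settings.

```python
# Cross-check figures quoted in the post. Constants marked ASSUMPTION are
# not stated in the post itself; the helper names are illustrative only.

MCLBYTES = 2048            # ASSUMPTION: bytes per mbuf cluster
KVA_PAGE_BYTES = 4 << 20   # ASSUMPTION: 4 MB per KVA page (non-PAE i386)


def to_u32(value):
    """Reinterpret a signed 32-bit integer, as kgdb prints it, as unsigned."""
    return value & 0xFFFFFFFF


def decode_pf_error(err):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "page_present": bool(err & 0x1),  # 0 means the page was not present
        "write_access": bool(err & 0x2),  # 1 means the access was a write
        "user_mode": bool(err & 0x4),     # 0 means the fault came from kernel mode
    }


def config_sizes(nmbclusters, kva_pages):
    """Return (mbuf cluster bytes, kernel address space bytes, kernel base)."""
    cluster_bytes = nmbclusters * MCLBYTES
    kva_bytes = kva_pages * KVA_PAGE_BYTES
    kernel_base = (1 << 32) - kva_bytes   # top of the 4 GB space minus the KVA
    return cluster_bytes, kva_bytes, kernel_base


if __name__ == "__main__":
    # tf_eip and tf_err from frame #5, eva from frames #3 and #4
    print("faulting eip :", hex(to_u32(1617791092)))
    print("fault address:", hex(172))
    print("error code   :", decode_pf_error(2))

    # NMBCLUSTERS=65536 and KVA_PAGES=640 from the kernel config
    clusters, kva, base = config_sizes(65536, 640)
    print("mbuf clusters:", clusters >> 20, "MB")
    print("kernel KVA   :", kva >> 20, "MB, base", hex(base))
```

If those assumptions hold, tf_eip converts to 0x606d8874, the address shown for ip_ctloutput in frame #7, and tf_err = 2 decodes as a kernel-mode write to a page that was not present. The computed kernel base of 0x60000000 is also consistent with the 0x60... addresses throughout the backtrace.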