Date: Mon, 7 May 2007 12:32:38 -0400
From: "Pat Wendorf" <dungeons@gmail.com>
To: freebsd-stable@freebsd.org
Subject: HP DL-360 Kernel Crashes on 6.2-R
Message-ID: <2c2c47aa0705070932h65ab47cfv33a583645f9e7a3c@mail.gmail.com>

Hey all,

I'm running 6.2R-p4 and I'm having some terrible stability problems on HP DL360 G4 and G5 hardware. We use these systems mostly as high-volume postfix mail servers, and under heavy postfix queue load the server will crash (sometimes within minutes, sometimes 12-14 hours after being put under load). The crashes also seem to be accompanied by some degree of file system corruption when the box comes back up.

The first challenge is that I don't have local access to the box. These boxes are hosted with a managed hosting provider who does not understand FreeBSD at all (yikes). The second problem is that this type of crash very rarely produces a crash dump under /var/crash. I'll provide the one crash dump I've managed to get below.

To make things worse, this crash appears on 2 types of hardware (G4 and G5 boxes) and 3 types of RAID controllers. We're currently deprecating the G4 hardware, so I'll just send the dmesg and crash dump from the G5, but I can assure you the symptoms are exactly the same on both types of boxes.

I'd also mention we've been using the exact same G4 hardware with 6.1-p11 in production for almost a year now with zero crashes. The crashes in the new setup occur with both SMP and non-SMP kernels.

DMESG
--------------
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p4 #0: Mon May 7 09:33:50 EDT 2007
  root@localhost:/usr/src/sys/amd64/compile/SMP
ACPI APIC Table: <HP 00000083>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz (2666.78-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x6f6 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,<b9>,CX16,<b14>,<b15>,<b18>>
  AMD Features=0x20000800<SYSCALL,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
real memory = 2145746944 (2046 MB)
avail memory = 2060341248 (1964 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <HP P58> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci9: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci9
pci10: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci10
pci11: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x5000-0x501f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 16 at device 0.0 on pci11
em0: Ethernet address: 00:17:08:7e:b6:ac
em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x5020-0x503f mem 0xfdfa0000-0xfdfbffff,0xfdf80000-0xfdf9ffff irq 17 at device 0.1 on pci11
em1: Ethernet address: 00:17:08:7e:b6:ad
pcib4: <PCI-PCI bridge> at device 1.0 on pci10
pci14: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 2.0 on pci10
pci15: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.3 on pci9
pci16: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci6: <ACPI PCI bus> on pcib7
ciss0: <HP Smart Array P400i> port 0x4000-0x40ff mem 0xfdd00000-0xfddfffff,0xfdcf0000-0xfdcf0fff irq 16 at device 0.0 on pci6
ciss0: [GIANT-LOCKED]
pcib8: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci19: <ACPI PCI bus> on pcib8
pcib9: <PCI-PCI bridge> at device 5.0 on pci0
pci22: <PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib11
bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B2), v0.9.6> mem 0xf8000000-0xf9ffffff irq 18 at device 0.0 on pci3
bce0: ASIC ID 0x57081020; Revision (B2); PCI-X 64-bit 133MHz
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bce0: Ethernet address: 00:1a:4b:df:bb:0e
pcib12: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib13
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2), v0.9.6> mem 0xfa000000-0xfbffffff irq 19 at device 0.0 on pci5
bce1: ASIC ID 0x57081020; Revision (B2); PCI-X 64-bit 133MHz
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bce1: Ethernet address: 00:1a:4b:df:bb:06
uhci0: <UHCI (generic) USB controller> port 0x1000-0x101f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x1020-0x103f irq 17 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x1040-0x105f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x1060-0x107f irq 19 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xf7df0000-0xf7df03ff irq 16 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: waiting for BIOS to give up control
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <EHCI (generic) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib14: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib14
pci1: <display, VGA> at device 3.0 (no driver attached)
pci1: <base peripheral> at device 4.0 (no driver attached)
pci1: <base peripheral> at device 4.2 (no driver attached)
uhci4: <UHCI (generic) USB controller> port 0x3800-0x381f irq 22 at device 4.4 on pci1
uhci4: [GIANT-LOCKED]
usb5: <UHCI (generic) USB controller> on uhci4
usb5: USB revision 1.0
uhub5: (0x103c) UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub5: 2 ports with 2 removable, self powered
pci1: <serial bus> at device 4.6 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f irq 17 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xe6000-0xe7fff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: HP Virtual Keyboard, rev 1.10/0.01, addr 2, iclass 3/1
kbd2 at ukbd0
ums0: HP Virtual Keyboard, rev 1.10/0.01, addr 2, iclass 3/1
ums0: 3 buttons.
uhub6: HP Virtual Hub, class 9/0, rev 1.10/0.01, addr 3
uhub6: 7 ports with 7 removable, self powered
Timecounters tick every 1.000 msec
acd0: CDRW <DW-224E-R/C.AC> at ata0-master UDMA33
SMP: AP CPU #1 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da1 at ciss0 bus 0 target 1 lun 0
da1: <COMPAQ RAID 0 VOLUME OK> Fixed Direct Access SCSI-5 device
da1: 135.168MB/s transfers
da1: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /data was not properly dismounted
/data: mount pending error: blocks 52932 files 4411
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
netsmb_dev: loaded
ukbd1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
kbd3 at ukbd1
ums1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
ums1: 5 buttons and Z dir.
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: link state changed to DOWN
bce0: link state changed to UP
ukbd1: at uhub1 port 1 (addr 2) disconnected
ukbd1: detached
ums1: at uhub1 port 1 (addr 2) disconnected
ums1: detached

CRASH DUMP
-----------------------
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xd4
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff8041dae4
stack pointer = 0x10:0xffffffffb1c73b10
frame pointer = 0x10:0x4
code segment = base 0x0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 9 (thread taskq)
trap number = 12
panic: page fault
Uptime: 1d14h56m53s
Dumping 2045 MB (2 chunks)
 chunk 0: 1MB (159 pages) ... ok
 chunk 1: 2046MB (523608 pages) 2030 2014 1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff8041dae4
0xffffffff8041dae4 is in turnstile_setowner (/usr/src/sys/kern/subr_turnstile.c:433).
428
429     mtx_assert(&td_contested_lock, MA_OWNED);
430     MPASS(owner->td_proc->p_magic == P_MAGIC);
431     MPASS(ts->ts_owner == NULL);
432     ts->ts_owner = owner;
433     LIST_INSERT_HEAD(&owner->td_contested, ts, ts_link);
434 }
435
436 /*
437  * Malloc a turnstile for a new thread, initialize it and return it.
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0000000000000004 in ?? ()
#2 0xffffffff803f60d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff803f66d6 in panic (fmt=0xffffff007b883980 "\b*\213{") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0xffffffff806106f2 in trap_fatal (frame=0xffffff007b883980, eva=18446742976270641672) at /usr/src/sys/amd64/amd64/trap.c:660
#5 0xffffffff80610c16 in trap (frame=
    {tf_rdi = -1097864339072, tf_rsi = 4, tf_rdx = -1097439102592, tf_rcx = 3221225730, tf_r8 = -1097439102528, tf_r9 = -1097864339072, tf_rax = 2, tf_rbx = -1097439102592, tf_rbp = 4, tf_r10 = -1097864339072, tf_r11 = -1097439102592, tf_r12 = -1097439102592, tf_r13 = -1097864339072, tf_r14 = -2138051040, tf_r15 = -1098519824376, tf_trapno = 12, tf_addr = 212, tf_flags = -4295930868989109831, tf_err = 0, tf_rip = -2143167772, tf_cs = 8, tf_rflags = 65543, tf_rsp = -1312343256, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238
#6 0xffffffff805fe2fb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#7 0xffffffff8041dae4 in turnstile_setowner (ts=0xffffff00622fa180, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:432
#8 0xffffffff8041e0eb in turnstile_wait (lock=0xffffff003b1db808, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:591
#9 0xffffffff803ec139 in _mtx_lock_sleep (m=0xffffff003b1db808, tid=18446742976270449024, opts=2072525184, file=0xc0000102 <Address 0xc0000102 out of bounds>, line=2072525248) at /usr/src/sys/kern/kern_mutex.c:579
#10 0xffffffff80449193 in unp_gc (arg=0xffffff00622fa180, pending=4) at /usr/src/sys/kern/uipc_usrreq.c:1714
#11 0xffffffff8041bfdd in taskqueue_run (queue=0xffffff0000bf8500) at /usr/src/sys/kern/subr_taskqueue.c:257
#12 0xffffffff8041cbc5 in taskqueue_thread_loop (arg=0xffffff00622fa180) at /usr/src/sys/kern/subr_taskqueue.c:376
#13 0xffffffff803dbf03 in fork_exit (callout=0xffffffff8041cb40 <taskqueue_thread_loop>, arg=0xffffffff808fdbb0, frame=0xffffffffb1c73c50) at /usr/src/sys/kern/kern_fork.c:821
#14 0xffffffff805fe65e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()