From: "Pat Wendorf"
To: freebsd-stable@freebsd.org
Date: Mon, 7 May 2007 12:32:38 -0400
Subject: HP DL-360 Kernel Crashes on 6.2-R
Message-ID: <2c2c47aa0705070932h65ab47cfv33a583645f9e7a3c@mail.gmail.com>
List-Id: Production branch of FreeBSD source code

Hey all,

I'm running 6.2R-p4 and I'm having some terrible stability problems on HP DL360 G4 and G5 hardware. We use these systems mostly as high-volume postfix mail servers, and under heavy postfix queue load the server will crash (sometimes within minutes, sometimes 12-14 hours after being put under load). The crashes also seem to be accompanied by some degree of file system corruption when the box comes back up.

The first challenge is that I don't have local access to the box. These boxes are hosted with a managed hosting provider who does not understand FreeBSD at all (yikes). The second problem is that this type of crash very rarely produces a crash dump under /var/crash. I'll provide the one crash dump I've managed to get below.

To make things worse, this crash appears on 2 types of hardware (G4 and G5 boxes) and 3 types of RAID controllers. We're currently deprecating the G4 hardware, so I'll just send the dmesg and crash dump from the G5, but I can assure you the symptoms are exactly the same on both types of boxes.

I should also mention that we've been using the exact same G4 hardware with 6.1-p11 in production for almost a year now with zero crashes. The crashes in the new setup occur with both SMP and non-SMP kernels.

DMESG
--------------
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p4 #0: Mon May 7 09:33:50 EDT 2007
    root@localhost:/usr/src/sys/amd64/compile/SMP
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz (2666.78-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x6f6 Stepping = 6
  Features=0xbfebfbff
  Features2=0x4e3bd,CX16,,,>
  AMD Features=0x20000800
  AMD Features2=0x1
  Cores per package: 2
real memory = 2145746944 (2046 MB)
avail memory = 2060341248 (1964 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: on acpi0
cpu1: on acpi0
pcib0: on acpi0
pci0: on pcib0
pcib1: at device 2.0 on pci0
pci9: on pcib1
pcib2: at device 0.0 on pci9
pci10: on pcib2
pcib3: at device 0.0 on pci10
pci11: on pcib3
em0: port 0x5000-0x501f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 16 at device 0.0 on pci11
em0: Ethernet address: 00:17:08:7e:b6:ac
em1: port 0x5020-0x503f mem 0xfdfa0000-0xfdfbffff,0xfdf80000-0xfdf9ffff irq 17 at device 0.1 on pci11
em1: Ethernet address: 00:17:08:7e:b6:ad
pcib4: at device 1.0 on pci10
pci14: on pcib4
pcib5: at device 2.0 on pci10
pci15: on pcib5
pcib6: at device 0.3 on pci9
pci16: on pcib6
pcib7: at device 3.0 on pci0
pci6: on pcib7
ciss0: port 0x4000-0x40ff mem 0xfdd00000-0xfddfffff,0xfdcf0000-0xfdcf0fff irq 16 at device 0.0 on pci6
ciss0: [GIANT-LOCKED]
pcib8: at device 4.0 on pci0
pci19: on pcib8
pcib9: at device 5.0 on pci0
pci22: on pcib9
pcib10: at device 6.0 on pci0
pci2: on pcib10
pcib11: at device 0.0 on pci2
pci3: on pcib11
bce0: mem 0xf8000000-0xf9ffffff irq 18 at device 0.0 on pci3
bce0: ASIC ID 0x57081020; Revision (B2); PCI-X 64-bit 133MHz
miibus0: on bce0
brgphy0: on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bce0: Ethernet address: 00:1a:4b:df:bb:0e
pcib12: at device 7.0 on pci0
pci4: on pcib12
pcib13: at device 0.0 on pci4
pci5: on pcib13
bce1: mem 0xfa000000-0xfbffffff irq 19 at device 0.0 on pci5
bce1: ASIC ID 0x57081020; Revision (B2); PCI-X 64-bit 133MHz
miibus1: on bce1
brgphy1: on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bce1: Ethernet address: 00:1a:4b:df:bb:06
uhci0: port 0x1000-0x101f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0x1020-0x103f irq 17 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0x1040-0x105f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0x1060-0x107f irq 19 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: mem 0xf7df0000-0xf7df03ff irq 16 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: waiting for BIOS to give up control
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib14: at device 30.0 on pci0
pci1: on pcib14
pci1: at device 3.0 (no driver attached)
pci1: at device 4.0 (no driver attached)
pci1: at device 4.2 (no driver attached)
uhci4: port 0x3800-0x381f irq 22 at device 4.4 on pci1
uhci4: [GIANT-LOCKED]
usb5: on uhci4
usb5: USB revision 1.0
uhub5: (0x103c) UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub5: 2 ports with 2 removable, self powered
pci1: at device 4.6 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f irq 17 at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
acpi_tz0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: at iomem 0xc0000-0xcafff,0xe6000-0xe7fff on isa0
ppc0: cannot reserve I/O port range
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: HP Virtual Keyboard, rev 1.10/0.01, addr 2, iclass 3/1
kbd2 at ukbd0
ums0: HP Virtual Keyboard, rev 1.10/0.01, addr 2, iclass 3/1
ums0: 3 buttons.
uhub6: HP Virtual Hub, class 9/0, rev 1.10/0.01, addr 3
uhub6: 7 ports with 7 removable, self powered
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master UDMA33
SMP: AP CPU #1 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da1 at ciss0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 135.168MB/s transfers
da1: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /data was not properly dismounted
/data: mount pending error: blocks 52932 files 4411
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
netsmb_dev: loaded
ukbd1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
kbd3 at ukbd1
ums1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
ums1: 5 buttons and Z dir.
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: link state changed to DOWN
bce0: link state changed to UP
ukbd1: at uhub1 port 1 (addr 2) disconnected
ukbd1: detached
ums1: at uhub1 port 1 (addr 2) disconnected
ums1: detached

CRASH DUMP
-----------------------
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xd4
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff8041dae4
stack pointer = 0x10:0xffffffffb1c73b10
frame pointer = 0x10:0x4
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 9 (thread taskq)
trap number = 12
panic: page fault
Uptime: 1d14h56m53s
Dumping 2045 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 2046MB (523608 pages) 2030 2014 1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff8041dae4
0xffffffff8041dae4 is in turnstile_setowner (/usr/src/sys/kern/subr_turnstile.c:433).
428
429 mtx_assert(&td_contested_lock, MA_OWNED);
430 MPASS(owner->td_proc->p_magic == P_MAGIC);
431 MPASS(ts->ts_owner == NULL);
432 ts->ts_owner = owner;
433 LIST_INSERT_HEAD(&owner->td_contested, ts, ts_link);
434 }
435
436 /*
437 * Malloc a turnstile for a new thread, initialize it and return it.
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0000000000000004 in ?? ()
#2 0xffffffff803f60d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff803f66d6 in panic (fmt=0xffffff007b883980 "\b*\213{") at /usr/src/sys/kern/kern_shutdown.c:565
#4 0xffffffff806106f2 in trap_fatal (frame=0xffffff007b883980, eva=18446742976270641672) at /usr/src/sys/amd64/amd64/trap.c:660
#5 0xffffffff80610c16 in trap (frame=
    {tf_rdi = -1097864339072, tf_rsi = 4, tf_rdx = -1097439102592, tf_rcx = 3221225730,
     tf_r8 = -1097439102528, tf_r9 = -1097864339072, tf_rax = 2, tf_rbx = -1097439102592,
     tf_rbp = 4, tf_r10 = -1097864339072, tf_r11 = -1097439102592, tf_r12 = -1097439102592,
     tf_r13 = -1097864339072, tf_r14 = -2138051040, tf_r15 = -1098519824376, tf_trapno = 12,
     tf_addr = 212, tf_flags = -4295930868989109831, tf_err = 0, tf_rip = -2143167772,
     tf_cs = 8, tf_rflags = 65543, tf_rsp = -1312343256, tf_ss = 16})
    at /usr/src/sys/amd64/amd64/trap.c:238
#6 0xffffffff805fe2fb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#7 0xffffffff8041dae4 in turnstile_setowner (ts=0xffffff00622fa180, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:432
#8 0xffffffff8041e0eb in turnstile_wait (lock=0xffffff003b1db808, owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:591
#9 0xffffffff803ec139 in _mtx_lock_sleep (m=0xffffff003b1db808, tid=18446742976270449024, opts=2072525184, file=0xc0000102, line=2072525248) at /usr/src/sys/kern/kern_mutex.c:579
#10 0xffffffff80449193 in unp_gc (arg=0xffffff00622fa180, pending=4) at /usr/src/sys/kern/uipc_usrreq.c:1714
#11 0xffffffff8041bfdd in taskqueue_run (queue=0xffffff0000bf8500) at /usr/src/sys/kern/subr_taskqueue.c:257
#12 0xffffffff8041cbc5 in taskqueue_thread_loop (arg=0xffffff00622fa180) at /usr/src/sys/kern/subr_taskqueue.c:376
#13 0xffffffff803dbf03 in fork_exit (callout=0xffffffff8041cb40, arg=0xffffffff808fdbb0, frame=0xffffffffb1c73c50) at /usr/src/sys/kern/kern_fork.c:821
#14 0xffffffff805fe65e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()
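
As a rough cross-check of the dump above, the signed decimal values kgdb prints in the trap frame can be converted back to the hex addresses shown in the panic message. The sketch below is only arithmetic on numbers copied from the dump; the helper name to_u64 is made up for illustration. The last value is the distance between the fault address and the bogus owner pointer (0x4). That would be consistent with a read through owner at the td_contested field, but the real offset of that field in struct thread on this kernel has not been checked, so treat it as an assumption.

```python
# Values copied from the panic message and backtrace frame #5/#7 above.
MASK64 = (1 << 64) - 1


def to_u64(value: int) -> int:
    """Reinterpret a signed decimal (as printed by kgdb) as unsigned 64-bit."""
    return value & MASK64


tf_rip = -2143167772  # instruction pointer as printed in the trap frame
tf_addr = 212         # faulting address as printed in the trap frame
owner = 0x4           # 'owner' argument of turnstile_setowner/turnstile_wait

rip_hex = hex(to_u64(tf_rip))      # should match "instruction pointer"
fault_hex = hex(tf_addr)           # should match "fault virtual address"
implied_offset = tf_addr - owner   # distance from the bogus pointer (assumed field offset)

print("rip:", rip_hex)
print("fault address:", fault_hex)
print("fault address - owner:", hex(implied_offset))
```

If the first two values match the "instruction pointer" and "fault virtual address" lines of the panic message, the trap frame and the message describe the same fault.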