From: "Steven G. Kargl"
To: freebsd-current@freebsd.org
Date: Mon, 21 May 2007 17:15:07 -0700 (PDT)
Subject: kernel panic in sbflush_internal
Message-Id: <200705220015.l4M0F76e001014@troutmask.apl.washington.edu>

One of my colleagues brought down a node on my cluster while running an MPI job.
The kernel coredump shows

Script started on Mon May 21 17:02:53 2007
node12:root[201] kgdb kernel.debug vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
panic: sbflush_internal: cc 4294965848 || mb 0 || mbcnt 0
cpuid = 0
Uptime: 7h6m34s
Physical memory: 16119 MB
Dumping 631 MB: 616 600 584 568 552 536 520 504 488 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8

#0 doadump () at pcpu.h:171
171 pcpu.h: No such file or directory.
 in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:171
#1 0xffffffff802a01eb in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xffffffff802a08cc in panic (fmt=0xffffff03157e0d20 "") at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xffffffff802f4d23 in sbflush_internal (sb=0xffffff031243ab68) at /usr/src/sys/kern/uipc_sockbuf.c:808
#4 0xffffffff802f50cb in sbflush (sb=0xffffff031243ab68) at /usr/src/sys/kern/uipc_sockbuf.c:825
#5 0xffffffff803b7246 in tcp_disconnect (tp=0xffffff03101f73e0) at /usr/src/sys/netinet/tcp_usrreq.c:1496
#6 0xffffffff803b7539 in tcp_usr_disconnect (so=0xffffff0311a04690) at /usr/src/sys/netinet/tcp_usrreq.c:584
#7 0xffffffff802f67f2 in soclose (so=0xffffff031243aae0) at /usr/src/sys/kern/uipc_socket.c:642
#8 0xffffffff802de133 in soo_close (fp=0xffffff0312402258, td=0x0) at /usr/src/sys/kern/sys_socket.c:296
#9 0xffffffff8027479f in fdrop (fp=0xffffff0312402258, td=0xffffff03157e0d20) at file.h:297
#10 0xffffffff80274aaf in closef (fp=0xffffff0312402258, td=0xffffff03157e0d20) at /usr/src/sys/kern/kern_descrip.c:1928
#11 0xffffffff80275f54 in fdfree (td=0xffffff03157e0d20) at /usr/src/sys/kern/kern_descrip.c:1638
#12 0xffffffff8027f537 in exit1 (td=0xffffff03157e0d20, rv=9) at /usr/src/sys/kern/kern_exit.c:271
#13 0xffffffff802a578f in sigexit (td=0xffffff03157e0d20, sig=9) at /usr/src/sys/kern/kern_sig.c:2862
#14 0xffffffff802a63ac in postsig (sig=9) at /usr/src/sys/kern/kern_sig.c:2741
#15 0xffffffff802d3547 in ast (framep=0xffffffffb0580c70) at /usr/src/sys/kern/subr_trap.c:271
#16 0xffffffff804787f0 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:283
#17 0x00000003c0c7294c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit

I have the debug kernel and vmcore file, and can make them available.
The dmesg for the node that panicked is

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #6: Fri May 18 10:19:43 PDT 2007
 kargl@node10.cimu.org:/usr/obj/usr/src/sys/HPC
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual Core AMD Opteron(tm) Processor 280 (2391.55-MHz K8-class CPU)
 Origin = "AuthenticAMD" Id = 0x20f12 Stepping = 2
 Features=0x178bfbff
 Features2=0x1
 AMD Features=0xe2500800
 AMD Features2=0x3
 Cores per package: 2
usable memory = 16902705152 (16119 MB)
avail memory = 16387166208 (15628 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-27 on motherboard
ioapic2 irqs 28-31 on motherboard
acpi0: on motherboard
acpi0: [ITHREAD]
acpi_hpet0: iomem 0xfec01000-0xfec013ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 2000
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, eff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 6.0 on pci0
pci3: on pcib1
ohci0: mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: on usb0
device_attach: uhub0 attach returned 6
usb0: port 0, set config at addr 1 failed
usb0: root hub problem, error=4
ohci1: mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
ohci1: [GIANT-LOCKED]
ohci1: [ITHREAD]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: on ohci1
usb1: USB revision 1.0
uhub1: on usb1
uhub1: 3 ports with 3 removable, self powered
atapci0: port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
atapci0: [ITHREAD]
ata2: on atapci0
ata2: [ITHREAD]
ata3: on atapci0
ata3: [ITHREAD]
ata4: on atapci0
ata4: [ITHREAD]
ata5: on atapci0
ata5: [ITHREAD]
vgapci0: port 0xb800-0xb8ff mem 0xfd000000-0xfdffffff,0xfeaff000-0xfeafffff irq 18 at device 6.0 on pci3
isab0: at device 7.0 on pci0
isa0: on isab0
atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0: on atapci1
ata0: [ITHREAD]
ata1: on atapci1
ata1: [ITHREAD]
amdsmb0: port 0xcc00-0xcc1f irq 19 at device 7.2 on pci0
smbus0: on amdsmb0
smb0: on smbus0
amdpm0: port 0x10e0-0x10ff at device 7.3 on pci0
smbus1: on amdpm0
smb1: on smbus1
pcib2: at device 10.0 on pci0
pci2: on pcib2
pci2:9:0: bad VPD cksum, remain 72
bge0: mem 0xfc8c0000-0xfc8cffff,0xfc8b0000-0xfc8bffff irq 24 at device 9.0 on pci2
miibus0: on bge0
brgphy0: PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:e0:81:34:e1:4c
bge0: [ITHREAD]
pci2:9:1: bad VPD cksum, remain 72
bge1: mem 0xfc8f0000-0xfc8fffff,0xfc8e0000-0xfc8effff irq 25 at device 9.1 on pci2
miibus1: on bge1
brgphy1: PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:e0:81:34:e1:4d
bge1: [ITHREAD]
pcib3: at device 11.0 on pci0
pci1: on pcib3
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
ppc0: port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff,0xcc800-0xcdfff,0xce000-0xcf7ff,0xcf800-0xd07ff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ad4: 239372MB at ata2-master SATA150
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
hwpmc: TSC/1/0x20 K8/4/0x1ff
Trying to mount root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
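
A note on reading the panic string in the kgdb output: the "cc" value 4294965848 sits just below 2^32, which is what a small negative number looks like when printed from an unsigned 32-bit counter. The following is a minimal sketch of that arithmetic only, assuming the field is a 32-bit unsigned count (an assumption, not something the dump itself states); the helper name as_signed32 is hypothetical and is not part of any FreeBSD tool.

```python
# Assumption: the "cc" field in the panic string is printed from a
# 32-bit unsigned counter, so a wrapped-around value shows up near 2**32.
UINT32_MODULUS = 1 << 32


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed one."""
    value &= UINT32_MODULUS - 1          # keep only the low 32 bits
    if value >= (1 << 31):               # top bit set: negative when signed
        return value - UINT32_MODULUS
    return value


cc_from_panic = 4294965848               # copied from the panic message
print(as_signed32(cc_from_panic))        # prints -1448
```

Under that assumption, the counter would have been decremented 1448 bytes past zero. This only restates the number in the panic message; it does not by itself identify where the accounting went wrong.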