From: Khetan Gajjar
Date: Mon, 15 Aug 2005 17:09:34 +0200 (SAST)
To: current@freebsd.org
Subject: Panic: snapacct_ufs2: bad block
Message-ID: <20050815170049.B17105@gauntlet.os.org.za>
List-Id: Discussions about the use of FreeBSD-current

Hi.

I'm seeing several snapshot-related crashes in -current, cvsup'd on
12 Aug 2005 at 15:15 GMT+0200. I suspect a ULE scheduler/snapshot
interaction.

/var/crash/info.1 reveals:

Dump header from device /dev/ad0s1b
  Architecture: i386
  Architecture Version: 33554432
  Dump Length: 528023552B (503 MB)
  Blocksize: 512
  Dumptime: Mon Aug 15 12:32:00 2005
  Hostname: citadel.os.org.za
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.0-CURRENT #0: Fri Aug 12 22:44:36 SAST 2005
    khetan@citadel.os.org.za:/usr/src/sys/i386/compile/CITADEL5
  Panic String: snapacct_ufs2: bad block
  Dump Parity: 1551260746
  Bounds: 1
  Dump Status: good

Kgdb reveals:

[citadel] /var/crash# kgdb -c vmcore.1 /usr/src/sys/i386/compile/CITADEL5/kernel.debug
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ÀÍÁ@ ÁÄ Á¢ÁÀÍÁ ÁÁ ¢ÁÀÍÁÀ ÁDÁ0¢ÁÀÍÁÁ Á@¢Á

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) backtrace
#1  0xc050212c in boot (howto=260) at ../../../kern/kern_shutdown.c:397
#2  0xc0502481 in panic (fmt=0xc06bb00a "snapacct_ufs2: bad block")
    at ../../../kern/kern_shutdown.c:553
#3  0xc05f9d95 in snapacct_ufs2 (vp=0xc2720880, oldblkp=0xc2673dd0,
    lastblkp=0xc2676000, fs=0xc1a75800, lblkno=12, expungetype=2)
    at ../../../ufs/ffs/ffs_snapshot.c:1338
#4  0xc05f9b3b in indiracct_ufs2 (snapvp=0xc2720880, cancelvp=0xc1ca9990,
    level=0, blkno=Unhandled dwarf expression opcode 0x93
) at ../../../ufs/ffs/ffs_snapshot.c:1253
#5  0xc05f9905 in expunge_ufs2 (snapvp=0xc2720880, cancelip=0xc1c58bdc,
    fs=0xc1a75800, acctfunc=0xc05f9c7c, expungetype=2)
    at ../../../ufs/ffs/ffs_snapshot.c:1185
#6  0xc05f7eaa in ffs_snapshot (mp=0xc1c05c00, snapfile=0xc1c58ce4 "`\214ÅÁ")
    at ../../../ufs/ffs/ffs_snapshot.c:605
#7  0xc0605de1 in ffs_mount (mp=0xc1c05c00, td=0xc24bb000)
    at ../../../ufs/ffs/ffs_vfsops.c:302
#8  0xc05556fc in vfs_domount (td=0xc24bb000, fstype=0xc1cb01f0 "ufs",
    fspath=0xc1cb0a00 "/", fsflags=16842752, fsdata=0xc2f23710)
    at ../../../kern/vfs_mount.c:739
#9  0xc0554ee9 in vfs_donmount (td=0xc24bb000, fsflags=16842752,
    fsoptions=0xd7041c04) at ../../../kern/vfs_mount.c:503
#10 0xc0557444 in kernel_mount (ma=0xc2311330, flags=16842752) at pcpu.h:162
#11 0xc0606041 in ffs_cmount (ma=0xc2311330, data=0x0, flags=16842752,
    td=0xc24bb000) at ../../../ufs/ffs/ffs_vfsops.c:384
#12 0xc05550c6 in mount (td=0xc24bb000, uap=0xd7041d04)
    at ../../../kern/vfs_mount.c:566
#13 0xc066f0db in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134523985,
       tf_esi = -1077941244, tf_ebp = -1077943848, tf_isp = -687596188,
       tf_ebx = -1077943792, tf_edx = -1, tf_ecx = -1077940433, tf_eax = 21,
       tf_trapno = 12, tf_err = 2, tf_eip = 671848243, tf_cs = 51,
       tf_eflags = 582, tf_esp = -1077944004, tf_ss = 59})
    at ../../../i386/i386/trap.c:986
#14 0xc065bb0f in Xint0x80_syscall () at ../../../i386/i386/exception.s:200
#15 0x0000003b in ?? ()
#16 0x0000003b in ?? ()
#17 0x0000003b in ?? ()
#18 0x0804ac51 in ?? ()
#19 0xbfbfec04 in ?? ()
#20 0xbfbfe1d8 in ?? ()
#21 0xd7041d64 in ?? ()
#22 0xbfbfe210 in ?? ()
#23 0xffffffff in ?? ()
#24 0xbfbfef2f in ?? ()
#25 0x00000015 in ?? ()
#26 0x0000000c in ?? ()
#27 0x00000002 in ?? ()
#28 0x280b9733 in ?? ()
#29 0x00000033 in ?? ()
#30 0x00000246 in ?? ()
#31 0xbfbfe13c in ?? ()
#32 0x0000003b in ?? ()
#33 0x00000000 in ?? ()
#34 0x00000000 in ?? ()
#35 0x00000000 in ?? ()
#36 0x00000000 in ?? ()
#37 0x12471000 in ?? ()
#38 0xc24bb154 in ?? ()
#39 0xc19b27d0 in ?? ()
#40 0xd7041504 in ?? ()
#41 0xd70414e8 in ?? ()
#42 0xc24bb000 in ?? ()
#43 0xc0514827 in sched_switch (td=0xbfbfe210, newtd=0xbfbfec04,
    flags=Cannot access memory at address 0xbfbfe1e8
) at ../../../kern/sched_ule.c:1387
Previous frame inner to this frame (corrupt stack?)

This points to a ULE scheduler issue, right?

My dmesg shows:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 7.0-CURRENT #0: Fri Aug 12 22:44:36 SAST 2005
    khetan@citadel.os.org.za:/usr/src/sys/i386/compile/CITADEL5
WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(R) CPU 2.00GHz (1999.95-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff
  Features2=0x4400
real memory  = 528416768 (503 MB)
avail memory = 507617280 (484 MB)
ACPI APIC Table:
ioapic0 irqs 0-23 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
pci_link0: on acpi0
pci_link1: on acpi0
pci_link2: irq 11 on acpi0
pci_link3: on acpi0
pci_link4: irq 0 on acpi0
pci_link5: irq 0 on acpi0
pci_link6: irq 0 on acpi0
pci_link7: irq 0 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xeb000000-0xeb7fffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
fxp0: port 0xd000-0xd03f mem 0xeb820000-0xeb820fff,0xeb800000-0xeb81ffff irq 18 at device 8.0 on pci0
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:02:b3:ed:ec:a2
fxp0: [GIANT-LOCKED]
isab0: at device 17.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xe000-0xe00f at device 17.1 on pci0
ata0: on atapci0
ata1: on atapci0
acpi_tz0: on acpi0
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: at iomem 0xcc000-0xcd7ff pnpid ORM0000 on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1999954984 Hz quality 800
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
ad0: 39266MB at ata0-master UDMA100
ad2: 39266MB at ata1-master UDMA100
Trying to mount root from ufs:/dev/ad0s1a
fxp0: Microcode loaded, int_delay: 1000 usec bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec bundle_max: 6
Accounting enabled

I'd appreciate any pointers! Thanks.

PS: The problem is that the machine is hosted in a remote data centre,
so someone has to intervene manually to re-fsck it every time this
crash occurs. For now, I've disabled snapshots and set
fsck_y_enable="YES" and background_fsck="NO" in /etc/rc.conf, in the
vain hope that if the machine barfs, it'll pick itself up again. That
is logical, yes?

Khetan Gajjar
--
Services           | +27 11 575 3832
Internet Solutions | http://www.is.co.za/
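
For reference, the /etc/rc.conf workaround mentioned in the PS would look roughly like the fragment below. This is only a sketch: the two variable names and values come from the message, while the comments are an interpretation of what rc.conf(5) documents for them. How snapshots were disabled beyond this is not stated in the message, so it is not shown.

```shell
# /etc/rc.conf fragment (sketch of the workaround described in the PS)

# If the initial "fsck -p" preen fails at boot, rerun fsck with -y
# rather than stopping in single-user mode and waiting for an operator.
fsck_y_enable="YES"

# Check file systems in the foreground at boot. Background fsck works
# from a file system snapshot, and snapshot creation is where the
# panic in the backtrace above occurs.
background_fsck="NO"
```

The trade-off is a longer boot after an unclean shutdown, since the full check has to finish before the machine comes up.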