From owner-freebsd-hackers Wed Jul 24 14:32:36 2002
Date: Wed, 24 Jul 2002 14:32:25 -0700 (PDT)
From: John Engelhart
To: hackers@freebsd.org
Subject: Serious File System problems (4.6-Release)

Hello,

I've been having some very serious stability problems with FreeBSD, mostly, it seems, with the file system. I've had these problems since 4.3, but now I have a free machine that isn't doing anything critical, so I can really pound on the issue.

twister# uname -a
FreeBSD twister.zang.com 4.6-RELEASE FreeBSD 4.6-RELEASE #6: Wed Jul 24 10:36:30 PDT 2002 johne@twister.zang.com:/usr/src/sys/compile/twister i386

These are the differences from a GENERIC kernel config:

< #cpu I386_CPU
< #cpu I486_CPU
< #cpu I586_CPU
< ident twister
< maxusers 128
< makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols
< options DDB, DDB_UNATTENDED
< options SMP # Symmetric MultiProcessor Kernel
< options APIC_IO # Symmetric (APIC) I/O

The box itself is a Tyan Tiger S2466. Its previous incarnation was a Tyan Tiger S2460. The CPUs are AMD 1800 MPs, and it uses a gig of ECC RAM. The BIOS is set up for ECC Scrub.
It's currently stripped down to only a brand new generic NVIDIA GeForce2 MX AGP card, an IDE hard drive, an IDE CDROM, and the on-board 3Com ethernet. In its previous life, it had a pretty beefy Adaptec 3210S RAID controller and some gig-e ethernet cards in it.

So today I really started to push it, to start capturing some data. Here's what I've got so far.

I'm running crashme (/usr/ports/sysutils/crashme) and postmark (/usr/ports/benchmarks/postmark). On previous systems, before all the kernel options were tuned just right to capture the core dumps and whatnot, it would randomly crash, often with spectacular file system corruption. A few times it was so bad that fsck was unable to handle it without going to alternate superblocks, and the file system that came up from the ashes wasn't worth squat (THOUSANDS of files gone), so it had to be restored from tape.

In this particular incarnation, it hasn't been that bad. But it hasn't yet been pushed that hard. I'll typically run crashme with:

[johne@twister] ~> crashme +2000 666 100 24:00:00&

I'll do this twice, once each in a separate TTY, to exercise both CPUs. So far, based on less than a day's worth of testing, crashme alone isn't enough to trip it up. It sets the stage, and will eventually cause a panic, but does so slowly. Panic backtrace #1 is from this.

Adding postmark to the mix causes the whole thing to crumble in about 30 minutes. See panic backtrace #2.

Thoughts? Is it memory? Is it CPU? One of the CPUs is brand new. The other is left over from one of the original systems. I've just purchased another CPU to rule that out. I've also picked up two 128 meg sticks of RAM, one with ECC and one without, to see if that's causing the problem. Or am I on to some insidious SMP bug?

Panic #1

(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc01ed803 in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc01edc75 in panic (
    fmt=0xc03c2200 "vm_fault: fault on nofault entry, addr: %lx")
    at ../../kern/kern_shutdown.c:595
#3  0xc02ff46c in vm_fault (map=0xc044d5ac, vaddr=3535327232,
    fault_type=3 '\003', fault_flags=0) at ../../vm/vm_fault.c:240
#4  0xc03610d2 in trap_pfault (frame=0xdd9dac18, usermode=0, eva=3535327232)
    at ../../i386/i386/trap.c:848
#5  0xc0360c7b in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16,
      tf_edi = -759640064, tf_esi = -1124073472, tf_ebp = -576869244,
      tf_isp = -576869308, tf_ebx = 8192, tf_edx = -1124065280,
      tf_ecx = 2048, tf_eax = -576880640, tf_trapno = 12, tf_err = 2,
      tf_eip = -1070202954, tf_cs = 8, tf_eflags = 328214,
      tf_esp = -576869068, tf_ss = -576869096})
    at ../../i386/i386/trap.c:458
#6  0xc035ffb6 in generic_copyin ()
#7  0xc02f4a21 in ffs_write (ap=0xdd9dad20)
    at ../../ufs/ufs/ufs_readwrite.c:519
#8  0xc02225db in vn_rdwr (rw=UIO_WRITE, vp=0xddbb3480, base=0xbd000000cannot read proc at 0
) at vnode_if.h:363
#9  0xc0222699 in vn_rdwr_inchunks (rw=UIO_WRITE, vp=0xddbb3480, base=0xbd000000cannot read proc at 0
) at ../../kern/vfs_vnops.c:346
#10 0xc01dbed4 in elf_coredump (p=0xda0d8d40, vp=0xddbb3480,
    limit=9223372036854775807) at ../../kern/imgact_elf.c:782
#11 0xc01efcb8 in coredump (p=0xda0d8d40) at ../../kern/kern_sig.c:1660
#12 0xc01ef72e in sigexit (p=0xda0d8d40, sig=4) at ../../kern/kern_sig.c:1491
#13 0xc01ef50c in postsig (sig=4) at ../../kern/kern_sig.c:1404
#14 0xc0360ee1 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = -1077937204, tf_esi = -1077937236, tf_ebp = -1077937448,
      tf_isp = -576868396, tf_ebx = 134555648, tf_edx = -1077937480,
      tf_ecx = 1, tf_eax = -844457978, tf_trapno = 27, tf_err = 0,
      tf_eip = 134555655, tf_cs = 31, tf_eflags = 66178,
      tf_esp = -943381828, tf_ss = 47})
    at ../../i386/i386/trap.c:174
#15 0x8052807 in ?? ()
cannot read proc at 0
(kgdb)

-------- Panic #2 --------

(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc01ed803 in boot (howto=260) at ../../kern/kern_shutdown.c:316
#2  0xc01edc75 in panic (fmt=0xc03d4479 "%s")
    at ../../kern/kern_shutdown.c:595
#3  0xc03614b5 in trap_fatal (frame=0xdb1dab30, eva=1590445512)
    at ../../i386/i386/trap.c:966
#4  0xc0361121 in trap_pfault (frame=0xdb1dab30, usermode=0, eva=1590445512)
    at ../../i386/i386/trap.c:859
#5  0xc0360c7b in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16,
      tf_edi = -1069160724, tf_esi = 16, tf_ebp = -618812552,
      tf_isp = -618812580, tf_ebx = -557049344, tf_edx = -557049236,
      tf_ecx = 1, tf_eax = 1590445472, tf_trapno = 12, tf_err = 0,
      tf_eip = -1071532194, tf_cs = 8, tf_eflags = 2425350, tf_esp = 0,
      tf_ss = -557049344})
    at ../../i386/i386/trap.c:458
#6  0xc021b75e in vget (vp=0xdecc1a00, flags=18, p=0xc045d7e0)
    at ../../kern/vfs_subr.c:1538
#7  0xc02f3b40 in ffs_sync (mp=0xc29bfc00, waitfor=2, cred=0xc2056900,
    p=0xc045d7e0) at ../../ufs/ffs/ffs_vfsops.c:1000
#8  0xc021db8b in sync (p=0xc045d7e0, uap=0x0)
    at ../../kern/vfs_syscalls.c:576
#9  0xc01ed59e in boot (howto=256) at ../../kern/kern_shutdown.c:235
#10 0xc01edc75 in panic (fmt=0xc03d4479 "%s")
    at ../../kern/kern_shutdown.c:595
#11 0xc03614b5 in trap_fatal (frame=0xdb1dacc8, eva=1590445530)
    at ../../i386/i386/trap.c:966
#12 0xc0361121 in trap_pfault (frame=0xdb1dacc8, usermode=0, eva=1590445530)
    at ../../i386/i386/trap.c:859
#13 0xc0360c7b in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16,
      tf_edi = -636635264, tf_esi = -1004814848, tf_ebp = -618812100,
      tf_isp = -618812172, tf_ebx = 0, tf_edx = -618812124,
      tf_ecx = -557049344, tf_eax = 1590445472, tf_trapno = 12,
      tf_err = 0, tf_eip = -1071535647, tf_cs = 8, tf_eflags = 2425414,
      tf_esp = 0, tf_ss = -1004814848})
    at ../../i386/i386/trap.c:458
#14 0xc021a9e1 in vinvalbuf (vp=0xdecc1a00, flags=0, cred=0x0,
    p=0xda0db780, slpflag=0, slptimeo=0) at ../../kern/vfs_subr.c:866
#15 0xc02eb9ae in ffs_truncate (vp=0xdecc1a00, length=0, flags=0, cred=0x0,
    p=0xda0db780) at ../../ufs/ffs/ffs_inode.c:199
#16 0xc02f69b4 in ufs_inactive (ap=0xdb1daed8)
    at ../../ufs/ufs/ufs_inode.c:89
#17 0xc02fbf91 in ufs_vnoperate (ap=0xdb1daed8)
    at ../../ufs/ufs/ufs_vnops.c:2422
#18 0xc021b8ff in vput (vp=0xdecc1a00) at vnode_if.h:815
#19 0xc02ef580 in handle_workitem_remove (dirrem=0xc462f7c0)
    at ../../ufs/ffs/ffs_softdep.c:2852
#20 0xc02ecbf1 in process_worklist_item (matchmnt=0x0, flags=0)
    at ../../ufs/ffs/ffs_softdep.c:716
#21 0xc02eca9a in softdep_process_worklist (matchmnt=0x0)
    at ../../ufs/ffs/ffs_softdep.c:622
#22 0xc021b19b in sched_sync () at ../../kern/vfs_subr.c:1177
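
A note for anyone reading the backtraces: kgdb prints the trap frame registers (tf_eip, tf_eax and so on) and the fault address (eva, vaddr) as decimal integers, many of them negative, while the frame addresses are shown in hex. The two are easier to compare once the trap frame values are reinterpreted as unsigned 32-bit numbers. Below is a minimal sketch of that conversion; the helper name to_u32_hex is made up for illustration and is not part of kgdb or FreeBSD.

```python
def to_u32_hex(value: int) -> str:
    """Reinterpret an integer as an unsigned 32-bit value and format it as hex.

    Hypothetical helper, for illustration only: masking with 0xFFFFFFFF
    maps a negative signed 32-bit value onto its two's-complement bits.
    """
    return f"0x{value & 0xFFFFFFFF:08x}"


# Values copied from the backtraces in this message.
print(to_u32_hex(-1070202954))  # tf_eip in frame #5 of panic #1
print(to_u32_hex(3535327232))   # vaddr / eva in frames #3 and #4 of panic #1
print(to_u32_hex(-1071532194))  # tf_eip in frame #5 of panic #2
```

The first value comes out as 0xc035ffb6, the address kgdb shows for generic_copyin in frame #6 of panic #1, and the third as 0xc021b75e, the address shown for vget in frame #6 of panic #2. The second, 0xd2b8d000, is the faulting address of panic #1 in hex.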