Date:      Wed, 24 Jul 2002 14:32:25 -0700 (PDT)
From:      John Engelhart <johne@zang.com>
To:        hackers@freebsd.org
Subject:   Serious File System problems (4.6-Release)
Message-ID:  <Pine.GSO.4.21.0207241412190.25631-100000@zang.com>

Hello,

I've been having some very serious stability problems with FreeBSD, mostly,
it seems, with the file system.  I've had these problems since 4.3, but now
I have a free machine that isn't doing anything critical, so I can really
pound on the issue.

twister# uname -a
FreeBSD twister.zang.com 4.6-RELEASE FreeBSD 4.6-RELEASE #6: Wed Jul 24 10:36:30 PDT 2002     johne@twister.zang.com:/usr/src/sys/compile/twister i386

These are the differences from a GENERIC kernel config:

< #cpu          I386_CPU
< #cpu          I486_CPU
< #cpu          I586_CPU
< ident         twister
< maxusers      128
< makeoptions     DEBUG=-g                #Build kernel with gdb(1) debug symbols
< options         DDB, DDB_UNATTENDED
< options       SMP                     # Symmetric MultiProcessor Kernel
< options       APIC_IO                 # Symmetric (APIC) I/O

The box itself is a Tyan Tiger S2466.  Its previous incarnation was a
Tyan Tiger S2460.  The CPUs are AMD 1800 MPs, and it uses a gig of ECC
RAM.  The BIOS is set up for ECC Scrub.  It's currently stripped down to
only a brand new generic NVIDIA GeForce2 MX AGP card, an IDE hard drive, an
IDE CDROM, and the onboard 3Com ethernet.  In its previous life, it had a
pretty beefy Adaptec 3210S RAID controller and some gig-e ethernet cards
in it.

So today I really started to push it, to start to capture some
data.  Here's what I've got so far.

I'm running crashme (/usr/ports/sysutils/crashme) and postmark
(/usr/ports/benchmarks/postmark).

In previous systems, before all the kernel options were tuned just right
to capture the core dumps and whatnot, it would randomly crash, often
with spectacular file system corruption.  A few times it was so bad that
fsck was unable to handle it without going to alternate superblocks, and
the file system that came up from the ashes wasn't worth squat (THOUSANDS
of files gone), so it had to be restored from tape.

In this particular incarnation, it hasn't been that bad.  But it hasn't
yet been pushed that hard.  I'll typically run crashme with:

[johne@twister] ~> crashme +2000 666 100 24:00:00&

I'll do this twice, once each in a separate TTY, to exercise both CPUs.
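
For anyone trying to reproduce this without juggling two TTYs, here is a rough
sketch (not the procedure actually used above) that starts the same crashme
command line twice from one script.  The function names are made up for
illustration, the arguments are copied from the command shown above, and it
assumes the crashme binary from the port is on the PATH.

```python
import subprocess

# Arguments copied verbatim from the command line quoted in the post.
CRASHME_ARGS = ["+2000", "666", "100", "24:00:00"]


def crashme_command(binary="crashme"):
    """Build the argv list for one crashme run (binary name is an assumption)."""
    return [binary, *CRASHME_ARGS]


def launch_crashme(instances=2, binary="crashme"):
    """Start `instances` copies of crashme, one per CPU to be exercised.

    Returns the list of Popen handles so the caller can wait on them.
    """
    return [subprocess.Popen(crashme_command(binary)) for _ in range(instances)]
```

Calling launch_crashme() and then wait() on each returned handle would
approximate the two-TTY setup; nothing here pins a process to a particular
CPU, so the scheduler still decides where each copy runs.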

So far, based on less than a day's worth of testing, crashme alone isn't
enough to trip it up.  It sets the stage, and will eventually cause it to
panic, but does so slowly.  Panic backtrace #1 is from this.

Adding postmark to the mix causes the whole thing to crumble in about 30
minutes.  See panic backtrace #2.

Thoughts?  Is it memory?  Is it CPU?  One of the CPUs is brand new; the
other is left over from one of the original systems.  I've just
purchased another CPU to rule that out.  I've also picked up two 128 MB
sticks of RAM, one with and one without ECC, to see if that's causing
the problem.  Or am I on to some insidious SMP bug?

--------
Panic #1
--------

(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc01ed803 in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc01edc75 in panic (
    fmt=0xc03c2200 "vm_fault: fault on nofault entry, addr: %lx")
    at ../../kern/kern_shutdown.c:595
#3  0xc02ff46c in vm_fault (map=0xc044d5ac, vaddr=3535327232, 
    fault_type=3 '\003', fault_flags=0) at ../../vm/vm_fault.c:240
#4  0xc03610d2 in trap_pfault (frame=0xdd9dac18, usermode=0, eva=3535327232)
    at ../../i386/i386/trap.c:848
#5  0xc0360c7b in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
      tf_edi = -759640064, tf_esi = -1124073472, tf_ebp = -576869244, 
      tf_isp = -576869308, tf_ebx = 8192, tf_edx = -1124065280, tf_ecx = 2048, 
      tf_eax = -576880640, tf_trapno = 12, tf_err = 2, tf_eip = -1070202954, 
      tf_cs = 8, tf_eflags = 328214, tf_esp = -576869068, tf_ss = -576869096})
    at ../../i386/i386/trap.c:458
#6  0xc035ffb6 in generic_copyin ()
#7  0xc02f4a21 in ffs_write (ap=0xdd9dad20)
    at ../../ufs/ufs/ufs_readwrite.c:519
#8  0xc02225db in vn_rdwr (rw=UIO_WRITE, vp=0xddbb3480,
base=0xbd000000cannot read proc at 0
)
    at vnode_if.h:363
#9  0xc0222699 in vn_rdwr_inchunks (rw=UIO_WRITE, vp=0xddbb3480, 
    base=0xbd000000cannot read proc at 0
) at ../../kern/vfs_vnops.c:346
#10 0xc01dbed4 in elf_coredump (p=0xda0d8d40, vp=0xddbb3480, 
    limit=9223372036854775807) at ../../kern/imgact_elf.c:782
#11 0xc01efcb8 in coredump (p=0xda0d8d40) at ../../kern/kern_sig.c:1660
#12 0xc01ef72e in sigexit (p=0xda0d8d40, sig=4) at ../../kern/kern_sig.c:1491
#13 0xc01ef50c in postsig (sig=4) at ../../kern/kern_sig.c:1404
#14 0xc0360ee1 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
      tf_edi = -1077937204, tf_esi = -1077937236, tf_ebp = -1077937448, 
      tf_isp = -576868396, tf_ebx = 134555648, tf_edx = -1077937480, 
      tf_ecx = 1, tf_eax = -844457978, tf_trapno = 27, tf_err = 0, 
      tf_eip = 134555655, tf_cs = 31, tf_eflags = 66178, tf_esp = -943381828, 
      tf_ss = 47}) at ../../i386/i386/trap.c:174
#15 0x8052807 in ?? ()
cannot read proc at 0
(kgdb) 


--------
Panic #2
--------


(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc01ed803 in boot (howto=260) at ../../kern/kern_shutdown.c:316
#2  0xc01edc75 in panic (fmt=0xc03d4479 "%s") at ../../kern/kern_shutdown.c:595
#3  0xc03614b5 in trap_fatal (frame=0xdb1dab30, eva=1590445512)
    at ../../i386/i386/trap.c:966
#4  0xc0361121 in trap_pfault (frame=0xdb1dab30, usermode=0, eva=1590445512)
    at ../../i386/i386/trap.c:859
#5  0xc0360c7b in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
      tf_edi = -1069160724, tf_esi = 16, tf_ebp = -618812552, 
      tf_isp = -618812580, tf_ebx = -557049344, tf_edx = -557049236, 
      tf_ecx = 1, tf_eax = 1590445472, tf_trapno = 12, tf_err = 0, 
      tf_eip = -1071532194, tf_cs = 8, tf_eflags = 2425350, tf_esp = 0, 
      tf_ss = -557049344}) at ../../i386/i386/trap.c:458
#6  0xc021b75e in vget (vp=0xdecc1a00, flags=18, p=0xc045d7e0)
    at ../../kern/vfs_subr.c:1538
#7  0xc02f3b40 in ffs_sync (mp=0xc29bfc00, waitfor=2, cred=0xc2056900, 
    p=0xc045d7e0) at ../../ufs/ffs/ffs_vfsops.c:1000
#8  0xc021db8b in sync (p=0xc045d7e0, uap=0x0) at ../../kern/vfs_syscalls.c:576
#9  0xc01ed59e in boot (howto=256) at ../../kern/kern_shutdown.c:235
#10 0xc01edc75 in panic (fmt=0xc03d4479 "%s") at ../../kern/kern_shutdown.c:595
#11 0xc03614b5 in trap_fatal (frame=0xdb1dacc8, eva=1590445530)
    at ../../i386/i386/trap.c:966
#12 0xc0361121 in trap_pfault (frame=0xdb1dacc8, usermode=0, eva=1590445530)
    at ../../i386/i386/trap.c:859
#13 0xc0360c7b in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
      tf_edi = -636635264, tf_esi = -1004814848, tf_ebp = -618812100, 
      tf_isp = -618812172, tf_ebx = 0, tf_edx = -618812124, 
      tf_ecx = -557049344, tf_eax = 1590445472, tf_trapno = 12, tf_err = 0, 
      tf_eip = -1071535647, tf_cs = 8, tf_eflags = 2425414, tf_esp = 0, 
      tf_ss = -1004814848}) at ../../i386/i386/trap.c:458
#14 0xc021a9e1 in vinvalbuf (vp=0xdecc1a00, flags=0, cred=0x0, p=0xda0db780, 
    slpflag=0, slptimeo=0) at ../../kern/vfs_subr.c:866
#15 0xc02eb9ae in ffs_truncate (vp=0xdecc1a00, length=0, flags=0, cred=0x0, 
    p=0xda0db780) at ../../ufs/ffs/ffs_inode.c:199
#16 0xc02f69b4 in ufs_inactive (ap=0xdb1daed8) at ../../ufs/ufs/ufs_inode.c:89
#17 0xc02fbf91 in ufs_vnoperate (ap=0xdb1daed8)
    at ../../ufs/ufs/ufs_vnops.c:2422
#18 0xc021b8ff in vput (vp=0xdecc1a00) at vnode_if.h:815
#19 0xc02ef580 in handle_workitem_remove (dirrem=0xc462f7c0)
    at ../../ufs/ffs/ffs_softdep.c:2852
#20 0xc02ecbf1 in process_worklist_item (matchmnt=0x0, flags=0)
    at ../../ufs/ffs/ffs_softdep.c:716
#21 0xc02eca9a in softdep_process_worklist (matchmnt=0x0)
    at ../../ufs/ffs/ffs_softdep.c:622
#22 0xc021b19b in sched_sync () at ../../kern/vfs_subr.c:1177
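
A note for reading the trap frames above: kgdb prints the tf_* fields as
signed 32-bit decimals, which makes them hard to compare with the hex
addresses in the frame list.  The small helper below is only an
illustrative sketch (the function names are invented here); it reinterprets
a value as unsigned 32-bit and prints it in hex.  The sample values are
copied from the two backtraces.

```python
def as_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def as_hex32(value):
    """Format a trap-frame field the way kgdb prints frame addresses."""
    return f"0x{as_unsigned32(value):08x}"


# Values taken from the backtraces in this message.
samples = {
    "panic 1, frame #5 tf_eip": -1070202954,
    "panic 1, frame #3 vaddr": 3535327232,
    "panic 2, frame #5 tf_eip": -1071532194,
}

for label, value in samples.items():
    print(f"{label:26s} {value:>12d} -> {as_hex32(value)}")
```

With this, tf_eip in panic #1 comes out as 0xc035ffb6, the address shown for
generic_copyin in frame #6, and tf_eip in panic #2 comes out as 0xc021b75e,
the address shown for vget in frame #6.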


