Date:      Mon, 18 Sep 2006 10:02:22 -0500
From:      Eric Anderson <anderson@centtech.com>
To:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   6-STABLE filesystem related panics/locks (kgdb output)
Message-ID:  <450EB4FE.2090203@centtech.com>

Hi all,

On one of our NFS servers, we've seen repeated filesystem problems on 
two of its four NFS-exported filesystems.  The problem usually shows up 
as a hung 'df -lk' (wedged in 'ufs'); mountd also wedges, refuses new 
mounts, and cannot be killed.  From an NFS client, one can continue 
using the filesystem without any problem.  On the server itself, you 
can cd to the filesystem's root directory, but an ls will hang.  Running 
a background fsck on that filesystem while in this state also blocks on 
ufs.  My nfsd processes also get stuck in the 'D' state (in 'ufs'), but 
they still appear to be serving data.  About a month ago, I brought the 
system down, did a full fsck on all the filesystems, and brought it back 
up.  It survived for 2-3 weeks but is now doing the same thing, so I 
doubt the fsck had any effect on the issue.

This morning, prior to rebooting the system to get it out of this state, 
I began unmounting filesystems in case of a panic, and after unmounting 
(successfully) two of the filesystems (the ones I've never seen an issue 
on), I tried unmounting the third (/scr02), and a panic ensued.  /scr01 
is the other filesystem that is giving me issues.

Some information about the system/setup:

FreeBSD smd2.centtech.com 6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug 12 13:24:02 CDT 2006

# df -ilk
Filesystem     1K-blocks      Used     Avail Capacity iused    ifree %iused  Mounted on
/dev/amrd0s1a   20308398   3098864  15584864    17%  259261  2378561   10%   /
devfs                  1         1         0   100%       0        0  100%   /dev
/dev/amrd0s1d   13065232   3960250   8059764    33%     870  1694872    0%   /var
/dev/ufs/rss   213268540  93886480 102320578    48%  399297 27180093    1%   /rss
/dev/ufs/scr02 213268540 116904962  79302096    60%  426573 27152817    2%   /scr02
/dev/ufs/scr04 167568544  93374026  60789036    61%   13008 21654830    0%   /scr04
/dev/ufs/scr01 232100360 161547746  51984586    76%  531834 29473412    2%   /scr01

(rss and scr04 never give me any issues)

All four of the ufs/* partitions are on the same RAID array, and I don't 
believe there is any underlying disk issue.

Here's some kgdb output from the resulting crash dump.  The system was 
wedged on /scr01 at the time, but it was the unmount of /scr02 that 
triggered the panic:

# kgdb -q -n 3
[GDB will not be able to debug user-mode threads: 
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
Mount point /scr02 had 1 dangling refs
panic: unmount: dangling vnode
cpuid = 0
KDB: enter: panic
Dumping 1023 MB (2 chunks)
   chunk 0: 1MB (159 pages) ... ok
   chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879
863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
         in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0, 
dummy3=-1064859081, dummy4=0xe8de3ab8 "ä:Þè\234l\207ÀÐ:ÞèÔ:Þè\220\a")
     at /usr/src/sys/ddb/db_command.c:492
#2  0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0, 
aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
     at /usr/src/sys/ddb/db_command.c:350
#3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8de3bfc)
     at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc0896338 in trap (frame=
       {tf_fs = -388104184, tf_es = -1066860504, tf_ds = -1064304600,
        tf_edi = -1064235220, tf_esi = 1, tf_ebp = -388088772,
        tf_isp = -388088792, tf_ebx = -388088728, tf_edx = 0,
        tf_ecx = -1056755712, tf_eax = 18, tf_trapno = 3, tf_err = 0,
        tf_eip = -1066829453, tf_cs = 32, tf_eflags = 646,
        tf_esp = -388088740, tf_ss = -1066934521})
     at /usr/src/sys/i386/i386/trap.c:593
#7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0697973 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>)
     at cpufunc.h:60
#9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode")
     at /usr/src/sys/kern/kern_shutdown.c:549
#10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600)
     at /usr/src/sys/kern/vfs_mount.c:514
#11 0xc06d2d26 in dounmount (mp=0xc5964000, flags=134217728, td=0xc620c600)
     at /usr/src/sys/kern/vfs_mount.c:1162
#12 0xc06d27de in unmount (td=0xc620c600, uap=0xe8de3d04)
     at /usr/src/sys/kern/vfs_mount.c:1052
#13 0xc0896c0b in syscall (frame=
       {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957,
        tf_esi = 134535289, tf_ebp = -1077942776, tf_isp = -388088476,
        tf_ebx = -1077942864, tf_edx = 26, tf_ecx = 0, tf_eax = 22,
        tf_trapno = 12, tf_err = 2, tf_eip = 671864503, tf_cs = 51,
        tf_eflags = 518, tf_esp = -1077942948, tf_ss = 59})
     at /usr/src/sys/i386/i386/trap.c:981
#14 0xc0881eaf in Xint0x80_syscall ()
     at /usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
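
kgdb prints the trap frame registers and the unmount flags as signed 
decimal integers, which makes them awkward to compare with the hex 
addresses in the rest of the backtrace.  The following is a minimal 
Python sketch of the conversion, not part of the original session; the 
helper name to_u32 is made up for illustration, and the values are 
copied from frames #6 and #11 above.

```python
def to_u32(value):
    """Reinterpret a signed 32-bit integer (as kgdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


# Register values copied from the trap frame in frame #6 of the backtrace.
trap_frame = {
    "tf_eip": -1066829453,
    "tf_ebp": -388088772,
    "tf_esp": -388088740,
}

for name, value in trap_frame.items():
    print(f"{name} = {to_u32(value):#010x}")

# The flags argument passed to dounmount() in frame #11.
dounmount_flags = 134217728
print(f"dounmount flags = {dounmount_flags:#010x}")
```

Read this way, tf_eip comes out as 0xc0697973, the same address shown 
for kdb_enter in frame #8, and the dounmount flags word is the single 
bit 0x08000000.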
(kgdb) frame 10
#10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600)
     at /usr/src/sys/kern/vfs_mount.c:514
514                     panic("unmount: dangling vnode");
(kgdb) l
509                     printf("mount point secondary write ops completed\n");
510             }
511             MNT_IUNLOCK(mp);
512             mp->mnt_vfc->vfc_refcount--;
513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
514                     panic("unmount: dangling vnode");
515             lockdestroy(&mp->mnt_lock);
516             MNT_ILOCK(mp);
517             if (mp->mnt_kern_flag & MNTK_MWAIT)
518                     wakeup(mp);

(kgdb) p *mp
$2 = {mnt_list = {tqe_next = 0xc5964400, tqe_prev = 0xc59bbc00},
   mnt_op = 0xc09b96e0, mnt_vfc = 0xc09b9720, mnt_vnodecovered = 0xc5ae0cc0,
   mnt_syncer = 0x0,
   mnt_nvnodelist = {tqh_first = 0xc6d59440, tqh_last = 0xc6d59454},
   mnt_lock = {lk_interlock = 0xc09eac84, lk_flags = 1048576,
     lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0,
     lk_prio = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0,
     lk_lockholder = 0xffffffff, lk_newlock = 0x0},
   mnt_mtx = {mtx_object = {lo_class = 0xc0980124,
       lo_name = 0xc0910dee "struct mount mtx",
       lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608,
       lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
     mtx_lock = 4, mtx_recurse = 0},
   mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt = 0xc5926a00,
   mnt_optnew = 0x0, mnt_kern_flag = 553648128, mnt_maxsymlinklen = 120,
   mnt_stat = {f_version = 537068824, f_type = 5, f_flags = 2102016,
     f_bsize = 2048, f_iosize = 16384, f_blocks = 106634270,
     f_bfree = 48180134, f_bavail = 39649393, f_files = 27579390,
     f_ffree = 27152822, f_syncwrites = 0, f_asyncwrites = 0,
     f_syncreads = 0, f_asyncreads = 0,
     f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255,
     f_owner = 0, f_fsid = {val = {1111508928, -571478071}},
     f_charspare = '\0' <repeats 79 times>,
     f_fstypename = "ufs", '\0' <repeats 12 times>,
     f_mntfromname = "/dev/ufs/scr02", '\0' <repeats 73 times>,
     f_mntonname = "/scr02", '\0' <repeats 81 times>},
   mnt_cred = 0xc59f2080, mnt_data = 0x0, mnt_time = 0,
   mnt_iosize_max = 131072, mnt_export = 0xc5d25c00, mnt_mntlabel = 0x0,
   mnt_fslabel = 0x0, mnt_nvnodelistsize = 1, mnt_hashseed = 3369618744,
   mnt_markercnt = 0, mnt_holdcnt = 0, mnt_holdcntwaiters = 0,
   mnt_secondary_writes = 0, mnt_secondary_accwrites = 2126786,
   mnt_ref = 1}
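
The flag words in this dump are printed in decimal, so it may help to 
split them into individual bits.  The sketch below is a reading aid, 
not part of the original kgdb session.  The bit arithmetic is exact, 
but the flag names are assumptions recalled from FreeBSD's sys/mount.h 
and should be checked against the 6-STABLE source before relying on 
them; the helper names set_bits and describe are hypothetical.

```python
# Flag names are ASSUMPTIONS recalled from FreeBSD's sys/mount.h;
# verify them against the 6-STABLE source tree before drawing conclusions.
ASSUMED_MNT = {
    0x00000100: "MNT_EXPORTED",
    0x00000200: "MNT_DEFEXPORTED",
    0x00001000: "MNT_LOCAL",
    0x00200000: "MNT_SOFTDEP",
}
ASSUMED_MNTK = {
    0x01000000: "MNTK_UNMOUNT",
    0x20000000: "MNTK_MPSAFE",
}


def set_bits(value):
    """Return the individual bits set in a 32-bit flag word, lowest first."""
    return [1 << i for i in range(32) if value & (1 << i)]


def describe(value, names):
    """Label each set bit with its assumed name, or '(unknown)'."""
    return [f"{bit:#010x} {names.get(bit, '(unknown)')}" for bit in set_bits(value)]


# Decimal values copied from the 'p *mp' output above.
fields = [
    ("mnt_flag", 2097920, ASSUMED_MNT),
    ("f_flags", 2102016, ASSUMED_MNT),
    ("mnt_kern_flag", 553648128, ASSUMED_MNTK),
]

for field, value, names in fields:
    print(f"{field} = {value:#010x}")
    for line in describe(value, names):
        print("   ", line)
```

Whatever the names turn out to be, mnt_flag (0x00200300) and f_flags 
(0x00201300) differ only in bit 0x00001000, and mnt_kern_flag 
(0x21000000) has exactly two bits set.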
(kgdb) p mp->mnt_vfc->vfc_refcount
$3 = 4
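
As a sanity check that mp really is /scr02 and that its statistics are 
intact, the mnt_stat counters can be converted to the units df uses and 
compared with the df row shown earlier.  This is a rough sketch rather 
than a definitive analysis: it assumes f_blocks, f_bfree and f_bavail 
are counted in units of f_bsize, and the variable and function names 
are invented for the example.

```python
F_BSIZE = 2048  # f_bsize from mnt_stat

# Counters copied from mnt_stat in the 'p *mp' output.
statfs = {
    "f_blocks": 106634270,
    "f_bfree": 48180134,
    "f_bavail": 39649393,
    "f_files": 27579390,
    "f_ffree": 27152822,
}

# The /dev/ufs/scr02 row from the 'df -ilk' output.
df_row = {
    "1K-blocks": 213268540,
    "Used": 116904962,
    "Avail": 79302096,
    "iused": 426573,
    "ifree": 27152817,
}


def to_kib(blocks):
    """Convert a count of f_bsize-sized blocks to 1K blocks."""
    return blocks * F_BSIZE // 1024


from_dump = {
    "1K-blocks": to_kib(statfs["f_blocks"]),
    "Used": to_kib(statfs["f_blocks"] - statfs["f_bfree"]),
    "Avail": to_kib(statfs["f_bavail"]),
    "iused": statfs["f_files"] - statfs["f_ffree"],
    "ifree": statfs["f_ffree"],
}

for key, df_value in df_row.items():
    delta = from_dump[key] - df_value
    print(f"{key:>9}: dump={from_dump[key]:>10} df={df_value:>10} delta={delta:+d}")
```

Under that assumption the totals agree exactly (213268540 1K blocks, 
and iused + ifree equals f_files), while the used and available figures 
differ only slightly, as one would expect if the df output and the dump 
were captured at different times.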


Anything else I can provide to help find the issue?


Eric



-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------


