Date: Mon, 18 Sep 2006 10:02:22 -0500
From: Eric Anderson <anderson@centtech.com>
To: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: 6-STABLE filesystem related panics/locks (kgdb output)
Message-ID: <450EB4FE.2090203@centtech.com>
Hi all,

On one of our NFS servers, we've seen repeated filesystem issues with two of the filesystems (it has 4 exported via NFS). It usually manifests itself as a hung 'df -lk' (wedged in 'ufs'), and mountd becomes wedged as well: it does not allow new mounts and cannot be killed. From an NFS client, one can continue using the filesystem just fine, without an issue. From the server itself, you can cd to the filesystem's root directory, but an ls will hang. Running a background fsck on that filesystem while in this state also blocks on ufs. My nfsd processes will also get stuck in the 'D' state (in 'ufs'), but they still appear to be serving data.

About a month ago, I brought the system down, did a full fsck on all the filesystems, and brought it back up. It survived for several weeks (2-3), but is now doing the same thing, so I'm uncertain if the issue was affected by the fsck at all (doubtful).

This morning, prior to rebooting the system to get it out of this state, I began unmounting filesystems in case of a panic. After successfully unmounting two of the filesystems (the ones I've never seen an issue on), I tried unmounting the third (/scr02), and a panic ensued. /scr01 is the other filesystem that is giving me issues.

Some information about the system/setup:

FreeBSD smd2.centtech.com 6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug 12 13:24:02 CDT 2006

# df -ilk
Filesystem      1K-blocks      Used     Avail Capacity  iused    ifree %iused  Mounted on
/dev/amrd0s1a    20308398   3098864  15584864    17%   259261  2378561   10%   /
devfs                   1         1         0   100%        0        0  100%   /dev
/dev/amrd0s1d    13065232   3960250   8059764    33%      870  1694872    0%   /var
/dev/ufs/rss    213268540  93886480 102320578    48%   399297 27180093    1%   /rss
/dev/ufs/scr02  213268540 116904962  79302096    60%   426573 27152817    2%   /scr02
/dev/ufs/scr04  167568544  93374026  60789036    61%    13008 21654830    0%   /scr04
/dev/ufs/scr01  232100360 161547746  51984586    76%   531834 29473412    2%   /scr01

(rss and scr04 never give me any issues)

All four of the ufs/* partitions are on the same RAID array, and I don't believe there is any underlying disk issue.

Here's some kgdb output from when the system was wedged on /scr01, but the unmount of /scr02 caused a panic:

# kgdb -q -n 3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
Mount point /scr02 had 1 dangling refs
panic: unmount: dangling vnode
cpuid = 0
KDB: enter: panic
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0, dummy3=-1064859081, dummy4=0xe8de3ab8 "ä:Þè\234l\207ÀÐ:ÞèÔ:Þè\220\a")
    at /usr/src/sys/ddb/db_command.c:492
#2  0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0, aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
    at /usr/src/sys/ddb/db_command.c:350
#3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8de3bfc) at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc0896338 in trap (frame=
      {tf_fs = -388104184, tf_es = -1066860504, tf_ds = -1064304600, tf_edi = -1064235220, tf_esi = 1, tf_ebp = -388088772, tf_isp = -388088792, tf_ebx = -388088728, tf_edx = 0, tf_ecx = -1056755712, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1066829453, tf_cs = 32, tf_eflags = 646, tf_esp = -388088740, tf_ss = -1066934521})
    at /usr/src/sys/i386/i386/trap.c:593
#7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0697973 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at cpufunc.h:60
#9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode") at /usr/src/sys/kern/kern_shutdown.c:549
#10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:514
#11 0xc06d2d26 in dounmount (mp=0xc5964000, flags=134217728, td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:1162
#12 0xc06d27de in unmount (td=0xc620c600, uap=0xe8de3d04) at /usr/src/sys/kern/vfs_mount.c:1052
#13 0xc0896c0b in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957, tf_esi = 134535289, tf_ebp = -1077942776, tf_isp = -388088476, tf_ebx = -1077942864, tf_edx = 26, tf_ecx = 0, tf_eax = 22, tf_trapno = 12, tf_err = 2, tf_eip = 671864503, tf_cs = 51, tf_eflags = 518, tf_esp = -1077942948, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:981
#14 0xc0881eaf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 10
#10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:514
514                     panic("unmount: dangling vnode");
(kgdb) l
509                     printf("mount point secondary write ops completed\n");
510             }
511             MNT_IUNLOCK(mp);
512             mp->mnt_vfc->vfc_refcount--;
513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
514                     panic("unmount: dangling vnode");
515             lockdestroy(&mp->mnt_lock);
516             MNT_ILOCK(mp);
517             if (mp->mnt_kern_flag & MNTK_MWAIT)
518                     wakeup(mp);
(kgdb) p *mp
$2 = {mnt_list = {tqe_next = 0xc5964400, tqe_prev = 0xc59bbc00}, mnt_op = 0xc09b96e0, mnt_vfc = 0xc09b9720,
  mnt_vnodecovered = 0xc5ae0cc0, mnt_syncer = 0x0, mnt_nvnodelist = {tqh_first = 0xc6d59440, tqh_last = 0xc6d59454},
  mnt_lock = {lk_interlock = 0xc09eac84, lk_flags = 1048576, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0,
    lk_prio = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0, lk_lockholder = 0xffffffff, lk_newlock = 0x0},
  mnt_mtx = {mtx_object = {lo_class = 0xc0980124, lo_name = 0xc0910dee "struct mount mtx",
      lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608, lo_list = {tqe_next = 0x0, tqe_prev = 0x0},
      lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0},
  mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt = 0xc5926a00, mnt_optnew = 0x0, mnt_kern_flag = 553648128,
  mnt_maxsymlinklen = 120,
  mnt_stat = {f_version = 537068824, f_type = 5, f_flags = 2102016, f_bsize = 2048, f_iosize = 16384,
    f_blocks = 106634270, f_bfree = 48180134, f_bavail = 39649393, f_files = 27579390, f_ffree = 27152822,
    f_syncwrites = 0, f_asyncwrites = 0, f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
    f_namemax = 255, f_owner = 0, f_fsid = {val = {1111508928, -571478071}}, f_charspare = '\0' <repeats 79 times>,
    f_fstypename = "ufs", '\0' <repeats 12 times>, f_mntfromname = "/dev/ufs/scr02", '\0' <repeats 73 times>,
    f_mntonname = "/scr02", '\0' <repeats 81 times>},
  mnt_cred = 0xc59f2080, mnt_data = 0x0, mnt_time = 0, mnt_iosize_max = 131072, mnt_export = 0xc5d25c00,
  mnt_mntlabel = 0x0, mnt_fslabel = 0x0, mnt_nvnodelistsize = 1, mnt_hashseed = 3369618744, mnt_markercnt = 0,
  mnt_holdcnt = 0, mnt_holdcntwaiters = 0, mnt_secondary_writes = 0, mnt_secondary_accwrites = 2126786, mnt_ref = 1}
(kgdb) p mp->mnt_vfc->vfc_refcount
$3 = 4

Anything else I can provide to help find the issue?

Eric

--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
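
For anyone reading the dump, a few of the numbers can be cross-checked against each other. The Python sketch below is only a sanity check using values copied from the `p *mp` output and the `df -ilk` row for /scr02; the helper names (`to_1k_blocks`, `summarize`) are made up for illustration. It assumes the usual sys/queue.h TAILQ convention, where `tqh_last` holds the address of the last element's `tqe_next` field, so a small fixed offset from `tqh_first` would be consistent with a list holding a single vnode.

```python
# Values copied from the kgdb `p *mp` dump (mnt_stat and mnt_nvnodelist)
# and from the `df -ilk` row for /scr02 in the report.
f_bsize = 2048
f_blocks = 106634270
f_files = 27579390
df_1k_blocks = 213268540
df_iused, df_ifree = 426573, 27152817

tqh_first = 0xC6D59440
tqh_last = 0xC6D59454
mnt_nvnodelistsize = 1

# Flag words from the dump, kept as raw integers; no flag names are assumed.
mnt_flag = 2097920
mnt_kern_flag = 553648128
unmount_flags = 134217728


def to_1k_blocks(blocks, bsize):
    """Convert a block count in bsize-byte units to 1K blocks, as df -k reports."""
    return blocks * bsize // 1024


def summarize():
    """Collect the cross-checks in one dictionary (hypothetical helper)."""
    return {
        "size_matches_df": to_1k_blocks(f_blocks, f_bsize) == df_1k_blocks,
        "inodes_match_df": f_files == df_iused + df_ifree,
        # Distance from the first element to the slot tqh_last points at.
        "tail_offset": tqh_last - tqh_first,
        # Hex forms are easier to compare against the bit masks in sys/mount.h.
        "mnt_flag_hex": hex(mnt_flag),
        "mnt_kern_flag_hex": hex(mnt_kern_flag),
        "unmount_flags_hex": hex(unmount_flags),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The size and inode totals agree with df, so the mount structure in the dump does appear to describe /scr02, and the 0x14-byte gap between `tqh_first` and `tqh_last` fits `mnt_nvnodelistsize = 1`: one vnode left on the list when `vfs_mount_destroy` ran. The hex flag values would still need to be decoded against the sys/mount.h of the 6-STABLE tree in question.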