From: Eric Anderson
Date: Mon, 18 Sep 2006 20:39:37 -0500
To: freebsd-current@freebsd.org
Subject: [Fwd: Re: 6-STABLE filesystem related panics/locks (kgdb output)]
Message-ID: <450F4A59.6090902@centtech.com>

No response on hackers@, so I'm sending here too. Also, this machine just
recently entered this state again. I can get into the debugger in the
morning (6am central time), if anyone has any suggestions or info I should
get from it.

Eric

On 09/18/06 10:02, Eric Anderson wrote:
> Hi all,
>
> On one of our NFS servers, we've seen repeated filesystem issues with
> two of the filesystems (it has 4 exported via NFS).  It usually
> manifests itself by a hung 'df -lk' (wedged in 'ufs'), and mountd
> becomes wedged also, not allowing new mounts, and unable to be killed.
> From an NFS client, one can continue using the filesystem just fine,
> without an issue.  From the server itself, you can cd to the
> filesystem's root directory, but an ls will hang.  Running a background
> fsck on that filesystem while in this state also blocks on ufs.  My nfsd
> processes will also get stuck in the 'D' state (in 'ufs'), but they
> still appear to be serving data.  About a month ago, I brought the system
> down, did a full fsck on all the filesystems, and brought it back up.
> It survived for several weeks (2-3), but is now doing the same thing, so
> I'm uncertain if the issue was affected by the fsck at all (doubtful).
>
> This morning, prior to rebooting the system to get it out of this state,
> I began unmounting filesystems in case of a panic, and after unmounting
> (successfully) two of the filesystems (the ones I've never seen an issue
> on), I tried unmounting the third (/scr02), and a panic ensued.  /scr01
> is the other filesystem that is giving me issues.
>
> Some information about the system/setup:
>
> FreeBSD smd2.centtech.com 6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug 12
> 13:24:02 CDT 2006
>
> # df -ilk
> Filesystem     1K-blocks      Used     Avail Capacity  iused    ifree %iused Mounted on
> /dev/amrd0s1a   20308398   3098864  15584864      17% 259261  2378561    10% /
> devfs                  1         1         0     100%      0        0   100% /dev
> /dev/amrd0s1d   13065232   3960250   8059764      33%    870  1694872     0% /var
> /dev/ufs/rss   213268540  93886480 102320578      48% 399297 27180093     1% /rss
> /dev/ufs/scr02 213268540 116904962  79302096      60% 426573 27152817     2% /scr02
> /dev/ufs/scr04 167568544  93374026  60789036      61%  13008 21654830     0% /scr04
> /dev/ufs/scr01 232100360 161547746  51984586      76% 531834 29473412     2% /scr01
>
> (rss and scr04 never give me any issues)
>
> All four of the ufs/* partitions are on the same RAID array, and I don't
> believe there is any underlying disk issue.
>
> Here's some kgdb output from when the system was wedged on /scr01, but
> the unmount of /scr02 caused a panic:
>
> # kgdb -q -n 3
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>
> Unread portion of the kernel message buffer:
> Mount point /scr02 had 1 dangling refs
> panic: unmount: dangling vnode
> cpuid = 0
> KDB: enter: panic
> Dumping 1023 MB (2 chunks)
>   chunk 0: 1MB (159 pages) ... ok
>   chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879
>   863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
>   575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
>   287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
>
> #0  doadump () at pcpu.h:165
> 165     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:165
> #1  0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0,
>     dummy3=-1064859081, dummy4=0xe8de3ab8 "ä:Þè\234l\207ÀÐ:ÞèÔ:Þè\220\a")
>     at /usr/src/sys/ddb/db_command.c:492
> #2  0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0,
>     aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
>     at /usr/src/sys/ddb/db_command.c:350
> #3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
> #4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
> #5  0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8de3bfc)
>     at /usr/src/sys/kern/subr_kdb.c:473
> #6  0xc0896338 in trap (frame=
>       {tf_fs = -388104184, tf_es = -1066860504, tf_ds = -1064304600,
>        tf_edi = -1064235220, tf_esi = 1, tf_ebp = -388088772,
>        tf_isp = -388088792, tf_ebx = -388088728, tf_edx = 0,
>        tf_ecx = -1056755712, tf_eax = 18, tf_trapno = 3, tf_err = 0,
>        tf_eip = -1066829453, tf_cs = 32, tf_eflags = 646,
>        tf_esp = -388088740, tf_ss = -1066934521})
>     at /usr/src/sys/i386/i386/trap.c:593
> #7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #8  0xc0697973 in kdb_enter (msg=0x12 ) at cpufunc.h:60
> #9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode")
>     at /usr/src/sys/kern/kern_shutdown.c:549
> #10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600)
>     at /usr/src/sys/kern/vfs_mount.c:514
> #11 0xc06d2d26 in dounmount (mp=0xc5964000, flags=134217728,
>     td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:1162
> #12 0xc06d27de in unmount (td=0xc620c600, uap=0xe8de3d04)
>     at /usr/src/sys/kern/vfs_mount.c:1052
> #13 0xc0896c0b in syscall (frame=
>       {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957,
>        tf_esi = 134535289, tf_ebp = -1077942776, tf_isp = -388088476,
>        tf_ebx = -1077942864, tf_edx = 26, tf_ecx = 0, tf_eax = 22,
>        tf_trapno = 12, tf_err = 2, tf_eip = 671864503, tf_cs = 51,
>        tf_eflags = 518, tf_esp = -1077942948, tf_ss = 59})
>     at /usr/src/sys/i386/i386/trap.c:981
> #14 0xc0881eaf in Xint0x80_syscall ()
>     at /usr/src/sys/i386/i386/exception.s:200
> #15 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) frame 10
> #10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600)
>     at /usr/src/sys/kern/vfs_mount.c:514
> 514                     panic("unmount: dangling vnode");
> (kgdb) l
> 509                     printf("mount point secondary write ops completed\n");
> 510             }
> 511             MNT_IUNLOCK(mp);
> 512             mp->mnt_vfc->vfc_refcount--;
> 513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
> 514                     panic("unmount: dangling vnode");
> 515             lockdestroy(&mp->mnt_lock);
> 516             MNT_ILOCK(mp);
> 517             if (mp->mnt_kern_flag & MNTK_MWAIT)
> 518                     wakeup(mp);
>
> (kgdb) p *mp
> $2 = {mnt_list = {tqe_next = 0xc5964400, tqe_prev = 0xc59bbc00},
>   mnt_op = 0xc09b96e0, mnt_vfc = 0xc09b9720,
>   mnt_vnodecovered = 0xc5ae0cc0, mnt_syncer = 0x0,
>   mnt_nvnodelist = {tqh_first = 0xc6d59440, tqh_last = 0xc6d59454},
>   mnt_lock = {lk_interlock = 0xc09eac84, lk_flags = 1048576,
>     lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0,
>     lk_prio = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0,
>     lk_lockholder = 0xffffffff, lk_newlock = 0x0},
>   mnt_mtx = {mtx_object = {lo_class = 0xc0980124,
>       lo_name = 0xc0910dee "struct mount mtx",
>       lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608,
>       lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
>     mtx_lock = 4, mtx_recurse = 0},
>   mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt = 0xc5926a00,
>   mnt_optnew = 0x0, mnt_kern_flag = 553648128, mnt_maxsymlinklen = 120,
>   mnt_stat = {f_version = 537068824, f_type = 5, f_flags = 2102016,
>     f_bsize = 2048, f_iosize = 16384, f_blocks = 106634270,
>     f_bfree = 48180134, f_bavail = 39649393, f_files = 27579390,
>     f_ffree = 27152822, f_syncwrites = 0, f_asyncwrites = 0,
>     f_syncreads = 0, f_asyncreads = 0,
>     f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255,
>     f_owner = 0, f_fsid = {val = {1111508928, -571478071}},
>     f_charspare = '\0' ,
>     f_fstypename = "ufs", '\0' ,
>     f_mntfromname = "/dev/ufs/scr02", '\0' ,
>     f_mntonname = "/scr02", '\0' },
>   mnt_cred = 0xc59f2080, mnt_data = 0x0, mnt_time = 0,
>   mnt_iosize_max = 131072, mnt_export = 0xc5d25c00, mnt_mntlabel = 0x0,
>   mnt_fslabel = 0x0, mnt_nvnodelistsize = 1, mnt_hashseed = 3369618744,
>   mnt_markercnt = 0, mnt_holdcnt = 0, mnt_holdcntwaiters = 0,
>   mnt_secondary_writes = 0, mnt_secondary_accwrites = 2126786,
>   mnt_ref = 1}
> (kgdb) p mp->mnt_vfc->vfc_refcount
> $3 = 4
>
>
> Anything else I can provide to help find the issue?
>
>
> Eric
>

Another batch of kgdb output from this same system, with the same issue:

# kgdb -q -n 1
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
Mount point /rss had 1 dangling refs
panic: unmount: dangling vnode
cpuid = 0
KDB: enter: panic
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879
  863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
  575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
  287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0,
    dummy3=-1064859081, dummy4=0xe8e65ab8 "äZæè\234l\207ÀÐZæèÔZæè\220\a")
    at /usr/src/sys/ddb/db_command.c:492
#2  0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0,
    aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
    at /usr/src/sys/ddb/db_command.c:350
#3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8e65bfc)
    at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc0896338 in trap (frame=
      {tf_fs = -387579896, tf_es = -1066860504, tf_ds = -1064304600,
       tf_edi = -1064235220, tf_esi = 1, tf_ebp = -387556292,
       tf_isp = -387556312, tf_ebx = -387556248, tf_edx = 0,
       tf_ecx = -1056755712, tf_eax = 18, tf_trapno = 3, tf_err = 0,
       tf_eip = -1066829453, tf_cs = 32, tf_eflags = 646,
       tf_esp = -387556260, tf_ss = -1066934521})
    at /usr/src/sys/i386/i386/trap.c:593
#7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0697973 in kdb_enter (msg=0x12 ) at cpufunc.h:60
#9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode")
    at /usr/src/sys/kern/kern_shutdown.c:549
#10 0xc06d153e in vfs_mount_destroy (mp=0xc59bbc00, td=0xc5c16000)
    at /usr/src/sys/kern/vfs_mount.c:514
#11 0xc06d2d26 in dounmount (mp=0xc59bbc00, flags=134217728,
    td=0xc5c16000) at /usr/src/sys/kern/vfs_mount.c:1162
#12 0xc06d27de in unmount (td=0xc5c16000, uap=0xe8e65d04)
    at /usr/src/sys/kern/vfs_mount.c:1052
#13 0xc0896c0b in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957,
       tf_esi = 134534817, tf_ebp = -1077942776, tf_isp = -387555996,
       tf_ebx = -1077942864, tf_edx = 25, tf_ecx = 0, tf_eax = 22,
       tf_trapno = 12, tf_err = 2, tf_eip = 671864503, tf_cs = 51,
       tf_eflags = 518, tf_esp = -1077942948, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:981
#14 0xc0881eaf in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 10
#10 0xc06d153e in vfs_mount_destroy (mp=0xc59bbc00, td=0xc5c16000)
    at /usr/src/sys/kern/vfs_mount.c:514
514                     panic("unmount: dangling vnode");
(kgdb) l
509                     printf("mount point secondary write ops completed\n");
510             }
511             MNT_IUNLOCK(mp);
512             mp->mnt_vfc->vfc_refcount--;
513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
514                     panic("unmount: dangling vnode");
515             lockdestroy(&mp->mnt_lock);
516             MNT_ILOCK(mp);
517             if (mp->mnt_kern_flag & MNTK_MWAIT)
518                     wakeup(mp);
(kgdb) p mp->mnt_vfc->vfc_refcount
$1 = 4
(kgdb) p *mp
$2 = {mnt_list = {tqe_next = 0xc595d000, tqe_prev = 0xc59bc000},
  mnt_op = 0xc09b96e0, mnt_vfc = 0xc09b9720,
  mnt_vnodecovered = 0xc5a81cc0, mnt_syncer = 0x0,
  mnt_nvnodelist = {tqh_first = 0xc8af4000, tqh_last = 0xc8af4014},
  mnt_lock = {lk_interlock = 0xc09eac18, lk_flags = 1048576,
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0,
    lk_prio = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0,
    lk_lockholder = 0xffffffff, lk_newlock = 0x0},
  mnt_mtx = {mtx_object = {lo_class = 0xc0980124,
      lo_name = 0xc0910dee "struct mount mtx",
      lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0},
  mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt = 0xc5728a40,
  mnt_optnew = 0x0, mnt_kern_flag = 553648128, mnt_maxsymlinklen = 120,
  mnt_stat = {f_version = 537068824, f_type = 5, f_flags = 2102016,
    f_bsize = 2048, f_iosize = 16384, f_blocks = 106634270,
    f_bfree = 65410962, f_bavail = 56880221, f_files = 27579390,
    f_ffree = 27203064, f_syncwrites = 0, f_asyncwrites = 0,
    f_syncreads = 0, f_asyncreads = 0,
    f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255,
    f_owner = 0, f_fsid = {val = {1111508926, 499625180}},
    f_charspare = '\0' ,
    f_fstypename = "ufs", '\0' ,
    f_mntfromname = "/dev/ufs/rss", '\0' ,
    f_mntonname = "/rss", '\0' },
  mnt_cred = 0xc5a56c80, mnt_data = 0x0, mnt_time = 0,
  mnt_iosize_max = 131072, mnt_export = 0xc59e8000, mnt_mntlabel = 0x0,
  mnt_fslabel = 0x0, mnt_nvnodelistsize = 1, mnt_hashseed = 2115021039,
  mnt_markercnt = 0, mnt_holdcnt = 0, mnt_holdcntwaiters = 0,
  mnt_secondary_writes = 0, mnt_secondary_accwrites = 12553194,
  mnt_ref = 1}
(kgdb) p &mp->mnt_nvnodelist
$3 = (struct vnodelst *) 0xc59bbc18
(kgdb) p mp->mnt_nvnodelist
$4 = {tqh_first = 0xc8af4000, tqh_last = 0xc8af4014}


Eric

--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
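
A note for anyone reading the dumps above: kgdb prints the flag words
(flags, mnt_flag, mnt_kern_flag, f_flags) in decimal, which makes them hard
to compare with the bit definitions in the kernel headers. The following is
only a minimal Python sketch for converting them, not part of the debugging
session itself. The helper name set_bits and the fields dictionary are made
up for illustration, and the sketch deliberately does not name any flag; the
resulting masks would still have to be looked up in the headers that match
this kernel. It also cross-checks the statfs block count against the df
output for /scr02.

```python
def set_bits(value):
    """Return the individual bit masks that are set in a non-negative int."""
    return [1 << i for i in range(value.bit_length()) if (value >> i) & 1]


# Decimal values copied from the kgdb output for /scr02 (hypothetical labels).
fields = {
    "dounmount flags": 134217728,
    "mnt_flag": 2097920,
    "mnt_kern_flag": 553648128,
    "f_flags": 2102016,
}

for name, value in fields.items():
    masks = ", ".join(f"{m:#010x}" for m in set_bits(value))
    print(f"{name:16} = {value:#010x}  bits: {masks}")

# f_blocks is counted in units of f_bsize bytes, while df -k reports
# 1K blocks, so the two should agree after scaling.
f_blocks, f_bsize = 106634270, 2048
print("1K-blocks:", f_blocks * f_bsize // 1024)
```

The scaled block count comes out to the same 213268540 that df reports for
/scr02, which at least suggests the mount structure in the dump is the one
for that filesystem and is not obviously corrupted.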