Date: Fri, 3 Feb 2006 23:19:26 -0800 (PST) From: Steve Watt <steve@Watt.COM> To: FreeBSD-gnats-submit@FreeBSD.org Subject: i386/92795: lockmgr panic during sys_exit file cleanup Message-ID: <200602040719.k147JQqC022894@wattres.watt.com> Resent-Message-ID: <200602040720.k147K6F6001326@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 92795 >Category: i386 >Synopsis: lockmgr panic during sys_exit file cleanup >Confidential: no >Severity: critical >Priority: medium >Responsible: freebsd-i386 >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Feb 04 07:20:06 GMT 2006 >Closed-Date: >Last-Modified: >Originator: Steve Watt >Release: FreeBSD 5.4-STABLE i386 >Organization: Watt Consultants >Environment: System: FreeBSD wattres.Watt.COM 5.4-STABLE FreeBSD 5.4-STABLE #9: Wed Jan 11 23:14:23 PST 2006 root@wattres.Watt.COM:/usr/obj/usr/src/sys/WATTRES i386 Last updated to 5-STABLE around 1717Z 11 Jan 06 Machine is running inn and dump -L when the panic occurs. >Description: Something ugly is going on again/still between dump and (possibly) inn; there is/was a deadlock between ffs_mksnap and msync. See my mail to -hackers around 22 May 05. I suspect this is related somehow because it's my daily dump that's causing the problem. (Two days in a row, this time, just to be annoying.) Here's the kgdb session from this morning's failure: (root@wattres) 12# kgdb kernel.debug /var/crash/vmcore.21 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". Unread portion of the kernel message buffer: panic: lockmgr: thread 0xc1a14780, not exclusive lock holder 0xc1561600 unlocking KDB: enter: panic Uptime: 21h40m31s Dumping 223 MB 16 32 48 64 80 96 112 128 144 160 176 192 208 #0 doadump () at pcpu.h:160 160 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) info stack #0 doadump () at pcpu.h:160 #1 0xc06627af in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:412 #2 0xc0662a71 in panic (fmt=0xc08a5cd6 "lockmgr: thread %p, not %s %p unlocking") at /usr/src/sys/kern/kern_shutdown.c:568 #3 0xc06570af in lockmgr (lkp=0xc205d0ac, flags=6, interlkp=0x0, td=0xc1a14780) at /usr/src/sys/kern/kern_lock.c:420 #4 0xc06b2cee in vop_stdunlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:295 #5 0xc06b2b6e in vop_defaultop (ap=0x0) at /usr/src/sys/kern/vfs_default.c:157 #6 0xc07c6234 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2827 #7 0xc07be44b in ufs_inactive (ap=0x0) at vnode_if.h:1044 #8 0xc07c6234 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2827 #9 0xc06bbaa4 in vrele (vp=0xc205d000) at vnode_if.h:953 #10 0xc07c2407 in ufs_close (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:317 #11 0xc07c6234 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2827 #12 0xc06c6ca5 in vn_close (vp=0xc205d000, flags=0, file_cred=0x0, td=0x0) at vnode_if.h:262 #13 0xc06c7c76 in vn_closefile (fp=0xc2111154, td=0x0) at /usr/src/sys/kern/vfs_vnops.c:953 #14 0xc06440b8 in fdrop_locked (fp=0xc2111154, td=0x0) at file.h:288 #15 0xc064401c in fdrop (fp=0xc2111154, td=0x0) at /usr/src/sys/kern/kern_descrip.c:2105 #16 0xc0642a0b in closef (fp=0xc2111154, td=0xc1a14780) at /usr/src/sys/kern/kern_descrip.c:1922 #17 0xc0641aea in fdfree (td=0xc1a14780) at /usr/src/sys/kern/kern_descrip.c:1622 #18 0xc064a87e in exit1 (td=0xc1a14780, rv=0) at /usr/src/sys/kern/kern_exit.c:237 #19 0xc064a401 in sys_exit (td=0x0, uap=0x0) at /usr/src/sys/kern/kern_exit.c:94 #20 0xc08306c4 in syscall (frame= {tf_fs = 47, tf_es = 135921711, tf_ds = -1078001617, tf_edi = 33134, tf_esi = 134528110, tf_ebp = -1077943064, tf_isp = -806167196, tf_ebx = 672471660, tf_edx = 672471344, tf_ecx = 672471344, tf_eax = 1, tf_trapno = 12, tf_err = 2, tf_eip = 671936515, tf_cs = 31, tf_eflags = 646, tf_esp = -1077943076, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1014 #21 0xc081ed4f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:201 ---Type <return> to continue, or q <return> to quit---q Quit (kgdb) up #1 0xc06627af in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:412 412 doadump(); (kgdb) up #2 0xc0662a71 in panic (fmt=0xc08a5cd6 "lockmgr: thread %p, not %s %p unlocking") at /usr/src/sys/kern/kern_shutdown.c:568 568 boot(bootopt); (kgdb) up #3 0xc06570af in lockmgr (lkp=0xc205d0ac, flags=6, interlkp=0x0, td=0xc1a14780) at /usr/src/sys/kern/kern_lock.c:420 420 panic("lockmgr: thread %p, not %s %p unlocking", (kgdb) list 415 416 case LK_RELEASE: 417 if (lkp->lk_exclusivecount != 0) { 418 if (lkp->lk_lockholder != thr && 419 lkp->lk_lockholder != LK_KERNPROC) { 420 panic("lockmgr: thread %p, not %s %p unlocking", 421 thr, "exclusive lock holder", 422 lkp->lk_lockholder); 423 } 424 if (lkp->lk_exclusivecount == 1) { (kgdb) print *thr $1 = {td_proc = 0xc168f54c, td_ksegrp = 0xc1b21a10, td_plist = {tqe_next = 0x0, tqe_prev = 0xc168f55c}, td_kglist = {tqe_next = 0x0, tqe_prev = 0xc1b21a1c}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xc176c780}, td_lockq = {tqe_next = 0x0, tqe_prev = 0x0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xc090342c}, td_selq = { tqh_first = 0x0, tqh_last = 0xc1a147b0}, td_sleepqueue = 0xc176c780, td_turnstile = 0xc18f1b40, td_tid = 100135, td_flags = 16842754, td_inhibitors = 0, td_pflags = 0, td_dupfd = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 0 '\0', td_oncpu = 0 '\0', td_locks = -10048, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0, td_ucred = 0xc1ed2100, td_standin = 0x0, td_prticks = 0, td_upcall = 0x0, td_sticks = 1692, td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = { __bits = {0, 0, 0, 0}}, td_siglist = {__bits = {0, 0, 0, 0}}, td_umtx = { tqe_next = 0x0, tqe_prev = 0x0}, td_generation = 24058, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_base_pri = 77 'M', td_priority = 77 'M', td_pcb = 0xcff2dd90, td_state = TDS_RUNNING, td_retval = {0, 672471344}, td_slpcallout = {c_links = {sle = { sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xc5acfb18}}, c_time = 7706191, c_arg = 0xc1a14780, c_func = 0xc068169a <sleepq_timeout>, c_flags = 8}, td_frame = 0xcff2dd38, td_kstack_obj = 0xc1a0bc60, td_kstack = 3488792576, td_kstack_pages = 2, td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, td_critnest = 0, td_md = {md_savecrit = 582}, td_sched = 0xc1a148d0} (kgdb) print *thr->td_proc $2 = {p_list = {le_next = 0xc1e69000, le_prev = 0xc1e6ac5c}, p_ksegrps = { tqh_first = 0xc1b21a10, tqh_last = 0xc1b21a14}, p_threads = {tqh_first = 0xc1a14780, tqh_last = 0xc1a14788}, p_suspended = {tqh_first = 0x0, tqh_last = 0xc168f564}, p_ucred = 0xc1ed2100, p_fd = 0xc23e2500, p_fdtol = 0x0, p_stats = 0xc1f40200, p_limit = 0xc1c81300, p_unused1 = 0x0, p_sigacts = 0xc23d0000, p_flag = 24576, p_sflag = 1, p_state = PRS_NORMAL, p_pid = 33134, p_hash = {le_next = 0x0, le_prev = 0xc14691b8}, p_pglist = {le_next = 0x0, le_prev = 0xc25513dc}, p_pptr = 0xc2551388, p_sibling = {le_next = 0x0, le_prev = 0xc25513f0}, p_children = { lh_first = 0x0}, p_mtx = {mtx_object = {lo_class = 0xc09012dc, lo_name = 0xc08a6b18 "process lock", lo_type = 0xc08a6b18 "process lock", lo_flags = 4390912, lo_list = {tqe_next = 0xc168f77c, tqe_prev = 0xc168f404}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, p_oppid = 0, p_vmspace = 0xc254ea8c, p_swtime = 704, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_runtime = {sec = 2, frac = 1951776024854486118}, p_uu = 0, p_su = 0, p_iu = 0, p_uticks = 67, p_sticks = 1742, p_iticks = 0, p_profthreads = 0, p_maxthrwaits = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xc1aa5630, p_siglist = {__bits = {0, 0, 0, 0}}, p_lock = 0 '\0', p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, p_magic = 3203398350, p_comm = "dump\000er", '\0' <repeats 12 times>, p_pgrp = 0xc1834d00, p_sysent = 0xc09422e0, p_args = 0xc1ad7c00, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_xstat = 0, p_klist = {kl_lock = 0xc168f5b8, kl_list = { slh_first = 0x0}}, p_numthreads = 1, p_numksegrps = 1, p_md = { md_ldt = 0xc246ba20}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = { tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_flags = 8}, p_unused2 = 0x0, p_acflag = 0, p_ru = 0xc1d47980, p_peers = 0x0, p_leader = 0xc168f54c, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xc168f710} ... so it seems to be dump that died on exit. (kgdb) up #4 0xc06b2cee in vop_stdunlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:295 295 return (lockmgr(vp->v_vnlock, ap->a_flags | LK_RELEASE, VI_MTX(vp), (kgdb) list 290 struct thread *a_td; 291 } */ *ap; 292 { 293 struct vnode *vp = ap->a_vp; 294 295 return (lockmgr(vp->v_vnlock, ap->a_flags | LK_RELEASE, VI_MTX(vp), 296 ap->a_td)); 297 } 298 299 /* See above. */ (kgdb) list - 280 ap->a_td, "vop_stdlock", vp->filename, vp->line)); 281 #endif 282 } 283 284 /* See above. */ 285 int 286 vop_stdunlock(ap) 287 struct vop_unlock_args /* { 288 struct vnode *a_vp; 289 int a_flags; (kgdb) up #5 0xc06b2b6e in vop_defaultop (ap=0x0) at /usr/src/sys/kern/vfs_default.c:157 157 return (VOCALL(default_vnodeop_p, ap->a_desc->vdesc_offset, ap)); (kgdb) list 152 */ 153 int 154 vop_defaultop(struct vop_generic_args *ap) 155 { 156 157 return (VOCALL(default_vnodeop_p, ap->a_desc->vdesc_offset, ap)); 158 } 159 160 /* 161 * Helper function to panic on some bad VOPs in some filesystems. (kgdb) up #6 0xc07c6234 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2827 2827 return (VOCALL(ufs_vnodeop_p, ap->a_desc->vdesc_offset, ap)); (kgdb) list 2822 ufs_vnoperate(ap) 2823 struct vop_generic_args /* { 2824 struct vnodeop_desc *a_desc; 2825 } */ *ap; 2826 { 2827 return (VOCALL(ufs_vnodeop_p, ap->a_desc->vdesc_offset, ap)); 2828 } 2829 2830 int 2831 ufs_vnoperatefifo(ap) (kgdb) up #7 0xc07be44b in ufs_inactive (ap=0x0) at vnode_if.h:1044 1044 rc = VCALL(vp, VOFFSET(vop_unlock), &a); (kgdb) list 1039 a.a_flags = flags; 1040 a.a_td = td; 1041 #ifdef DEBUG_VFS_LOCKS 1042 vop_unlock_pre(&a); 1043 #endif 1044 rc = VCALL(vp, VOFFSET(vop_unlock), &a); 1045 CTR3(KTR_VOP, "VOP_UNLOCK(vp 0x%lX, flags %ld, td 0x%lX)", vp, flags, td); 1046 if (rc == 0) { 1047 } else { 1048 } (kgdb) list - 1029 extern struct vnodeop_desc vop_unlock_desc; 1030 static __inline int VOP_UNLOCK( 1031 struct vnode *vp, 1032 int flags, 1033 struct thread *td) 1034 { 1035 struct vop_unlock_args a; 1036 int rc; 1037 a.a_desc = VDESC(vop_unlock); 1038 a.a_vp = vp; (kgdb) up #8 0xc07c6234 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2827 2827 return (VOCALL(ufs_vnodeop_p, ap->a_desc->vdesc_offset, ap)); (kgdb) list 2822 ufs_vnoperate(ap) 2823 struct vop_generic_args /* { 2824 struct vnodeop_desc *a_desc; 2825 } */ *ap; 2826 { 2827 return (VOCALL(ufs_vnodeop_p, ap->a_desc->vdesc_offset, ap)); 2828 } 2829 2830 int 2831 ufs_vnoperatefifo(ap) (kgdb) up #9 0xc06bbaa4 in vrele (vp=0xc205d000) at vnode_if.h:953 953 rc = VCALL(vp, VOFFSET(vop_inactive), &a); (kgdb) list 948 a.a_desc = VDESC(vop_inactive); 949 a.a_vp = vp; 950 a.a_td = td; 951 ASSERT_VI_UNLOCKED(vp, "VOP_INACTIVE"); 952 ASSERT_VOP_LOCKED(vp, "VOP_INACTIVE"); 953 rc = VCALL(vp, VOFFSET(vop_inactive), &a); 954 CTR2(KTR_VOP, "VOP_INACTIVE(vp 0x%lX, td 0x%lX)", vp, td); 955 if (rc == 0) { 956 ASSERT_VI_UNLOCKED(vp, "VOP_INACTIVE"); 957 ASSERT_VOP_UNLOCKED(vp, "VOP_INACTIVE"); ... OK, that's weird, there's an ASSERT_VOP_LOCKED() here. Does that mean that the assert_VOP_LOCKED doesn't check for exclusive? (kgdb) up #10 0xc07c2407 in ufs_close (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:317 317 vrele(vp); (kgdb) list 312 * XXX - EAGAIN is returned to prevent vn_close from 313 * repeating the vrele operation. 314 */ 315 if (vp->v_type == VREG && VTOI(vp)->i_effnlink == 0) { 316 (void) vn_start_write(vp, &mp, V_WAIT); 317 vrele(vp); 318 vn_finished_write(mp); 319 return (EAGAIN); 320 } 321 } (kgdb) up #11 0xc07c6234 in ufs_vnoperate (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2827 2827 return (VOCALL(ufs_vnodeop_p, ap->a_desc->vdesc_offset, ap)); (kgdb) up #12 0xc06c6ca5 in vn_close (vp=0xc205d000, flags=0, file_cred=0x0, td=0x0) at vnode_if.h:262 262 rc = VCALL(vp, VOFFSET(vop_close), &a); (kgdb) up #13 0xc06c7c76 in vn_closefile (fp=0xc2111154, td=0x0) at /usr/src/sys/kern/vfs_vnops.c:953 953 error = vn_close(vp, fp->f_flag, fp->f_cred, td); (kgdb) up #14 0xc06440b8 in fdrop_locked (fp=0xc2111154, td=0x0) at file.h:288 288 return ((*fp->f_ops->fo_close)(fp, td)); (kgdb) list 283 fo_close(fp, td) 284 struct file *fp; 285 struct thread *td; 286 { 287 288 return ((*fp->f_ops->fo_close)(fp, td)); 289 } 290 291 static __inline int 292 fo_kqfilter(fp, kn) (kgdb) up #15 0xc064401c in fdrop (fp=0xc2111154, td=0x0) at /usr/src/sys/kern/kern_descrip.c:2105 2105 return (fdrop_locked(fp, td)); (kgdb) up #16 0xc0642a0b in closef (fp=0xc2111154, td=0xc1a14780) at /usr/src/sys/kern/kern_descrip.c:1922 1922 return (fdrop(fp, td)); (kgdb) up #17 0xc0641aea in fdfree (td=0xc1a14780) at /usr/src/sys/kern/kern_descrip.c:1622 1622 (void) closef(*fpp, td); (kgdb) up #18 0xc064a87e in exit1 (td=0xc1a14780, rv=0) at /usr/src/sys/kern/kern_exit.c:237 237 fdfree(td); (kgdb) up #19 0xc064a401 in sys_exit (td=0x0, uap=0x0) at /usr/src/sys/kern/kern_exit.c:94 94 exit1(td, W_EXITCODE(uap->rval, 0)); >How-To-Repeat: There doesn't appear to be an iron-clad method; I believe it's a race condition of some sort. On my system, my daily dumps cause some kind of panic about once every 2-3 weeks. >Fix: Maybe I should pull up to 6-STABLE, but that's kinda painful as it's a semi-production box. >Release-Note: >Audit-Trail: >Unformatted:
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200602040719.k147JQqC022894>