Date: Tue, 24 Oct 2000 13:00:47 +0100
From: David Malone <dwmalone@maths.tcd.ie>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: David Malone <dwmalone@FreeBSD.org>, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/nfs nfs.h nfs_subs.c nfsm_subs.h
Message-ID: <20001024130047.A37034@salmon.maths.tcd.ie>
In-Reply-To: <20001024045227.B28123@fw.wintelcom.net>; from bright@wintelcom.net on Tue, Oct 24, 2000 at 04:52:28AM -0700
References: <200010241013.DAA74467@freefall.freebsd.org> <20001024045227.B28123@fw.wintelcom.net>
On Tue, Oct 24, 2000 at 04:52:28AM -0700, Alfred Perlstein wrote:

> > There remain other was processes can get stuck in vmopar.
>
> huh? Do you mean even with this patch processes can still wedge
> in 'vmopar'?

Sorry - that should have been "remain other ways", so yes, it is
possible for processes to get stuck. It can't deadlock with itself,
though; it has to deadlock with another process. I've included the
message Ian sent to -hackers, which I referred Jeroen to.

	David.

In message <20001020145043.B73760@lflat.vas.mobilix.dk>, Vadim Belman writes:
>     wmesg=0xc0233171 "vmopar", timo=0) at ../../kern/kern_synch.c:467
...
>#8  0xc01dd606 in vm_fault (map=0xdc3e7e80, vaddr=712876032,
>    fault_type=1 '\001', fault_flags=0) at ../../vm/vm_pager.h:130

If anyone is interested, here are a few further details from my
mailbox. The patch David included appears to have solved this
particular problem for us, but there is another similar problem
lurking within the NFS/VM system.

Ian

--------------------------------------------
The problem seems to originate with NFS's postop_attr information
that is returned with a read or write RPC. Within a vm_fault
context, the code cannot deal with vnode_pager_setsize() shrinking
a vnode.

The workaround in the patch below stops the nfsm_postop_attr() macro
from ever shrinking a vnode. If the new size in the postop_attr
information is smaller, then it just sets the nfsnode n_attrstamp to
0 to stop the wrong size from being used in the future. This change
only affects postop_attr attributes; the nfsm_loadattr() macro works
as normal.

The change is implemented by adding a new argument to
nfs_loadattrcache() called 'dontshrink'. When this is non-zero,
nfs_loadattrcache() will never reduce the vnode/nfsnode size;
instead it zeros n_attrstamp.
-----------------------------------------------

Hmm. We used this patch for a while - it stopped those particular
vmopar hangs, but another kind of deadlock has emerged (which
happens with or without the patch).
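To make the 'dontshrink' rule from the patch description concrete, here is a minimal C sketch of the size-update decision only. It is not the real nfs_loadattrcache(): struct sketch_nfsnode and sketch_load_size() are hypothetical stand-ins, the 'now' parameter is an assumption for illustration, and the call to vnode_pager_setsize() is reduced to a plain assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified stand-in for the two nfsnode fields the
 * message mentions; the real struct nfsnode has many more members. */
struct sketch_nfsnode {
	uint64_t n_size;       /* cached file size */
	time_t   n_attrstamp;  /* when attributes were cached; 0 = stale */
};

/*
 * Hypothetical sketch of the size handling described for
 * nfs_loadattrcache().  dontshrink != 0 models the postop_attr path
 * (nfsm_postop_attr()); dontshrink == 0 models nfsm_loadattr().
 */
static void
sketch_load_size(struct sketch_nfsnode *np, uint64_t new_size, time_t now,
    int dontshrink)
{
	assert(np != NULL);

	if (dontshrink && new_size < np->n_size) {
		/*
		 * Never shrink on this path (we may be inside vm_fault).
		 * Mark the cached attributes stale instead, so the old
		 * size is not trusted and fresh attributes are fetched
		 * later.
		 */
		np->n_attrstamp = 0;
		return;
	}

	/* The real code would also call vnode_pager_setsize() here. */
	np->n_size = new_size;
	np->n_attrstamp = now;
}
```

With dontshrink set, a smaller size only invalidates the cache; a larger size, or any size on the dontshrink == 0 path, updates the node as before.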
It seems that vinvalbuf() locks the vnode's v_interlock before
calling vm_object_page_remove(). vm_object_page_remove() will then
lock a page, i.e.

	vinvalbuf() [Lock v_interlock] -> vm_object_page_remove() [Lock page]

If another process concurrently vm_faults on the same vnode, then it
locks the page and finishes with a vput(vp). vput() locks the
interlock, so it results in:

	vm_fault() [Lock page] -> vput() [Lock v_interlock]

This is a simple lock-ordering deadlock. Since vm_fault can keep the
page locked for a considerable amount of time with NFS, this deadlock
can happen quite easily.

I'm not sure what to suggest as a solution, but keeping the
v_interlock locked across a tsleep seems wrong...

Any ideas? Traces below.

#12 0xc02140f0 in atkbd_isa_intr (unit=0) at ../../i386/isa/atkbd_isa.c:84
#13 0xc020eceb in wait ()
#14 0xc01e22d3 in _unlock_things (fs=0xca6f0ef0, dealloc=0)
    at ../../vm/vm_fault.c:148
#15 0xc01e2b73 in vm_fault (map=0xca6d2ac0, vaddr=134766592,
    fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:745
#16 0xc0210252 in trap_pfault (frame=0xca6f0fbc, usermode=1, eva=134769544)
    at ../../i386/i386/trap.c:816
#17 0xc020fda2 in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = -1077946880,
      tf_esi = 1, tf_ebp = -1077947052, tf_isp = -898691100,
      tf_ebx = -1077946872, tf_edx = 4, tf_ecx = -1077947772, tf_eax = 2,
      tf_trapno = 12, tf_err = 4, tf_eip = 134769544, tf_cs = 31,
      tf_eflags = 66050, tf_esp = -1077947172, tf_ss = 39})
    at ../../i386/i386/trap.c:358
#18 0x8086b88 in ?? ()
(kgdb) proc 1042
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:825
#1  0xc0150b4d in tsleep (ident=0xc0598534, priority=4,
    wmesg=0xc024d22a "vmopar", timo=0) at ../../kern/kern_synch.c:443
#2  0xc01eaec6 in vm_page_sleep (m=0xc0598534, msg=0xc024d22a "vmopar",
    busy=0xc0598563 "") at ../../vm/vm_page.c:1052
#3  0xc01e9aff in vm_object_page_remove (object=0xca6bac1c, start=0, end=0,
    clean_only=1) at ../../vm/vm_object.c:1335
#4  0xc0172a6a in vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
    p=0xca6e5a40, slpflag=256, slptimeo=0) at ../../kern/vfs_subr.c:671
#5  0xc019541c in nfs_vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
    p=0xca6e5a40, intrflg=1) at ../../nfs/nfs_bio.c:978
#6  0xc01b6859 in nfs_open (ap=0xca6f3e2c) at ../../nfs/nfs_vnops.c:490
#7  0xc01796ae in vn_open (ndp=0xca6f3f00, fmode=1, cmode=1512)
    at vnode_if.h:163
#8  0xc01760d9 in open (p=0xca6e5a40, uap=0xca6f3f94)
    at ../../kern/vfs_syscalls.c:935
#9  0xc02108bf in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134725618,
      tf_esi = -1077946896, tf_ebp = -1077946944, tf_isp = -898678812,
      tf_ebx = -1077946956, tf_edx = -1077946588, tf_ecx = 134893176,
      tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 672042756,
      tf_cs = 31, tf_eflags = 514, tf_esp = -1077949296, tf_ss = 39})
    at ../../i386/i386/trap.c:1100
#10 0xc01ff11c in Xint0x80_syscall ()
#11 0x8049d39 in ?? ()
-------------------------------------

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message
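The vinvalbuf()/vm_fault() hang Ian describes is a two-lock order inversion. The following C sketch is a hypothetical user-space model, not FreeBSD code and not how the kernel tracks locks: each code path is written as the list of locks it holds at the same time, in acquisition order, and order_inversion() reports whether two paths take some pair of locks in opposite orders. The names struct lock_path, lock_pos() and order_inversion() are invented for illustration, and "page" stands for the busy page that the message describes as locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOCKS 4

/* Hypothetical model of one code path: the locks it acquires while
 * still holding the earlier ones, in order; unused slots are NULL. */
struct lock_path {
	const char *name;
	const char *locks[MAX_LOCKS];
};

/* Return the position of lock 'l' in path 'p', or -1 if absent. */
static int
lock_pos(const struct lock_path *p, const char *l)
{
	for (int i = 0; i < MAX_LOCKS && p->locks[i] != NULL; i++)
		if (strcmp(p->locks[i], l) == 0)
			return i;
	return -1;
}

/*
 * Two paths can deadlock against each other if some pair of locks is
 * taken as a-then-b in one path and b-then-a in the other.
 */
static bool
order_inversion(const struct lock_path *p, const struct lock_path *q)
{
	assert(p != NULL && q != NULL);
	for (int i = 0; i < MAX_LOCKS && p->locks[i] != NULL; i++) {
		for (int j = i + 1; j < MAX_LOCKS && p->locks[j] != NULL; j++) {
			int qa = lock_pos(q, p->locks[i]);
			int qb = lock_pos(q, p->locks[j]);
			if (qa >= 0 && qb >= 0 && qb < qa)
				return true;
		}
	}
	return false;
}
```

Modelling vinvalbuf() as { "v_interlock", "page" } and vm_fault() as { "page", "v_interlock" } is flagged as an inversion. A path taking the two locks in the same order as vinvalbuf() is not flagged; this only illustrates the general rule and is not a proposed fix, since the message itself leaves the solution open.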