Date:      Tue, 24 Oct 2000 13:00:47 +0100
From:      David Malone <dwmalone@maths.tcd.ie>
To:        Alfred Perlstein <bright@wintelcom.net>
Cc:        David Malone <dwmalone@FreeBSD.org>, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/nfs nfs.h nfs_subs.c nfsm_subs.h
Message-ID:  <20001024130047.A37034@salmon.maths.tcd.ie>
In-Reply-To: <20001024045227.B28123@fw.wintelcom.net>; from bright@wintelcom.net on Tue, Oct 24, 2000 at 04:52:28AM -0700
References:  <200010241013.DAA74467@freefall.freebsd.org> <20001024045227.B28123@fw.wintelcom.net>

On Tue, Oct 24, 2000 at 04:52:28AM -0700, Alfred Perlstein wrote:

> >   There remain other was processes can get stuck in vmopar.
> 
>   huh?  Do you mean even with this patch processes can still wedge
>   in 'vmopar'?

Sorry - that should have been "remain other ways". So yes, it is
possible for processes to get stuck, though a process can't deadlock
with itself; it has to deadlock with another process. I've included the
message Ian sent to -hackers, which I referred Jeroen to.

	David.


In message <20001020145043.B73760@lflat.vas.mobilix.dk>, Vadim Belman writes:

>    wmesg=0xc0233171 "vmopar", timo=0) at ../../kern/kern_synch.c:467
...
>#8  0xc01dd606 in vm_fault (map=0xdc3e7e80, vaddr=712876032, 
>    fault_type=1 '\001', fault_flags=0) at ../../vm/vm_pager.h:130


If anyone is interested, here are a few further details from my
mailbox. The patch David included appears to have solved this
particular problem for us, but there is another similar problem
lurking within the NFS/VM system.

Ian

--------------------------------------------
The problem seems to originate with NFS's postop_attr information
that is returned with a read or write RPC. Within a vm_fault context,
the code cannot deal with vnode_pager_setsize() shrinking a vnode.

The workaround in the patch below stops the nfsm_postop_attr() macro
from ever shrinking a vnode. If the new size in the postop_attr
information is smaller, then it just sets the nfsnode n_attrstamp to 0
to stop the wrong size getting used in the future. This change only
affects postop_attr attributes; the nfsm_loadattr() macro works as
normal.

The change is implemented by adding a new argument to nfs_loadattrcache()
called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never
reduce the vnode/nfsnode size; instead it zeros n_attrstamp.
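To make the 'dontshrink' behaviour concrete, here is a minimal user-space sketch in C. It is not the actual patch: the struct, the function name sketch_loadattr_size() and its return value are hypothetical stand-ins, and only n_attrstamp and the 'dontshrink' flag are taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the nfsnode fields discussed above. */
struct sketch_node {
	uint64_t n_size;      /* cached file size */
	long     n_attrstamp; /* when attributes were cached; 0 means "stale" */
};

/*
 * Model of the size handling described above (an assumption, not the
 * real nfs_loadattrcache()).  Returns 1 if the new size was applied,
 * 0 if a shrink was refused and the attribute cache was invalidated.
 */
int
sketch_loadattr_size(struct sketch_node *np, uint64_t newsize, long now,
    int dontshrink)
{
	if (dontshrink && newsize < np->n_size) {
		/* Never shrink here; force a fresh attribute fetch later. */
		np->n_attrstamp = 0;
		return (0);
	}
	np->n_size = newsize;
	np->n_attrstamp = now;
	return (1);
}
```

With dontshrink non-zero, a smaller size leaves n_size alone and zeros n_attrstamp; growing the file, or any call with dontshrink zero, updates the size as usual.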

-----------------------------------------------

Hmm. We used this patch for a while - it stopped those particular vmopar
hangs, but another kind of deadlock has emerged (which happens with or
without the patch).

It seems that vinvalbuf() locks the vnode's v_interlock before calling
vm_object_page_remove(). vm_object_page_remove() then locks a page, i.e.

 vinvalbuf() [Lock v_interlock] ->
     vm_object_page_remove() [Lock page]

If another process concurrently vm_fault's on the same vnode then it
locks the page, and finishes with a vput(vp). vput() locks the
interlock, so it results in:
 
 vm_fault() [Lock page] ->
     vput() [Lock v_interlock]

This is a simple lock-ordering deadlock. Since vm_fault can keep the
page locked for a considerable amount of time with NFS, this deadlock
can happen quite easily. I'm not sure what to suggest as a solution,
but keeping the v_interlock locked across a tsleep seems wrong... Any
ideas? Traces below.
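The ordering conflict described above can be modelled with a small user-space sketch in C. It does not touch real kernel locks: the names (LK_INTERLOCK, LK_PAGE, sketch_acquire() and the structs) are hypothetical, and the busy page is treated as if it were an ordinary lock, which is a simplification. The sketch records which lock was held when another was taken and reports when a later acquisition reverses an order already seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for the two resources in the report. */
enum { LK_INTERLOCK = 0, LK_PAGE = 1, LK_COUNT = 2 };

/* before[a][b] is true once some context took b while holding a. */
struct order_graph {
	bool before[LK_COUNT][LK_COUNT];
};

/* Locks currently held by one execution context (one process). */
struct lock_ctx {
	bool held[LK_COUNT];
};

/*
 * Record that 'ctx' acquires 'lk'.  Returns true if this acquisition
 * reverses an order recorded earlier, i.e. a potential deadlock.
 */
bool
sketch_acquire(struct order_graph *g, struct lock_ctx *ctx, int lk)
{
	bool inversion = false;

	for (int h = 0; h < LK_COUNT; h++) {
		if (h == lk || !ctx->held[h])
			continue;
		if (g->before[lk][h])
			inversion = true;  /* someone took h while holding lk */
		g->before[h][lk] = true;
	}
	ctx->held[lk] = true;
	return (inversion);
}

/* Record that 'ctx' drops 'lk'. */
void
sketch_release(struct lock_ctx *ctx, int lk)
{
	ctx->held[lk] = false;
}
```

Replaying the two paths above (interlock then page for vinvalbuf(), page then interlock for vm_fault()) makes the second path's final acquisition report an inversion. Taking both resources in one agreed order on every path would not.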


#12 0xc02140f0 in atkbd_isa_intr (unit=0) at ../../i386/isa/atkbd_isa.c:84
#13 0xc020eceb in wait ()
#14 0xc01e22d3 in _unlock_things (fs=0xca6f0ef0, dealloc=0)
    at ../../vm/vm_fault.c:148
#15 0xc01e2b73 in vm_fault (map=0xca6d2ac0, vaddr=134766592,
    fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:745
#16 0xc0210252 in trap_pfault (frame=0xca6f0fbc, usermode=1, eva=134769544)
    at ../../i386/i386/trap.c:816
#17 0xc020fda2 in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = -1077946880,
      tf_esi = 1, tf_ebp = -1077947052, tf_isp = -898691100,
      tf_ebx = -1077946872, tf_edx = 4, tf_ecx = -1077947772, tf_eax = 2,
      tf_trapno = 12, tf_err = 4, tf_eip = 134769544, tf_cs = 31,
      tf_eflags = 66050, tf_esp = -1077947172, tf_ss = 39})
    at ../../i386/i386/trap.c:358
#18 0x8086b88 in ?? ()

(kgdb) proc 1042
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:825
#1  0xc0150b4d in tsleep (ident=0xc0598534, priority=4,
    wmesg=0xc024d22a "vmopar", timo=0) at ../../kern/kern_synch.c:443
#2  0xc01eaec6 in vm_page_sleep (m=0xc0598534, msg=0xc024d22a "vmopar",
    busy=0xc0598563 "") at ../../vm/vm_page.c:1052
#3  0xc01e9aff in vm_object_page_remove (object=0xca6bac1c, start=0, end=0,
    clean_only=1) at ../../vm/vm_object.c:1335
#4  0xc0172a6a in vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
    p=0xca6e5a40, slpflag=256, slptimeo=0) at ../../kern/vfs_subr.c:671
#5  0xc019541c in nfs_vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
    p=0xca6e5a40, intrflg=1) at ../../nfs/nfs_bio.c:978
#6  0xc01b6859 in nfs_open (ap=0xca6f3e2c) at ../../nfs/nfs_vnops.c:490
#7  0xc01796ae in vn_open (ndp=0xca6f3f00, fmode=1, cmode=1512)
    at vnode_if.h:163
#8  0xc01760d9 in open (p=0xca6e5a40, uap=0xca6f3f94)
    at ../../kern/vfs_syscalls.c:935
#9  0xc02108bf in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134725618,
      tf_esi = -1077946896, tf_ebp = -1077946944, tf_isp = -898678812,
      tf_ebx = -1077946956, tf_edx = -1077946588, tf_ecx = 134893176,
      tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 672042756, tf_cs = 31,
      tf_eflags = 514, tf_esp = -1077949296, tf_ss = 39})
    at ../../i386/i386/trap.c:1100
#10 0xc01ff11c in Xint0x80_syscall ()
#11 0x8049d39 in ?? ()

-------------------------------------


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message



