Date: Thu, 12 Feb 1998 16:59:28 -0800 (PST)
From: gallatin@cs.duke.edu
To: freebsd-gnats-submit@FreeBSD.ORG
Subject: kern/5731: executables wedge on "vmopar" when built in fs mounted via NFSv3 from DU4.0B
Message-ID: <199802130059.QAA13298@hub.freebsd.org>
>Number: 5731
>Category: kern
>Synopsis: executables wedge on "vmopar" when built in fs mounted via NFSv3 from DU4.0B
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Feb 12 17:00:01 PST 1998
>Last-Modified:
>Originator: Andrew Gallatin
>Organization:
Duke University, Department of Computer Science
>Release: 2.2.5-STABLE
>Environment:
FreeBSD rain.cs.duke.edu 2.2.5-STABLE FreeBSD 2.2.5-STABLE #11: Thu Feb 12 18:51:05 EST 1998 gallatin@treefrog.cs.duke.edu:/usr/project/ari_scratch2/gallatin/freebsd-compiles/compile/TPZ i386
>Description:
If you run certain executables immediately after writing them to a
partition mounted via NFSv3 from a Digital UNIX (4.0B) NFS server, they
sleep indefinitely on "vmopar". This typically occurs when a large
program is executed immediately after it is linked. Here is a stack
trace of a wedged job:
# gdb -k kernel /dev/mem
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd),
Copyright 1996 Free Software Foundation, Inc...
IdlePTD 295000
current pcb at 7257000
#0 mi_switch () at ../../kern/kern_synch.c:628
628 microtime(&runtime);
(kgdb) proc pidhashtbl[220]->lh_first
current pcb at f5a41000
(kgdb) where
#0 mi_switch () at ../../kern/kern_synch.c:628
#1 0xf011f3b5 in tsleep (ident=0xf041cb50, priority=4,
wmesg=0xf01b98d4 "vmopar", timo=0) at ../../kern/kern_synch.c:391
#2 0xf01b9a9c in vm_object_page_remove (object=0xf1903680, start=0, end=1540,
clean_only=1) at ../../vm/vm_object.c:1261
#3 0xf013a090 in vinvalbuf (vp=0xf1903700, flags=1, cred=0xf18f6d00,
p=0xf18b2200, slpflag=0, slptimeo=0) at ../../kern/vfs_subr.c:540
#4 0xf015e278 in nfs_vinvalbuf (vp=0xf1903700, flags=1, cred=0xf18f6d00,
p=0xf18b2200, intrflg=1) at ../../nfs/nfs_bio.c:799
#5 0xf015cd60 in nfs_bioread (vp=0xf1903700, uio=0xefbffe48, ioflag=8,
cred=0xf18f6d00, getpages=1) at ../../nfs/nfs_bio.c:213
#6 0xf015ca98 in nfs_getpages (ap=0xefbffe84) at ../../nfs/nfs_bio.c:130
#7 0xf01beaa8 in vnode_pager_getpages (object=0xf1903680, m=0xefbfff3c,
count=2, reqpage=0) at vnode_if.h:1063
#8 0xf01bd657 in vm_pager_get_pages (object=0xf1903680, m=0xefbfff3c,
count=2, reqpage=0) at ../../vm/vm_pager.c:188
#9 0xf01b32f6 in vm_fault (map=0xf18fe900, vaddr=6303744,
fault_type=3 '\003', change_wiring=0) at ../../vm/vm_fault.c:426
#10 0xf01ccdcc in trap_pfault (frame=0xefbfffbc, usermode=1)
at ../../i386/i386/trap.c:633
#11 0xf01cc95b in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = 0,
tf_esi = -272640436, tf_ebp = -272640440, tf_isp = -272629788,
tf_ebx = -272640432, tf_edx = -272640424, tf_ecx = 0, tf_eax = 0,
tf_trapno = 12, tf_err = 6, tf_eip = 4168, tf_cs = 31,
tf_eflags = 66054, tf_esp = -272640452, tf_ss = 39})
at ../../i386/i386/trap.c:239
#12 0x1048 in ?? ()
The page in question has had its state set (p->busy++ and p->flags &=
~PG_BUSY) by nfs_getpages() at frame #6. This state causes
vm_object_page_remove() at frame #2 to sleep. The result is a deadlock:
nfs_getpages() cannot clear the state because it is waiting, further up
the same call chain, for that sleep to finish.
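
The wait condition described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not
kernel source: struct model_page, the PG_BUSY bit value, and the helper
names are hypothetical, and the check in remove_would_sleep() is assumed
to mirror what vm_object_page_remove() tests before sleeping on "vmopar".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_BUSY 0x1 /* hypothetical bit value, for illustration only */

/* Simplified stand-in for the kernel's page structure. */
struct model_page {
    int flags; /* PG_BUSY and friends */
    int busy;  /* count of in-progress I/O operations */
};

/* The state change the report attributes to nfs_getpages(). */
static void mark_io_busy(struct model_page *p)
{
    assert(p != NULL);
    p->busy++;
    p->flags &= ~PG_BUSY;
}

/* Assumed wait condition: sleep while the page is busy in either sense. */
static bool remove_would_sleep(const struct model_page *p)
{
    assert(p != NULL);
    return (p->flags & PG_BUSY) != 0 || p->busy != 0;
}

/*
 * If the context that would sleep is also the only one able to drop
 * the busy count, nobody is left to issue the wakeup.
 */
static bool self_deadlock(const struct model_page *p, bool waiter_is_holder)
{
    return remove_would_sleep(p) && waiter_is_holder;
}
```

In the trace above, the waiter (frame #2) and the holder (frame #6) are
in the same call chain, which corresponds to waiter_is_holder being true
in this model.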
This path is taken in nfs_bioread() because the nfsnode's n_mtime is
not equal to vattr.va_mtime.tv_sec. I suspect that a write is still in
progress (the file was just closed by the linker) and the nfsnode's
n_mtime has not yet been updated. It appears that Digital UNIX replies
to the read's getattr() before the write's setattr(), so the nfsnode's
n_mtime differs from the value returned by the getattr().
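
The comparison that sends nfs_bioread() down the invalidation path can
be sketched as follows. This is a hedged user-space model rather than
the kernel code: the model_* structures and needs_invalidate() are
hypothetical names, and only the n_mtime and va_mtime.tv_sec fields
named above are taken from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures named in the report. */
struct model_timespec {
    long tv_sec;
    long tv_nsec;
};

struct model_vattr {
    struct model_timespec va_mtime; /* mtime returned by getattr() */
};

struct model_nfsnode {
    long n_mtime; /* client's cached modification time, in seconds */
};

/*
 * True when the cached mtime disagrees with the server's answer, which
 * is the condition that leads to the nfs_vinvalbuf() call at frame #4.
 */
static bool needs_invalidate(const struct model_nfsnode *np,
                             const struct model_vattr *va)
{
    assert(np != NULL && va != NULL);
    return np->n_mtime != va->va_mtime.tv_sec;
}
```

If the server answers the getattr() with an mtime that differs from the
one the client has cached, needs_invalidate() returns true in this
model; once the cached value is brought up to date, it returns false.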
There is a tcpdump of the transactions (started immediately after the
link, and before the execution) between the server ("storm") and the
client ("rain") at ftp://ftp.cs.duke.edu/pub/gallatin/nfs-bug/log.gz
>How-To-Repeat:
To repeat the problem, compile and link the example program at
ftp://ftp.cs.duke.edu/pub/gallatin/nfs-bug/example.tar.gz on a
partition mounted via NFSv3 from a DU4.0B server.
>Fix:
I don't know enough about the NFSv3 spec to really fix this, but a
workaround that appears to work here is to be less aggressive and
force buffers to be committed on close, by passing 1 instead of 0 as
the last argument to nfs_flush():
*** /usr/project/spider1/FreeBSD-2.2-STABLE/src/sys/nfs/nfs_vnops.c Wed May 28 14:26:45 1997
--- nfs/nfs_vnops.c Thu Feb 12 18:50:01 1998
***************
*** 595,601 ****
if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NQNFS) == 0 &&
(np->n_flag & NMODIFIED)) {
if (NFS_ISV3(vp)) {
! error = nfs_flush(vp, ap->a_cred, MNT_WAIT, ap->a_p, 0);
np->n_flag &= ~NMODIFIED;
} else
error = nfs_vinvalbuf(vp, V_SAVE, ap->a_cred, ap->a_p, 1);
--- 595,601 ----
if ((VFSTONFS(vp->v_mount)->nm_flag & NFSMNT_NQNFS) == 0 &&
(np->n_flag & NMODIFIED)) {
if (NFS_ISV3(vp)) {
! error = nfs_flush(vp, ap->a_cred, MNT_WAIT, ap->a_p, 1);
np->n_flag &= ~NMODIFIED;
} else
error = nfs_vinvalbuf(vp, V_SAVE, ap->a_cred, ap->a_p, 1);
>Audit-Trail:
>Unformatted:
