Date:      Tue, 4 May 1999 11:42:56 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Doug Rabson <dfr@nlsystems.com>, Kevin Day <toasty@home.dragondata.com>, jso@research.att.com
Cc:        hackers@freebsd.org
Subject:   Re: kern/11470: V3 NFS problem (fwd)
Message-ID:  <199905041842.LAA18532@apollo.backplane.com>
References:   <Pine.BSF.4.05.9905040943490.637-100000@herring.nlsystems.com>

    I'm going to move this to -hackers to allow people to pool their interest
    in regards to fixing the remaining NFS problems.  I am also CCing the
    parties involved.

    Regarding kern/11470:  this bug report was filed against FreeBSD-3.1.
    I haven't had rm -rf problems myself, and a large number of NFS bugs
    have been fixed in FreeBSD-stable (3.x) since the 3.1 release.

    I would ask jso@research.att.com to update his 3.1 machines to the latest
    FreeBSD-stable and repeat his tests.

    I will comment on Kevin's bug report below next to his itemized list.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

:Did you see this one? It is almost certainly caused by the client being
:confused when we invalidate their directory cookies. The BSD nfs server is
:pretty fascist about cookies and invalidates them too often. Can you think
:of a better scheme than the directory timestamp for cookies?
:
:--
:Doug Rabson				Mail:  dfr@nlsystems.com
:Nonlinear Systems Ltd.			Phone: +44 181 442 9037
:
:
:---------- Forwarded message ----------
:Date: Mon,  3 May 1999 14:23:47 -0700 (PDT)
:From: jso@research.att.com
:To: freebsd-gnats-submit@freebsd.org
:Subject: kern/11470: V3 NFS problem
:
:>Number:         11470
:>Category:       kern
:>Synopsis:       V3 NFS problem
:>Confidential:   no
:>Severity:       critical
:>Priority:       high
:>Responsible:    freebsd-bugs
:>State:          open
:>Quarter:        
:>Keywords:       
:>Date-Required:
:>Class:          sw-bug
:>Submitter-Id:   current-users
:>Arrival-Date:   Mon May  3 14:30:00 PDT 1999
:>Closed-Date:
:>Last-Modified:
:>Originator:     Jerry So
:>Release:        3.1
:>Organization:
:AT&T Labs-Research
:>Environment:
:FreeBSD spaceless 3.1-RELEASE FreeBSD 3.1-RELEASE #0: Tue Apr 20 17:57:03 EDT 1999     root@spaceless:/usr/src/sys/compile/SPACELESS  i386
:
:>Description:
:NFS client is solaris 2.6 or irix 6.4
:NFS server is freebsd 3.1
:
:For example:
:When doing a rm -Rf gcc-2.8.1 on NFS client
:rm: Unable to remove directory gcc-2.8.1/config/i386: File exists
:rm: Unable to remove directory gcc-2.8.1/config/m68k: File exists
:rm: Unable to remove directory gcc-2.8.1/config: File exists
:rm: Unable to remove directory gcc-2.8.1: File exists
:
:resulted.
:
:Only NFS v3 is having problem. Machines with V2 is OK.
:
:>How-To-Repeat:
:
:Repeat any time
:>Fix:
:
:
:>Release-Note:
:>Audit-Trail:
:>Unformatted:
:
:
:To Unsubscribe: send mail to majordomo@FreeBSD.org
:with "unsubscribe freebsd-bugs" in the body of the message
:
:


:From: Kevin Day <toasty@home.dragondata.com>
:
:Ok, I've been playing with your last patches (just before they were
:committed).
:
:I still see at least three outstanding things. :)
:
:
:1) if I 'sysctl -w vfs.nfs.async=1' on the server, the client will
:eventually get deadlocked, with most processes stuck in 'nfsrcvlk' or
:'nfsinval'(i think)

    Yes, I'm sure there are still a couple of lockup situations that
    we need to fix in this area.  I need to know whether this is via
    NFSV2 or NFSV3 and whether this is a UDP or TCP mount.  And, if it is
    a TCP mount, whether the problem also occurs with a UDP mount.  I saw
    a similar situation with TCP while doing makes; it turned out to be
    a data corruption bug related to multiple RPCs winding up in the same
    mbuf.

    Note:  If your *SERVER* is not running the latest -current, you have to
    upgrade it.  If your server is running FreeBSD-stable, the TCP fix (which
    is a server-side bug) has NOT yet been committed to FreeBSD-stable.

:2) If I set a cpu time limit for a process, and the executable file is being
:ran over NFS, if it exceeds the CPU limit, i get flooded with "vm_fault: pager
:error"'s

    This is definitely a bug.  I'll bet you are using an 'intr' or 'soft'
    mount, yes?  There are still some serious bugs with 'intr' mounts 
    interacting badly with the VM system, but they should be relatively easy
    to fix.

:3) See PR 7728. NFS server is also a web server, dumping logs into user's
:home directories. Our FTP server is an NFS client. When clients try to
:download their log files, the ftpd process gets stuck (kill -9 won't kill
:it). This also happens when they try to upload over top of a file they just
:viewed on the web server.
:
:Processes seem to get stuck in 'sbwait' (which really doesn't seem like it's
:stuck), or 'nfsrcv'

    What is occurring is that existing VM cache pages are being ripped out from
    under the client and the client is getting confused.  I'll need to work
    up a reliable way to reproduce the problem between a client and server
    in order to squash it.  If someone else can come up with a simple script
    to run on the client & the server that reproduces the problem, we will
    be able to squash it more quickly.
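
    As a starting point, a minimal sketch of such a script might look like
    the following.  It is only an illustration: the function names, the
    command-line roles, and the line counts are all hypothetical, and it has
    not been shown to trigger the hang.  The idea is simply to have the
    server append to a file on the exported filesystem while the client
    keeps re-reading the same file through its NFS mount.

```python
import os
import sys
import time


def append_log(path, lines, delay=0.0):
    """Server role: append `lines` lines to `path`, forcing each to disk."""
    with open(path, "a") as f:
        for i in range(lines):
            f.write("line %d\n" % i)
            f.flush()
            os.fsync(f.fileno())
            if delay:
                time.sleep(delay)
    return lines


def read_log(path, passes, delay=0.0):
    """Client role: re-read the whole file `passes` times.

    Returns the number of bytes seen on the last pass.
    """
    size = 0
    for _ in range(passes):
        with open(path, "rb") as f:
            size = len(f.read())
        if delay:
            time.sleep(delay)
    return size


if __name__ == "__main__":
    # Hypothetical usage: "script.py server PATH" on the NFS server,
    # "script.py client PATH" on the NFS client (PATH on the shared export).
    if len(sys.argv) != 3 or sys.argv[1] not in ("server", "client"):
        sys.exit("usage: %s server|client PATH" % sys.argv[0])
    role, path = sys.argv[1], sys.argv[2]
    if role == "server":
        append_log(path, 100000, delay=0.001)
    else:
        print(read_log(path, 100000, delay=0.001))
```

    Run the server role against the local path of the exported file and the
    client role against the same file as seen through the NFS mount, both at
    the same time.  The counts and delays are arbitrary and would likely need
    tuning.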

						-Matt

:In all though, thanks a *lot* for your help with NFS. :) It seems much more
:stable now, i'm not afraid to compile things over nfs anymore. :)
:
:Kevin



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



