Date: Mon, 22 Mar 1999 11:15:57 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199903221915.LAA23955@apollo.backplane.com>
To: "David E. Cross" <crossd@cs.rpi.edu>
Cc: freebsd-hackers@FreeBSD.ORG, schimken@cs.rpi.edu
Subject: Re: Death to nfsiod
References:  <199903220041.TAA23505@cs.rpi.edu>

:Recently we have been having the problem of disk wait processes on our
:FreeBSD client machines (served from our FreeBSD servers of the same
:release, 3.1-STABLE).  On the advice of Mike Smith I killed the NFS
:server processes on the FS, and then restarted them, this fixed the
:problem.  We then recompiled all of our server machines with "maxusers 64"
:since that had been an apparent problem on another [remote access] server.
:However, this did not fix the disk wait processes or some other weirdness.
:As a bit of a torture test I used mkisofs to burn a Joliet IE5 CD image.
:I tried this test >10 times and every time *failed*.  The failure would
:either be a disk-wait process or a weird error with the output file (the
:2 errors with the output file were both demonstrated with "ls *.iso", error
:1 was : ie5.iso: protocol error.  error 2 (this was much more common):
:ie5.iso: not a directory), after a couple of seconds the error would go
:away.  Getting to the subject of the message, it has been observed that
:once a single process goes into this disk-wait state it becomes much more
:likely for additional processes to get there.  While running the mkisofs
:one time I noticed that at the same time it went into disk wait a nfsiod
:went into (and remained) in disk wait.  As a test I killed and restarted
:the NFSDs on the server (that woke both the nfsiod and the mkisofs), and
:then killed all nfsiods on the NFS client.  The result is that I have again
:run mkisofs 10 times, now without a single failure or weird behaviour.
:
:--
:David "The one long paragraph" Cross

    If this is 3.1-RELEASE, or if this is a 3.x-STABLE more than a week
    or so old, update to the latest 3.x-STABLE and re-test.  A large
    number of NFS-related bugs were fixed in 3.x in the last two weeks.

    Currently there are known but not-yet-tracked-down problems with
    the case where exported files being accessed by clients are
    modified on the server.  If this is not the case, and you still
    have bugs, this could be something new.

    The only bug I know of regarding nfsd/nfsiod is a performance
    issue: the async daemons queue I/O for the same vnode, and
    different nfsd's pick it up, causing vnode lock serialization
    on the server.  But this wasn't a deadlock in the tests I ran.

    If you determine that the bug is still there with the latest
    -stable, I can set up my test box to start doing mkisofs runs.
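
    A repeated-run test of this kind could be scripted roughly as
    follows.  This is only a sketch, not the harness either of us
    actually used: run_once() and torture() are hypothetical helper
    names, and the timeout is an assumed stand-in for "the process
    is stuck in disk wait".  The child is polled rather than waited
    on, since a process in disk wait cannot be killed and a blocking
    wait would hang the script as well.

```python
import subprocess
import time


def run_once(cmd, timeout):
    """Run cmd once; return "ok", "failed", or "hung" (no exit before timeout)."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        # The command could not be started at all.
        return "failed"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = proc.poll()
        if code is not None:
            return "ok" if code == 0 else "failed"
        time.sleep(0.1)
    try:
        # Best effort only: a process in disk wait will not die,
        # so we deliberately do not wait for it afterwards.
        proc.kill()
    except OSError:
        pass
    return "hung"


def torture(cmd, runs=10, timeout=600):
    """Run cmd `runs` times and count each outcome."""
    results = [run_once(cmd, timeout) for _ in range(runs)]
    return {kind: results.count(kind) for kind in ("ok", "failed", "hung")}
```

    Calling torture() with the mkisofs command line as a list of
    arguments, with the output file on the NFS mount, gives a count
    of clean, failed, and hung runs.  The actual mkisofs arguments
    are not given in this thread, so they are left to the reader.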

    Your NFS configuration (dmesg, df, the mkisofs command line you
    are using, mount options if any) would also be valuable.
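
    One way to gather that information into a single report is
    sketched below.  It is an illustration only: build_report() and
    run() are hypothetical names, and using the output of "mount"
    to show the mount options is an assumption on top of the dmesg
    and df output asked for above.  The mkisofs command line has to
    be pasted in by hand.

```python
import subprocess

# dmesg and df are the items named in the message; "mount" is an
# assumed way of showing the mount options currently in effect.
COMMANDS = [["dmesg"], ["df"], ["mount"]]


def run(cmd):
    """Run one command and return its output, or a note if it cannot run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "(could not run: %s)" % exc
    return result.stdout + result.stderr


def build_report(commands=COMMANDS, runner=run, extra_notes=""):
    """Return one text report with a labelled section per command."""
    sections = []
    for cmd in commands:
        header = "==== %s ====" % " ".join(cmd)
        sections.append(header + "\n" + runner(cmd).rstrip())
    if extra_notes:
        sections.append("==== notes ====\n" + extra_notes)
    return "\n\n".join(sections) + "\n"

# Example use (prints the report so it can be pasted into a reply):
#   print(build_report(extra_notes="mkisofs command line: ..."))
```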

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

