Date: Mon, 22 Mar 1999 11:15:57 -0800 (PST)
From: Matthew Dillon
Message-Id: <199903221915.LAA23955@apollo.backplane.com>
To: "David E. Cross"
Cc: freebsd-hackers@FreeBSD.ORG, schimken@cs.rpi.edu
Subject: Re: Death to nfsiod
References: <199903220041.TAA23505@cs.rpi.edu>

:Recently we have been having the problem of disk wait processes on our
:FreeBSD client machines (served from our FreeBSD servers of the same
:release, 3.1-STABLE). On the advice of Mike Smith I killed the NFS
:server processes on the FS, and then restarted them, this fixed the
:problem. We then recompiled all of our server machines with "maxusers 64"
:since that had been an apparent problem on another [remote access] server.
:However, this did not fix the disk wait processes or some other weirdness.
:As a bit of a torture test I used mkisofs to burn a Joliet IE5 cd image.
:I tried this test >10 times and every time *failed*. The failure would
:either be a disk-wait process or a weird error with the output file (the
:2 errors with the output file were both demonstrated with "ls *.iso", error
:1 was : ie5.iso: protocol error. error 2 (this was much more common):
:ie5.iso: not a directory), after a couple of seconds the error would go
:away. Getting to the subject of the message, it has been observed that
:once a single process goes into this disk-wait state it becomes much more
:likely for additional processes to get there.
:While running the mkisofs
:one time I noticed that at the same time it went into disk wait a nfsiod
:went into (and remained) in disk wait. As a test I killed and restarted
:the NFSDs on the server (that woke both the nfsiod and the mkisofs), and
:then killed all nfsiods on the NFS client. The result is that I have again
:run mkisofs 10 times, now without a single failure or weird behaviour.
:
:--
:David "The one long paragraph" Cross

If this is 3.1-RELEASE, or if this is a 3.x-STABLE more than a week or
so old, update to the latest 3.x-STABLE and re-test. A large number of
NFS-related bugs were fixed in 3.x in the last two weeks.

Currently there are known but not-yet-tracked-down problems with the
case where exported files being accessed by clients are modified on the
server. If this is not the case, and you still have bugs, this could be
something new.

The only bug I know of with regard to nfsd/nfsiod is a performance issue
with the async daemons queueing I/O for the same vnode and different
nfsd's picking it up, causing vnode lock serialization to occur on the
server. But this wasn't a deadlock in the tests I ran.

If you determine that the bug is still there with the latest -stable, I
can set up my test box to start doing mkisofs runs. Your NFS
configuration (dmesg, df, the mkisofs command line you are using, mount
options if any) would also be valuable.

-Matt
Matthew Dillon
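
For anyone wanting to repeat the kind of torture test described in this thread, here is a minimal sketch of a helper that runs one command several times and counts the failures. The name run_repeat is hypothetical and appears nowhere in the original messages. It only catches runs that exit non-zero; a run that hangs in disk wait will simply block, so that case still has to be spotted by hand (for example with ps).

```sh
#!/bin/sh
# run_repeat N cmd [args...]
# Hypothetical helper: run "cmd" N times, print each failed run, then a summary.
# Returns 0 only if every run succeeded.
run_repeat() {
    n=$1
    shift
    fails=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Discard the command's own output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
            echo "run $i: FAILED"
        fi
        i=$((i + 1))
    done
    echo "$fails of $n runs failed"
    [ "$fails" -eq 0 ]
}
```

A call might look like: run_repeat 10 mkisofs -J -o /nfs/scratch/ie5.iso /path/to/tree, where both paths are placeholders and the output file is assumed to live on the NFS mount under test. The actual mkisofs command line used in the original report is not given in the thread.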