Date: Fri, 12 Oct 2001 22:59:18 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Ryan Dooley <dooleyr@missouri.edu>
Cc: stable@FreeBSD.ORG
Subject: Re: Uh... server crashes every day :-/ any thoughts?
Message-ID: <200110130559.f9D5xIP38725@earth.backplane.com>
References: <5.1.0.14.0.20011011101456.0368e510@marble.sentex.ca> <3BC5DC0C.BFEBB61E@missouri.edu>

:Here's a few other things I've noticed about my situation...
:
:This looks to be largely NFS... every once in a while, we get
:input/output errors on the NFS volume (gzip'ing files fails, web
:pages come up with document empty errors, scp's off the system don't
:work the first time). Usually, the next access to the file does the
:trick. Sounds like a cache miss/error. I've got output below from an
:"nfsstat -s".
I/O errors can occur if the server or another client modifies the file
and the original client is caching the original file descriptor or mmap
rather than re-opening the file, or if the client is caching attributes
for a long time. You can adjust the attribute caching using the
acregmin, acregmax, acdirmin, and acdirmax mount options. I recommend
medium-sized directory attribute caching and fairly short file attribute
caching.
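
As a rough illustration of the attribute-cache advice above, a client-side
/etc/fstab entry might look like the one below. The server name, export
path, and the specific timeout values (in seconds) are assumptions picked
only to show "short file, medium directory" caching; they are not tested
recommendations, so tune them for your own workload.

```
# /etc/fstab on an NFS client (hypothetical server and export path)
# File attributes: cached 3-10 seconds. Directory attributes: 15-60 seconds.
server:/export  /nfs  nfs  rw,acregmin=3,acregmax=10,acdirmin=15,acdirmax=60  0  0
```

Shorter file timeouts make clients notice modified files sooner, at the
cost of more getattr traffic to the server.
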
If you are modifying a lot of files being accessed by the clients, it may
be beneficial to rename them instead of deleting or overwriting them.
That way the clients will still retain valid handles on the old files
for a short while.
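
A minimal sketch of the rename approach, to be run on the server side when
publishing a new version of a file. The function name replace_keep_old and
the ".old.PID" naming scheme are made up for illustration. Note there is a
brief window between the two mv calls in which the target name does not
exist.

```sh
#!/bin/sh
# replace_keep_old TARGET NEWFILE   (hypothetical helper)
# Parks the existing TARGET under a new name instead of deleting or
# overwriting it, then moves NEWFILE into place. Prints the parked name
# (if any) so the caller can remove it once clients have moved on.
replace_keep_old() {
    target=$1
    newfile=$2
    old="$target.old.$$"          # assumed naming scheme for the parked copy

    if [ -e "$target" ]; then
        # Rename rather than unlink: the old file stays on disk, so
        # clients that still reference it keep working.
        mv "$target" "$old" || return 1
    fi

    mv "$newfile" "$target" || return 1

    if [ -e "$old" ]; then
        echo "$old"
    fi
    return 0
}
```

The parked copies can be cleaned up later, for example from a periodic job,
once the clients' caches have had time to expire.
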
:I believe the crash from this morning was due to all of our NFS clients
:lighting up at the same time (about 8am when the students came in to sit
:down and use the machines in front of them).
Crashes are bad, but without more information it would be difficult to
track down. A kernel core is best, of course, but if it drops into DDB>
on a panic, perhaps a 'trace' from DDB> will help, along with the panic
message itself.
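
For the kernel core, something along these lines in /etc/rc.conf should
arrange for a dump to be saved after a panic. The device name below is an
assumption; it has to be a swap partition on your system that is at least
as large as RAM.

```sh
# /etc/rc.conf fragment (the device name is hypothetical)
dumpdev="/dev/ad0s1b"    # swap device the kernel writes the crash dump to
```

On the next boot, savecore should then copy the dump out of swap, typically
into /var/crash.
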
Be sure you are running the latest -stable so you get the UPAGES fix.
You are running so much stuff on the machine that that bug may be a
contributor.
I dunno about running 500+ nfsd's... seems like overkill. It may be
beneficial to use TCP nfs mounts where possible instead of UDP mounts.
Using TCP will avoid network storms.
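
For example, a client could request a TCP mount with an /etc/fstab entry
like the following. The server name and export path are placeholders.

```
# /etc/fstab on an NFS client (hypothetical server and export path)
server:/export  /nfs  nfs  rw,tcp  0  0
```

The same thing from the command line would be "mount_nfs -T server:/export
/nfs".
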
:I also managed to get our NFS server to freeze for about 7 seconds
:running a simple test:
:
:time dd if=/dev/zero of=/nfs/zeros bs=16k count=16384
:
:Somewhere around 502MB, the system just hung. I control-C'd the
:process and the system came back (as well as my heart beat :-)
I dunno about this. It sounds like it may have simply been syncing
the disks? You would need to run more tests.
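
One possible follow-up test, sketched here as an assumption about what
"more tests" could look like: write the file in 1 MB chunks and report any
chunk that takes unusually long, so you can see whether the stall always
shows up at the same offset. The function name probe_stalls and the default
2-second threshold are made up for illustration. Because it appends chunk
by chunk, it is not the same I/O pattern as the single dd run quoted above.

```sh
#!/bin/sh
# probe_stalls FILE CHUNKS [THRESHOLD]   (hypothetical helper)
# Appends CHUNKS chunks of 1 MB (64 x 16k blocks of zeros) to FILE and
# prints a line for every chunk that took THRESHOLD seconds or more.
probe_stalls() {
    file=$1
    chunks=$2
    threshold=${3:-2}             # seconds; default is an arbitrary choice

    : > "$file" || return 1       # start from an empty file

    i=0
    while [ "$i" -lt "$chunks" ]; do
        start=$(date +%s)
        dd if=/dev/zero bs=16k count=64 2>/dev/null >> "$file" || return 1
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -ge "$threshold" ]; then
            echo "chunk $i (offset ${i} MB): ${elapsed}s"
        fi
        i=$((i + 1))
    done
    echo "wrote $chunks MB"
}
```

For instance, "probe_stalls /nfs/zeros 600" would cover the region around
502MB where the hang was seen. Running "vmstat 1" or "iostat 1" on the
server at the same time would show whether the disks are busy during a
stall.
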
-Matt
:
:Ryan
:
