Date:      Fri, 12 Oct 2001 22:59:18 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Ryan Dooley <dooleyr@missouri.edu>
Cc:        stable@FreeBSD.ORG
Subject:   Re: Uh... server crashes every day :-/ any thoughts?
Message-ID:  <200110130559.f9D5xIP38725@earth.backplane.com>
References:  <5.1.0.14.0.20011011101456.0368e510@marble.sentex.ca> <3BC5DC0C.BFEBB61E@missouri.edu>



:Here's a few other things I've noticed about my situation...
:
:This looks to be largely NFS... every once in a while, we get
:input/output errors on the NFS volume (gzip'ing files won't, web
:pages come up with document empty errors, scp's off the system don't
:work the first time). Usually, the next access to the file does the
:trick.  Sounds like a cache miss/error.  I've got output below from an
:"nfsstat -s".

    I/O errors can occur if the server or another client modifies the file
    and the original client is caching the original file descriptor or mmap
    rather than re-opening the file, or if the client is caching attributes
    for a long time.  You can adjust the attribute caching using the 
    acregmin, acregmax, acdirmin, and acdirmax mount options.  I recommend
    medium-sized directory attribute caching and fairly short file attribute
    caching.
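
    A minimal sketch of what such a mount might look like.  The server
    name, export path, mount point, and every timeout value (in seconds)
    are illustrative assumptions, not tuned recommendations, and the exact
    option syntax should be checked against mount_nfs(8) for your release.
    The function only prints the command so it can be reviewed before
    anything is run.

```shell
#!/bin/sh
# Sketch only: print (do not run) a mount_nfs command with explicit
# attribute-cache timeouts. All names and values are assumptions.

build_nfs_mount() {
    server_path=$1   # hypothetical export, e.g. nfsserver:/export
    mount_point=$2   # hypothetical mount point, e.g. /nfs

    # Fairly short file attribute caching, medium-sized directory
    # attribute caching (values are in seconds).
    opts="acregmin=1,acregmax=5,acdirmin=10,acdirmax=30"

    echo "mount_nfs -o ${opts} ${server_path} ${mount_point}"
}

build_nfs_mount nfsserver:/export /nfs
```

    Shorter file timeouts make clients notice server-side changes sooner,
    at the cost of more getattr traffic to the server.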

    If you are modifying a lot of files being accessed by the clients, it may
    be beneficial to rename them instead of deleting or overwriting them. 
    That way the clients will still retain valid handles on the old files
    for a short while.
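
    One way this could be sketched on the server side is shown here.  The
    function name and the ".old.PID" suffix are hypothetical.  Note that it
    uses a hard link followed by a rename, rather than two renames, so the
    live name never disappears; the old file stays on disk under the
    backup name until you remove it.

```shell
#!/bin/sh
# replace_keep_old TARGET NEWFILE
# Sketch: install NEWFILE as TARGET while keeping the old file's inode
# alive under a backup name, so existing handles on it stay valid.
# Prints the backup name if an old file existed.

replace_keep_old() {
    target=$1
    newfile=$2
    old="${target}.old.$$"   # hypothetical backup naming scheme

    if [ -e "$target" ]; then
        # Give the current file a second name; its inode survives the
        # rename below because this link still refers to it.
        ln "$target" "$old" || return 1
        echo "$old"
    fi

    # rename(2) replaces TARGET atomically on the same filesystem.
    mv -f "$newfile" "$target"
}

# Usage (paths are examples):
#   replace_keep_old /export/www/index.html /export/www/index.html.new
# Remove the ".old.*" files later, once clients have moved on.
```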

:I believe the crash from this morning was due to all of our NFS clients
:lighting up at the same time (about 8am when the students came in to sit
:down and use the machines in front of them).

    Crashes are bad, but without more information this one will be difficult
    to track down.  A kernel core is best, of course, but if the machine
    drops into DDB> on a panic, a 'trace' from DDB>, along with the panic
    message itself, would help.

    Be sure you are running the latest -stable so you get the UPAGES fix.
    You are running so much stuff on the machine that that bug may be a
    contributor.

    I dunno about running 500+ nfsd's... seems like overkill.  It may be
    beneficial to use TCP nfs mounts where possible instead of UDP mounts.
    Using TCP avoids the retransmit storms that UDP mounts can generate
    under load.

:I also managed to get our NFS server to freeze for about 7 seconds
:running a simple test:
:
:time dd if=/dev/zero of=/nfs/zeros bs=16k count=16384
:
:Somewhere around 502MB, the system just hung.  I control-C'd the
:process and the system came back (as well as my heartbeat :-)

    I dunno about this.  It sounds like it may have simply been syncing
    the disks?  You would need to run more tests.
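
    As a rough sketch of what more tests could look like, the same dd
    write can be repeated at several sizes with the elapsed time recorded
    for each, to see whether the stall tracks the amount of data written.
    The function name and the block counts in the usage comment are
    assumptions; the 16k block size is taken from the test quoted above.

```shell
#!/bin/sh
# timed_write OUTFILE COUNT
# Sketch: write COUNT 16k blocks of zeros to OUTFILE and print
# "COUNT ELAPSED_SECONDS". Resolution is whole seconds.

timed_write() {
    out=$1
    count=$2

    start=$(date +%s)
    dd if=/dev/zero of="$out" bs=16k count="$count" 2>/dev/null || return 1
    end=$(date +%s)

    echo "$count $((end - start))"
}

# Usage (sizes are examples; 16384 blocks matches the quoted 256MB test):
#   for n in 4096 8192 16384 32768; do
#       timed_write /nfs/zeros "$n"
#   done
```

    Watching disk activity on the server during each run would help show
    whether a pause lines up with the disks being flushed.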

					-Matt

:
:Ryan
:




