From owner-freebsd-current  Fri Feb  5 12:44:05 1999
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA27903
          for freebsd-current-outgoing; Fri, 5 Feb 1999 12:44:05 -0800 (PST)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA27888
          for <current@freebsd.org>; Fri, 5 Feb 1999 12:44:03 -0800 (PST)
          (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.2/8.9.1) id LAA98698;
	Fri, 5 Feb 1999 11:47:05 -0800 (PST)
	(envelope-from dillon)
Date: Fri, 5 Feb 1999 11:47:05 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199902051947.LAA98698@apollo.backplane.com>
To: current@FreeBSD.ORG
Subject: Seeing NFS saturation 'loop' when installworld'ing to NFS / and /usr
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

    This is very odd.  This is the approximate backtrace that I get
    when I throw my test machine into DDB:

    --- interrupt, ...
    nfs_* routines....
    cluster_wbuild
    vfs_bio_awrite
    flushdirtybuffers
    bdwrite
    nfs_write
    vn_write
    write
    syscall

    What is happening is that I am doing a 'make installworld' on my
    test machine with / and /usr NFS V3 mounted R+W.  

    The install goes well, but four times so far an 'install' command has
    gotten 'stuck' in 'R'un state and the network has gone into saturation.
    It seems to be repeating the same NFS I/O over the network over and
    over again, as far as I can tell.  It is very odd.

    The network stays in saturation ( at 8 MBytes/sec ) until I kill -STOP
    the install.  It usually takes about a minute for the kill -STOP to take
    effect ( which is also very odd ).  If I then kill -CONT the install
    program, the install resumes normally, finishes its write(), and
    continues on normally.
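
    For anyone who wants to script the same STOP/CONT workaround, here is
    a minimal sketch in Python.  It only mirrors the kill -STOP followed
    by kill -CONT sequence described above; the helper names, the short
    pause, and the sleeping child process are assumptions made for
    illustration ( the child is a stand-in, not the real install ).  On
    the real machine the pause would need to be long enough for the STOP
    to take effect, which took about a minute here.

```python
import os
import signal
import subprocess
import sys
import time


def stop_then_continue(pid, pause=0.2):
    """Send SIGSTOP to pid, wait `pause` seconds, then send SIGCONT.

    This is the scripted equivalent of `kill -STOP pid` followed by
    `kill -CONT pid`. The default pause is an arbitrary illustrative value.
    """
    os.kill(pid, signal.SIGSTOP)
    time.sleep(pause)
    os.kill(pid, signal.SIGCONT)


def demo():
    """Run the sequence against a harmless child and return its exit status."""
    # Hypothetical stand-in for the stuck process: a child that sleeps
    # briefly and then exits on its own.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.5)"]
    )
    stop_then_continue(child.pid)
    # After SIGCONT the child resumes and runs to completion.
    return child.wait(timeout=30)


if __name__ == "__main__":
    print("child exit status:", demo())
```

    The sketch does nothing about the underlying loop; it just automates
    the manual nudge that lets the write() finish.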

    The system is not locked up when this situation occurs.

    As far as I can tell, it is some sort of weird interaction between the
    flushdirtybuffers() routine and cluster_wbuild(), but I haven't a clue
    as to what is causing the problem.

    Sometimes when I break into DDB and do a backtrace, flushdirtybuffers()
    is being called from a non-NFS element such as ffs_update(), so the
    sequence looks like:

    nfs_* routines....
    cluster_wbuild
    vfs_bio_awrite
    flushdirtybuffers
    bdwrite	( I think )
    ffs_update

    It is very odd.  I don't suppose very many people try to make install
    over NFS ( it works, you just have to chflags -R noschg the destination
    on the NFS server before you run make install on the NFS client ).

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

