From: Mark Morley
To: FreeBSD Stable
Date: Thu, 12 Aug 2010 10:35:49 -0700
Subject: NFS stalling on 8.1-STABLE

Hi all,

I have five front-end web servers that all mount their content from the same server via NFS. If I stress the link on any one of the machines (e.g. copy a large directory with a lot of files to or from the mounted file system), the client will pause. That is, all processes trying to access that mount will freeze. The log files fill with hundreds or thousands of "nfs server not responding" / "is alive again" messages. After 60 seconds it returns to normal, unless the load is still there, in which case it continues to pause.

This has only started happening since I upgraded the client machines to 8.1-STABLE (previously four of them were 8.0 and one was 7.3). The server is 7.1-RELEASE-p11.
No other changes have taken place in terms of hardware, software, mount options, etc. All NICs involved are gigabit em cards, and they are on a private network (web access to the boxes is via an external interface).

If I truss a command such as "df", it gets to getfsstat() and pauses there.

Mount options are currently "rw,tcp,nolockd,noatime,nosuid,bg,intr,soft,rsize=32768,wsize=32768", but I've tried all sorts of other combinations and none of them seems to make a difference.

Here's sample output from nfsstat -c on one of the boxes (uptime 14 days):

Client Info:
Rpc Counts:
   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
  75552107   3008653 300569929    253365   2426554   4748471   2035545   3015497
    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
    864598     50887      7462     11895   1137933  16160386         0  31593291
     Mknod    Fsstat    Fsinfo  PathConf    Commit
         0  22510271         5         0   3569465
Rpc Info:
  TimedOut   Invalid X Replies   Retries  Requests
         0         0         0         0 467516377
Cache Info:
  Attr Hits     Misses  Lkup Hits     Misses  BioR Hits     Misses  BioW Hits     Misses
 1461457650   75552057  963440449  300536041   37404178    2359677    9467719    4748471
  BioRLHits     Misses  BioD Hits     Misses  DirE Hits     Misses
   14409992     253365   29508747   16119060   22292421      23233

Any thoughts?

Mark