Date: Sun, 1 Aug 2010 19:11:57 -0400 (EDT)
From: Rick Macklem
To: "Sam Fourman Jr."
Cc: freebsd-hackers@freebsd.org, krad, FreeBSD Questions
Subject: Re: possible NFS lockups
Message-ID: <1754387280.221911.1280704316965.JavaMail.root@erie.cs.uoguelph.ca>

> From: "Sam Fourman
> On Tue, Jul 27, 2010 at 10:29 AM, krad wrote:
> > I have a production mail system with an nfs backend. Every now and again
> > we see the nfs die on a particular head end.
> > However it doesn't die across all the nodes. This suggests to me there
> > isnt an issue with the filer itself and the stats from the filer concur
> > with that.
> >
> > The symptoms are lines like this appearing in dmesg
> >
> > nfs server 10.44.17.138:/vol/vol1/mail: not responding
> > nfs server 10.44.17.138:/vol/vol1/mail: is alive again
> >
> > trussing df it seems to hang on getfsstat, this is presumably when it
> > tries the nfs mounts
> >
>
> I also have this problem, where nfs locks up on a FreeBSD 9 server
> and a FreeBSD RELENG_8 client
>

If by RELENG_8 you mean 8.0 (or anything pre-8.1), there are a number of
patches for the client-side krpc. They can be found at:
  http://people.freebsd.org/~rmacklem/freebsd8.0-patches
(These are all in FreeBSD 8.1, so ignore this if your client is already
running FreeBSD 8.1.)

rick
ps: "Lock up" can mean many things. The more specific you can be about the
    behaviour, the more likely it is that the problem can be resolved.
    For example:
    - No more access to the subtree under the mount point is possible
      until the client is rebooted. When you run "ps axlH", one process
      that was accessing a file under the mount point is shown with
      WCHAN rpclock and STAT DL.
    vs
    - All access to the mount point stops for about 1 minute and then
      recovers.

    It is also useful to show which mount options the client is using
    and whether or not rpc.lockd and rpc.statd are running. If you can
    capture the network traffic with wireshark while it is locked up,
    seeing whether any NFS traffic is happening is useful too.
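
    The "ps axlH" check above can be scripted. What follows is only a
    rough sketch, not part of the patches: the function name
    find_stuck_processes and the sample listing are made up for
    illustration, and it assumes the wait channel column is headed
    either WCHAN or MWCHAN and that no column before COMMAND is empty.

```python
# Hypothetical "ps axlH"-style listing, invented for illustration only.
SAMPLE = """\
 UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN  STAT  TT       TIME COMMAND
   0     1     0   0  44  0  3204   556 wait    ILs   ??    0:00.01 /sbin/init --
1001  2345  2301   0  -8  0  8400  2100 rpclock DL    p0    0:00.12 df -h
1001  2360  2301   0  44  0  9000  2500 select  S     p1    0:01.00 sshd: user@ttyp1
"""


def find_stuck_processes(ps_output, wchan="rpclock"):
    """Return rows whose wait channel equals `wchan` and whose STAT starts with D.

    Assumes whitespace-separated columns with a header line, and that the
    last column (COMMAND) is the only one that may contain spaces.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Accept either spelling of the wait channel column header.
    wchan_col = next((c for c in ("MWCHAN", "WCHAN") if c in header), None)
    if wchan_col is None or "STAT" not in header:
        raise ValueError("no WCHAN/STAT column in ps output")
    stuck = []
    for line in lines[1:]:
        # Split into at most len(header) fields so COMMAND keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # short or malformed row, skip it
        row = dict(zip(header, fields))
        if row[wchan_col] == wchan and row["STAT"].startswith("D"):
            stuck.append(row)
    return stuck


if __name__ == "__main__":
    for row in find_stuck_processes(SAMPLE):
        print(row["PID"], row["STAT"], row["COMMAND"])
```

    On a real client the listing would come from something like
    subprocess.run(["ps", "axlH"], capture_output=True, text=True).stdout
    rather than from the sample text.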