From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Tim Chen
Date: Mon, 15 Sep 2008 16:06:23 -0400
Subject: Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
Message-Id: <200809151606.23933.jhb@freebsd.org>
In-Reply-To: <1f51039c0809150857l50b6be8eu848e21189a4175d6@mail.gmail.com>

On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> Currently I was running a mail server using a netapp filer as backend
> storage.
> From time to time, the whole system get stuck and lasted for 3-5 minutes.
> But after that, everything recovers normally. During the "stuck" moment,
> using ps auxw shows 200-300 of mail delivery agent(MDA) processes staying
> in "D" status.
> The command df certainly does not respond either.

Can you use 'ps axl' to determine the wait message ("wchan") of the stuck
threads when they hang? If it is "lockf", then make sure you have an
up-to-date RELENG_6 kernel, as there was a recent fix for a "lockf" hang.
Alternatively, if things are stuck in "nfsreq", it may be useful to use
tcpdump to look at the NFS requests your client is making. nfsstat can also
be useful, as you can see which counters are increasing during a hang.

-- 
John Baldwin
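
With 200-300 stuck processes, reading 'ps axl' output by eye is tedious, so a small script that tallies the wait channel of every process in "D" state may help tell a "lockf" hang from an "nfsreq" one. The following is only a sketch: the function name, the sample rows, and the "mda" command name are made up for illustration, and it assumes the output has a header line containing a STAT column and an MWCHAN (or WCHAN) column, with no blank fields before them.

```python
from collections import Counter

# Hypothetical sample of 'ps axl' output, invented for illustration only.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0     1     0   0   8  0   768   388 wait   ILs   ??    0:00.01 /sbin/init --
   26  4101   900   0  -4  0  3500  2100 nfsreq D     ??    0:00.10 mda -d user1
   26  4102   900   0  -4  0  3500  2100 nfsreq D     ??    0:00.08 mda -d user2
   26  4103   900   0  -4  0  3500  2100 lockf  D     ??    0:00.02 mda -d user3
"""


def tally_wait_channels(ps_output, stuck_states=("D",)):
    """Count wait channels of processes whose STAT starts with a stuck state."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()

    header = lines[0].split()
    # Assumption: the wait channel column is labelled MWCHAN or WCHAN.
    wchan_idx = next(
        (i for i, name in enumerate(header) if name in ("MWCHAN", "WCHAN")),
        None,
    )
    if wchan_idx is None or "STAT" not in header:
        raise ValueError("expected STAT and MWCHAN/WCHAN columns in the header")
    stat_idx = header.index("STAT")

    counts = Counter()
    for line in lines[1:]:
        # Limit the split so the COMMAND column keeps its embedded spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) <= max(wchan_idx, stat_idx):
            continue  # skip short or malformed rows
        if fields[stat_idx].startswith(tuple(stuck_states)):
            counts[fields[wchan_idx]] += 1
    return counts


if __name__ == "__main__":
    for wchan, count in tally_wait_channels(SAMPLE).most_common():
        print(wchan, count)
```

During a real hang, the captured output of 'ps axl' would be passed in place of SAMPLE. A tally dominated by "lockf" points at the kernel fix mentioned above, while one dominated by "nfsreq" suggests looking at the NFS traffic with tcpdump and nfsstat.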