From: krad
To: freebsd-hackers@freebsd.org, FreeBSD Questions
Date: Tue, 27 Jul 2010 16:29:20 +0100
Subject: possible NFS lockups
I have a production mail system with an NFS backend. Every now and again NFS dies on one particular head end, but not across all the nodes. This suggests to me that there isn't an issue with the filer itself, and the stats from the filer concur with that.

The symptoms are lines like these appearing in dmesg:

    nfs server 10.44.17.138:/vol/vol1/mail: not responding
    nfs server 10.44.17.138:/vol/vol1/mail: is alive again

Trussing df, it seems to hang on getfsstat; this is presumably when it tries the NFS mounts, e.g.:

    __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
    mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1746583552 (0x681ac000)
    mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1747632128 (0x682ac000)
    munmap(0x681ac000,344064) = 0 (0x0)
    getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)

I have played with the mount options a fair bit, but they don't make much difference. This is what they are set to at present:

    10.44.17.138:/vol/vol1/mail /mail/0 nfs rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0

When this locking is occurring, I find that if I run showmount, or mount 10.44.17.138:/vol/vol1/mail again under another mount point, I can access it fine.

One thing I have just noticed is that lockd and statd always seem to have died when this happens. Restarting them does not help.

I find all this a bit perplexing. Can anyone offer any insight into why this might be happening? I have DTrace compiled into the kernel if that would help with debugging.
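A hedged note on the truss output: on FreeBSD the third getfsstat(2) argument, 0x2, is MNT_NOWAIT, and the call returned 9, so that call itself completed; the stall may instead be in the per-filesystem statfs(2) calls df makes afterwards. One way to narrow down which mount blocks, and to record whether the lock daemons are up at that moment, is the minimal Python 3 sketch below. It assumes Python 3 and pgrep(1) are available on the head end and that mount points (e.g. /mail/0 from the fstab line) are passed on the command line; rpc.lockd and rpc.statd are the FreeBSD process names for lockd and statd. probe_mount and daemon_running are hypothetical helper names, not an existing tool.

```python
#!/usr/bin/env python3
"""Sketch: probe mount points without letting a hung NFS mount hang the caller.

Assumptions: Python 3 on a POSIX system, pgrep(1) installed, mount points
given as arguments. Helper names are hypothetical.
"""
import os
import signal
import subprocess
import sys
import time


def probe_mount(path: str, timeout: float = 5.0) -> str:
    """Run statvfs(path) in a forked child; return 'ok', 'error' or 'timeout'."""
    pid = os.fork()
    if pid == 0:
        # Child: if the NFS server is unreachable, only this process blocks.
        code = 1
        try:
            os.statvfs(path)
            code = 0
        finally:
            # Never return into the parent's code path from the child.
            os._exit(code)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
            return "ok" if ok else "error"
        time.sleep(0.05)
    # Still blocked. Send SIGKILL but do not wait for it: a process sleeping
    # on a hard NFS mount may not exit until the server answers.
    os.kill(pid, signal.SIGKILL)
    return "timeout"


def daemon_running(name: str):
    """True/False from `pgrep -x name`; None if pgrep itself is missing."""
    try:
        result = subprocess.run(["pgrep", "-x", name], capture_output=True)
    except FileNotFoundError:
        return None
    return result.returncode == 0


if __name__ == "__main__":
    for mnt in sys.argv[1:] or ["/"]:
        print(f"{mnt}: {probe_mount(mnt)}")
    for daemon in ("rpc.lockd", "rpc.statd"):
        state = daemon_running(daemon)
        label = {True: "running", False: "NOT running", None: "unknown (no pgrep)"}[state]
        print(f"{daemon}: {label}")
```

Saved as, say, probe_mounts.py, it could be run as `python3 probe_mounts.py /mail/0` from cron while the problem recurs, so a "timeout" on one head end can be lined up against the dmesg "not responding" messages and the daemon state. Forking per probe is the design choice here: a blocked statvfs stalls only the child, which may linger unreaped on a hard mount until the server answers.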