From owner-freebsd-stable@freebsd.org Sat Mar 5 13:33:59 2016
Subject: Re: nfs_getpages: error 4
To: Dmitry Sivachenko
Cc: FreeBSD Stable ML
From: Eugene Grosbein
Date: Sat, 5 Mar 2016 20:33:39 +0700
Message-ID: <56DAE033.9020304@grosbein.net>
In-Reply-To: <550ADE4F-9F60-44FB-BF07-A1384A6B7B1A@gmail.com>
References: <56DACD4E.3070905@grosbein.net>
 <550ADE4F-9F60-44FB-BF07-A1384A6B7B1A@gmail.com>
List-Id: Production branch of FreeBSD source code

05.03.2016 19:32, Dmitry Sivachenko wrote:

>>> I am running a number of machines with /home mounted via nfs (FreeBSD 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>>>
>>> Sometimes I get the following messages in syslog:
>>>
>>> nfs_getpages: error 4
>>> vm_fault: pager read error, pid NNN (myprog)
>>>
>>> After that I see a lot of processes stuck in "pfault" state (these are computational processes which use some files from the NFS mount); they use 0% of CPU after that.
>>>
>>> On the NFS server machine I see nothing strange in the logs.
>>> procstat -kk for such stuck processes shows:
>>> PID TID COMM TDNAME KSTACK
>>> 85274 102056 myprog - mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>>>
>>>
>>> What could be the reason for this?
>>
>> For example, if some processes running on the NFS server box modify some files "in-place"
>> and these files are opened by processes running on an NFS client, that could be the reason.
>> If so, change this so that the processes updating such files create new temporary versions of them first
>> and then rename them atomically.
>>
>
> This should not be the case: users are working only on NFS clients.
> Moreover, the nature of the computations is such that each process uses its own set of files.
>
> (Forgot to mention in my previous e-mail that these processes can't be stopped even with kill -9)

Make sure you use TCP mounts and that TSO is disabled.
Try switching between NFSv3 and NFSv4 to work around this bug and to find out which version is broken.
Also, please show the full mount command and option set.
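
For reference, the "create a temporary version, then rename it atomically" advice quoted above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name atomic_write and the ".tmp-" prefix are made up here, and it assumes the temporary file can be created in the same directory (and so on the same filesystem) as the target, so that the rename is a single atomic step. Whether this has any bearing on the pager errors reported above is not established in the thread.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) without modifying it in place.

    Hypothetical helper for illustration: the new contents go to a temporary
    file in the same directory, which is then renamed over the target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600; adjust permissions if needed.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to stable storage first
        # os.replace renames over an existing target in one step on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A writer would call atomic_write("/some/dir/data.bin", new_bytes) instead of opening the existing file for writing, so the old file's contents are never overwritten in place.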