Date: Sat, 5 Mar 2016 16:42:45 +0300
From: Dmitry Sivachenko <trtrmitya@gmail.com>
To: Eugene Grosbein <eugen@grosbein.net>
Cc: FreeBSD Stable ML <stable@freebsd.org>
Subject: Re: nfs_getpages: error 4
Message-ID: <ED06D277-F19B-46F4-BD61-08B6AD10326B@gmail.com>
In-Reply-To: <56DAE033.9020304@grosbein.net>
References: <A2A32332-4D9D-40DF-9DEC-EE9000879416@gmail.com> <56DACD4E.3070905@grosbein.net> <550ADE4F-9F60-44FB-BF07-A1384A6B7B1A@gmail.com> <56DAE033.9020304@grosbein.net>

> On 05 Mar 2016, at 16:33, Eugene Grosbein <eugen@grosbein.net> wrote:
>
> 05.03.2016 19:32, Dmitry Sivachenko wrote:
>
>>>> I am running a number of machines with /home mounted via nfs (FreeBSD 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>>>>
>>>> Sometimes I get the following messages in syslog:
>>>>
>>>> nfs_getpages: error 4
>>>> vm_fault: pager read error, pid NNN (myprog)
>>>>
>>>> After that I see a lot of processes stuck in "pfault" state (these are computational processes which use some files from NFS mount), they use 0% of CPU after that.
>>>>
>>>> On NFS server machine I see nothing strange in logs. procstat -kk for such stuck processes shows:
>>>> PID TID COMM TDNAME KSTACK
>>>> 85274 102056 myprog - mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>>>>
>>>>
>>>> What can be the reason of this?
>>>
>>> For example, if some processes running on NFS server box modify some files "in-place"
>>> and these files are opened by processes running on NFS client, that could be the reason.
>>> If so, change this so processes updating such files create new temporary versions of them first
>>> and then rename them atomically.
>>>
>>
>> This should not be the case: users are working only on NFS clients.
>> Moreover, the nature of computations is so that each process uses its own set of files.
>>
>> (Forgot to mention in my previous e-mail that these processes can't be stopped even with kill -9)
>
> Make sure you use TCP mounts and TSO is disabled.

I do use TCP mount (this is the default). I will try to disable TSO.

> Try switching between NFSv3/NFSv4 to avoid this bug

As far as I understand, the default is NFSv3 (which should be more stable?). I can try to switch to NFSv4.

> and to discover what version is broken. And show full mount command/option set.

I already included mount flags from fstab in my original e-mail: rw,bg,intr,soft