FreeBSD Mail Archives

Date:      Sat, 5 Mar 2016 16:42:45 +0300
From:      Dmitry Sivachenko <trtrmitya@gmail.com>
To:        Eugene Grosbein <eugen@grosbein.net>
Cc:        FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: nfs_getpages: error 4
Message-ID:  <ED06D277-F19B-46F4-BD61-08B6AD10326B@gmail.com>
In-Reply-To: <56DAE033.9020304@grosbein.net>
References:  <A2A32332-4D9D-40DF-9DEC-EE9000879416@gmail.com> <56DACD4E.3070905@grosbein.net> <550ADE4F-9F60-44FB-BF07-A1384A6B7B1A@gmail.com> <56DAE033.9020304@grosbein.net>


> On 05 Mar 2016, at 16:33, Eugene Grosbein <eugen@grosbein.net> wrote:
>
> 05.03.2016 19:32, Dmitry Sivachenko wrote:
>
>>>> I am running a number of machines with /home mounted via NFS (FreeBSD 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>>>>
>>>> Sometimes I get the following messages in syslog:
>>>>
>>>> nfs_getpages: error 4
>>>> vm_fault: pager read error, pid NNN (myprog)
>>>>
>>>> After that I see a lot of processes stuck in the "pfault" state (these are computational processes which use some files from the NFS mount); they use 0% CPU after that.
>>>>
>>>> On the NFS server machine I see nothing strange in the logs. procstat -kk for such stuck processes shows:
>>>>  PID    TID COMM             TDNAME           KSTACK
>>>> 85274 102056 myprog           -                mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>>>>
>>>>
>>>> What could be the reason for this?
>>>
>>> For example, if some processes running on the NFS server box modify some files "in-place"
>>> and these files are opened by processes running on an NFS client, that could be the reason.
>>> If so, change this so that processes updating such files create new temporary versions of them first
>>> and then rename them atomically.
>>>
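The suggestion above is the usual write-to-a-temporary-file-then-rename pattern. A minimal Python sketch, assuming a POSIX filesystem where rename(2) within one directory is atomic; the function name `atomic_write` is hypothetical and not from this thread:

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the contents of `path` without ever exposing a partially written file.

    Processes that already have the old file open keep reading the old inode;
    new opens see the complete new version once os.replace() returns.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory (same filesystem),
    # because rename() is only atomic within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to stable storage before renaming
        os.replace(tmp_path, path)  # atomic rename over the old name
    except BaseException:
        # Remove the temporary file if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A writer would call `atomic_write("results.dat", new_bytes)` instead of rewriting `results.dat` in place. Note that `mkstemp` creates the file with mode 0600, so copy the original permissions onto the temporary file first if they matter.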
>>=20
>> This should not be the case: users are working only on NFS clients.
>> Moreover, the nature of the computations is such that each process uses its own set of files.
>>
>> (Forgot to mention in my previous e-mail that these processes can't be stopped even with kill -9.)
>
> Make sure you use TCP mounts and TSO is disabled.


I do use TCP mounts (this is the default). I will try disabling TSO.


> Try switching between NFSv3/NFSv4 to avoid this bug

As far as I understand, the default is NFSv3 (which should be more stable?).

I can try to switch to NFSv4.


> and to discover which version is broken. And show the full mount command/option set.


I already included mount flags from fstab in my original e-mail:

rw,bg,intr,soft
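The `error 4` in the `nfs_getpages` message is a raw errno value. Assuming the standard numbering (FreeBSD, like Linux, defines EINTR as 4), it decodes to EINTR, "Interrupted system call", which suggests the page-in RPC was interrupted by a signal; with `intr` in the mount flags above, that is one plausible path, though not a confirmed diagnosis. A small Python helper to decode such numbers on the affected machine (the name `describe_errno` is hypothetical):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Turn a raw errno number, as printed in kernel messages, into its symbolic name and text."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name}: {os.strerror(code)}"


print(describe_errno(4))
```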

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ED06D277-F19B-46F4-BD61-08B6AD10326B>
