Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 11 Dec 2020 00:59:58 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Major issues with nfsv4
Message-ID:  <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com>
References:  <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
J. David wrote:=0A=
>Recently, we attempted to get with the 2000's and try switching from=0A=
>NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.=0A=
>=0A=
>Any system we switch to NFSv4 mounts is functionally unusable, pegged=0A=
>at 100% system CPU usage, load average 70+, largely from nfscl threads=0A=
>and client processes using NFS.=0A=
>=0A=
>Dmesg shows NFS-related messages:=0A=
>=0A=
>$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n=0A=
>   1 nfsv4 err=3D10010=0A=
>   4 nfsv4 client/server protocol prob err=3D10026=0A=
>  29 nfscl: never fnd open=0A=
Add "minorversion=3D1" to your FreeBSD NFS client mount options=0A=
and error 10026 should go away (and I suspect that the 10010 will=0A=
go away too.=0A=
=0A=
The correct semantics for handling the "seqid" field that=0A=
serialized open/lock operations for NFSv4.0 is difficult to get=0A=
correct (and might now be broken in the client, since the=0A=
original code written 20years ago depended on exclusive=0A=
vnode locking and hasn't been updated or interop tested with=0A=
non-FreeBSD NFS servers for ages).=0A=
--> NFSv4.0 is close to 20years old and has been fixed/superceded=0A=
      by NFSv4.1 for many years now.=0A=
--> NFSv4.1 (and NFSv4.2) replaced the "seqid" stuff with something=0A=
      called "sessions", which works better.=0A=
=0A=
I have been tempted to make FreeBSD NFSv4 mounts use 4.1/4.2=0A=
by default to avoid problems with NFSv4.0, but I've hesitated since=0A=
the change could be considered a POLA violation.=0A=
=0A=
NFSv4.0 is like any .0 release. There were significant issues with the=0A=
protocol fixed by NFSv4.1.=0A=
=0A=
If you still have problems when using NFSv4.1, post again.=0A=
Btw, "nfsstat -m" shows what the client mount options actually are.=0A=
=0A=
rick=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
=0A=
Nfsstat shows no client activity; "nfsstat -e -c 1" and "nfsstat -c 1"=0A=
both report:=0A=
=0A=
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
      0      0      0      0      0      0      0      0=0A=
=0A=
Meanwhile, tcpdump on the client shows an endless stream of getattr=0A=
requests at the exact same time nfsstat -c says nothing is happening:=0A=
=0A=
$ sudo tcpdump -n -i net1 -c 10 port 2049 and src 172.20.200.39=0A=
14:47:27.037974 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [.],=0A=
ack 72561, win 545, options [nop,nop,TS val 234259249 ecr 4155804100],=0A=
length 0=0A=
14:47:27.046282 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 139940:140092, ack 72561, win 545, options [nop,nop,TS val=0A=
234259259 ecr 4155804100], length 152: NFS request xid 1544756021 148=0A=
getattr fh 0,5/0=0A=
14:47:27.051260 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 140092:140248, ack 72641, win 545, options [nop,nop,TS val=0A=
234259269 ecr 4155804104], length 156: NFS request xid 1544756022 152=0A=
getattr fh 0,5/0=0A=
14:47:27.063372 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 140248:140404, ack 72721, win 545, options [nop,nop,TS val=0A=
234259279 ecr 4155804106], length 156: NFS request xid 1544756023 152=0A=
getattr fh 0,5/0=0A=
14:47:27.068646 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 140404:140556, ack 72801, win 545, options [nop,nop,TS val=0A=
234259279 ecr 4155804108], length 152: NFS request xid 1544756024 148=0A=
getattr fh 0,5/0=0A=
14:47:27.080627 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 140556:140712, ack 72881, win 545, options [nop,nop,TS val=0A=
234259299 ecr 4155804110], length 156: NFS request xid 1544756025 152=0A=
getattr fh 0,5/0=0A=
14:47:27.085224 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 140712:140868, ack 72961, win 545, options [nop,nop,TS val=0A=
234259299 ecr 4155804112], length 156: NFS request xid 1544756026 152=0A=
getattr fh 0,5/0=0A=
14:47:27.096802 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 140868:141024, ack 73041, win 545, options [nop,nop,TS val=0A=
234259309 ecr 4155804114], length 156: NFS request xid 1544756027 152=0A=
getattr fh 0,5/0=0A=
14:47:27.101849 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 141024:141180, ack 73121, win 545, options [nop,nop,TS val=0A=
234259319 ecr 4155804116], length 156: NFS request xid 1544756028 152=0A=
getattr fh 0,5/0=0A=
14:47:27.112905 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A=
seq 141180:141336, ack 73201, win 545, options [nop,nop,TS val=0A=
234259329 ecr 4155804118], length 156: NFS request xid 1544756029 152=0A=
getattr fh 0,5/0=0A=
=0A=
Only 10 shown here for brevity, but:=0A=
=0A=
$ sudo tcpdump -n -i net1 -c 10000 port 2049 and src 172.20.200.39 |=0A=
fgrep getattr | wc -l=0A=
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode=
=0A=
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes=
=0A=
10000 packets captured=0A=
20060 packets received by filter=0A=
0 packets dropped by kernel=0A=
    9759=0A=
=0A=
There are no dropped packets or network problems:=0A=
=0A=
$ netstat -in -I net1=0A=
Name    Mtu Network       Address              Ipkts Ierrs Idrop=0A=
Opkts Oerrs  Coll=0A=
net1   1500 <Link#2>      12:33:df:5f:79:d7 40988832     0     0=0A=
48760307     0     0=0A=
net1      - 172.20.0.0/16 172.20.200.39     40942065     -     -=0A=
48756241     -     -=0A=
=0A=
The mount flags in fstab are:=0A=
=0A=
ro,nfsv4,nosuid=0A=
=0A=
The mount flags as reported by "nfsstat -m" are:=0A=
=0A=
nfsv4,minorversion=3D0,tcp,resvport,hard,cto,sec=3Dsys,acdirmin=3D3,acdirma=
x=3D60,acregmin=3D5,acregmax=3D60,nametimeo=3D60,negnametimeo=3D60,rsize=3D=
65536,wsize=3D65536,readdirsize=3D65536,readahead=3D1,wcommitsize=3D1677721=
6,timeout=3D120,retrans=3D2147483647=0A=
=0A=
Today, I managed to kill everything down to one user process that was=0A=
exhibiting this behavior.  After a kill -9 on that process, it went to=0A=
"REsJ" but continued to burn the same amount of CPU (all system).=0A=
Oddly the run state / wait channel was just "CPU1."  Running "ktrace"=0A=
did not produce any trace records.  Probably that is predictable for a=0A=
process in E state; if the process had crossed the user/kernel=0A=
boundary in a way ktrace could detect, it would have exited.=0A=
=0A=
At that point, I started unmounting filesystems.  Everything but the=0A=
NFS filesystem used by that process unmounted cleanly.  The umount for=0A=
that filesystem went to D state for about a minute and then kicked=0A=
back "Device busy."  That's fair, if awfully slow.=0A=
=0A=
Meanwhile, that user process continued burning system CPU with the E=0A=
flag set, not doing anything whatsoever in userspace, still producing=0A=
300+ "getattr fh 0,5/0" per second according to tcpdump and 0=0A=
according to nfsstat.=0A=
=0A=
Eventually, I rebooted with fstab set back to nfsv3.=0A=
=0A=
This feels like the user process is in a system call that is stuck in=0A=
an endless loop repeating some operation that generates that getattr=0A=
request.  But that is a feeling, not a fact.=0A=
=0A=
This is fairly easy to reproduce; it seems pretty consistent within a=0A=
few hours (a day at most) any time I switch the relevant mounts to=0A=
nfsv4.  Reverting to nfsv3 makes this issue completely disappear.=0A=
=0A=
What on earth could be going on here?  What other information can I=0A=
provide that would help track this down?=0A=
=0A=
Thanks for any advice!=0A=
_______________________________________________=0A=
freebsd-fs@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-fs=0A=
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"=0A=
=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB096849ADF24051F7479E565CDDCA0>