Date: Fri, 11 Dec 2020 00:59:58 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: J David <j.david.lists@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org> Subject: Re: Major issues with nfsv4 Message-ID: <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> In-Reply-To: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> References: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
J. David wrote:=0A= >Recently, we attempted to get with the 2000's and try switching from=0A= >NFSv3 to NFSv4 on our 12.2 servers. This has not gone well.=0A= >=0A= >Any system we switch to NFSv4 mounts is functionally unusable, pegged=0A= >at 100% system CPU usage, load average 70+, largely from nfscl threads=0A= >and client processes using NFS.=0A= >=0A= >Dmesg shows NFS-related messages:=0A= >=0A= >$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n=0A= > 1 nfsv4 err=3D10010=0A= > 4 nfsv4 client/server protocol prob err=3D10026=0A= > 29 nfscl: never fnd open=0A= Add "minorversion=3D1" to your FreeBSD NFS client mount options=0A= and error 10026 should go away (and I suspect that the 10010 will=0A= go away too.=0A= =0A= The correct semantics for handling the "seqid" field that=0A= serialized open/lock operations for NFSv4.0 is difficult to get=0A= correct (and might now be broken in the client, since the=0A= original code written 20years ago depended on exclusive=0A= vnode locking and hasn't been updated or interop tested with=0A= non-FreeBSD NFS servers for ages).=0A= --> NFSv4.0 is close to 20years old and has been fixed/superceded=0A= by NFSv4.1 for many years now.=0A= --> NFSv4.1 (and NFSv4.2) replaced the "seqid" stuff with something=0A= called "sessions", which works better.=0A= =0A= I have been tempted to make FreeBSD NFSv4 mounts use 4.1/4.2=0A= by default to avoid problems with NFSv4.0, but I've hesitated since=0A= the change could be considered a POLA violation.=0A= =0A= NFSv4.0 is like any .0 release. There were significant issues with the=0A= protocol fixed by NFSv4.1.=0A= =0A= If you still have problems when using NFSv4.1, post again.=0A= Btw, "nfsstat -m" shows what the client mount options actually are.=0A= =0A= rick=0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= =0A= Nfsstat shows no client activity; "nfsstat -e -c 1" and "nfsstat -c 1"=0A= both report:=0A= =0A= GtAttr Lookup Rdlink Read Write Rename Access Rddir=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= 0 0 0 0 0 0 0 0=0A= =0A= Meanwhile, tcpdump on the client shows an endless stream of getattr=0A= requests at the exact same time nfsstat -c says nothing is happening:=0A= =0A= $ sudo tcpdump -n -i net1 -c 10 port 2049 and src 172.20.200.39=0A= 14:47:27.037974 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [.],=0A= ack 72561, win 545, options [nop,nop,TS val 234259249 ecr 4155804100],=0A= length 0=0A= 14:47:27.046282 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 139940:140092, ack 72561, win 545, options [nop,nop,TS val=0A= 234259259 ecr 4155804100], length 152: NFS request xid 1544756021 148=0A= getattr fh 0,5/0=0A= 14:47:27.051260 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 140092:140248, ack 72641, win 545, options [nop,nop,TS val=0A= 234259269 ecr 4155804104], length 156: NFS request xid 1544756022 152=0A= getattr fh 0,5/0=0A= 14:47:27.063372 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 140248:140404, ack 72721, win 545, options [nop,nop,TS val=0A= 234259279 ecr 4155804106], length 156: NFS request xid 1544756023 152=0A= getattr fh 0,5/0=0A= 14:47:27.068646 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 140404:140556, ack 72801, win 545, options [nop,nop,TS val=0A= 234259279 ecr 4155804108], length 152: NFS request xid 1544756024 148=0A= getattr fh 0,5/0=0A= 14:47:27.080627 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 140556:140712, ack 72881, win 545, options [nop,nop,TS val=0A= 234259299 ecr 4155804110], length 156: NFS request xid 1544756025 152=0A= getattr fh 0,5/0=0A= 14:47:27.085224 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 140712:140868, ack 72961, win 545, options [nop,nop,TS val=0A= 234259299 ecr 4155804112], length 156: NFS request xid 1544756026 152=0A= getattr fh 0,5/0=0A= 14:47:27.096802 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 140868:141024, ack 73041, win 545, options [nop,nop,TS val=0A= 234259309 ecr 4155804114], length 156: NFS request xid 1544756027 152=0A= getattr fh 0,5/0=0A= 14:47:27.101849 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 141024:141180, ack 73121, win 545, options [nop,nop,TS val=0A= 234259319 ecr 4155804116], length 156: NFS request xid 1544756028 152=0A= getattr fh 0,5/0=0A= 14:47:27.112905 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],=0A= seq 141180:141336, ack 73201, win 545, options [nop,nop,TS val=0A= 234259329 ecr 4155804118], length 156: NFS request xid 1544756029 152=0A= getattr fh 0,5/0=0A= =0A= Only 10 shown here for brevity, but:=0A= =0A= $ sudo tcpdump -n -i net1 -c 10000 port 2049 and src 172.20.200.39 |=0A= fgrep getattr | wc -l=0A= tcpdump: verbose output suppressed, use -v or -vv for full protocol decode= =0A= listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes= =0A= 10000 packets captured=0A= 20060 packets received by filter=0A= 0 packets dropped by kernel=0A= 9759=0A= =0A= There are no dropped packets or network problems:=0A= =0A= $ netstat -in -I net1=0A= Name Mtu Network Address Ipkts Ierrs Idrop=0A= Opkts Oerrs Coll=0A= net1 1500 <Link#2> 12:33:df:5f:79:d7 40988832 0 0=0A= 48760307 0 0=0A= net1 - 172.20.0.0/16 172.20.200.39 40942065 - -=0A= 48756241 - -=0A= =0A= The mount flags in fstab are:=0A= =0A= ro,nfsv4,nosuid=0A= =0A= The mount flags as reported by "nfsstat -m" are:=0A= =0A= nfsv4,minorversion=3D0,tcp,resvport,hard,cto,sec=3Dsys,acdirmin=3D3,acdirma= x=3D60,acregmin=3D5,acregmax=3D60,nametimeo=3D60,negnametimeo=3D60,rsize=3D= 65536,wsize=3D65536,readdirsize=3D65536,readahead=3D1,wcommitsize=3D1677721= 6,timeout=3D120,retrans=3D2147483647=0A= =0A= Today, I managed to kill everything down to one user process that was=0A= exhibiting this behavior. After a kill -9 on that process, it went to=0A= "REsJ" but continued to burn the same amount of CPU (all system).=0A= Oddly the run state / wait channel was just "CPU1." Running "ktrace"=0A= did not produce any trace records. Probably that is predictable for a=0A= process in E state; if the process had crossed the user/kernel=0A= boundary in a way ktrace could detect, it would have exited.=0A= =0A= At that point, I started unmounting filesystems. Everything but the=0A= NFS filesystem used by that process unmounted cleanly. The umount for=0A= that filesystem went to D state for about a minute and then kicked=0A= back "Device busy." That's fair, if awfully slow.=0A= =0A= Meanwhile, that user process continued burning system CPU with the E=0A= flag set, not doing anything whatsoever in userspace, still producing=0A= 300+ "getattr fh 0,5/0" per second according to tcpdump and 0=0A= according to nfsstat.=0A= =0A= Eventually, I rebooted with fstab set back to nfsv3.=0A= =0A= This feels like the user process is in a system call that is stuck in=0A= an endless loop repeating some operation that generates that getattr=0A= request. But that is a feeling, not a fact.=0A= =0A= This is fairly easy to reproduce; it seems pretty consistent within a=0A= few hours (a day at most) any time I switch the relevant mounts to=0A= nfsv4. Reverting to nfsv3 makes this issue completely disappear.=0A= =0A= What on earth could be going on here? What other information can I=0A= provide that would help track this down?=0A= =0A= Thanks for any advice!=0A= _______________________________________________=0A= freebsd-fs@freebsd.org mailing list=0A= https://lists.freebsd.org/mailman/listinfo/freebsd-fs=0A= To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"=0A= =0A=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB096849ADF24051F7479E565CDDCA0>