Date: Tue, 27 Feb 2018 22:54:01 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ruben <mail@osfux.nl>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Cc: "rmacklem@FreeBSD.org" <rmacklem@FreeBSD.org>
Subject: Re: Linux NFSv4 clients: bad sequence-id errors.
Message-ID: <YQBPR0101MB104253B0A8AC0693DD7D6A2FDDC00@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <ad9a60d5-f843-d915-4e91-133705e276e3@osfux.nl>
References: <ad9a60d5-f843-d915-4e91-133705e276e3@osfux.nl>
Ruben wrote:
>I'm experiencing a strange issue on a machine providing a couple of
>nfsv4 exports. A Linux client that generates a lot of traffic to and
>from the nfs server sometimes starts throwing "bad sequence-id errors":
>
>Feb 27 10:39:42 localhost kernel: [12481477.608103] NFS: v4 server
>returned a bad sequence-id error on an unconfirmed sequence 80f7d0d0!

The handling of sequence-id in NFSv4.0 is complex and I won't even try to
guess why this is happening.

I am surprised that your Linux mounts are using NFSv4.0 and not NFSv4.1.
(Usually Linux uses the most recent version supported by the server.)
I mention this since "sessions" replaced the sequence-id stuff in NFSv4.1,
so an NFSv4.1 mount shouldn't have this issue.

>They typically occur after a couple of months of uptime on the nfsd
>machine. Every couple of seconds they are thrown by the client. The
>situation is "remedied" by restarting the nfsd on the server. Although
>functionality on the specific client does not appear to be affected
>(much?), it's a bit disturbing. I've done some digging and found:

The fact that this is fixed by restarting the nfsd suggests a client-side
problem. Why? Because restarting the nfsd does not reset any server state,
so the sequence-id situation would not be affected by doing this.
(To get rid of server-side state, you must unload nfsd.ko after killing
off the nfsd daemon.) All that restarting the nfsd daemon does is force the
client to establish a new TCP connection, and that is at a layer below the
NFS state.

>https://lists.freebsd.org/pipermail/freebsd-fs/2015-July/021707.html
>
>and the patch attached by Rick (nfsv41exch.patch:
>http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20150729/586f776a/attachment.bin).
>
>Since the issue started manifesting itself I have restarted the nfs
>daemon (grabbed a pcap and the corresponding error lines mentioning the
>sequences prior to doing that in case anyone is interested).
If you email me the pcap as an attachment, I can take a look at it in
Wireshark.

>The nfs server runs FreeBSD 11.1:

I'm being lazy and not looking, but I am almost sure a 2015 patch will be
in 11.1 and probably also in 10.2 and 11.0.

>freebsd-version -uk
>11.1-RELEASE-p1
>11.1-RELEASE-p1
>
>but I have seen it on 10.2 and 11.0 as well. The linux client is (/has
>been) running a version of Debian.
>
>The export lines in /etc/exports:
>
>V4: / -network=192.168.9.0 -mask=255.255.255.0
>
>/data/Sabnzb2015 -maproot=root: -alldirs -network=192.168.9.0 -mask=255.255.255.0
>
>Uptime:
>
>8:27PM up 196 days, 22:10, 1 users, load averages: 0.21, 0.17, 0.17
>
>Traffic since uptime (guessing NFS / non-NFS ratio of 3 to 1):
>
>   lagg0  in    400.901 KB/s    400.901 KB/s    10.425 TB
>          out    32.781 KB/s     32.781 KB/s    14.132 TB
>
>I'm wondering: can the 2015 patch provided by Rick still be "safely"
>applied or has the nfs code changed too much since then? I've witnessed
>this issue a couple of times now and would very much like to test the
>patch provided.

As above, I'd be surprised if the patch isn't already in your 11.1 kernel,
but you can take a look. If it isn't, let me know, because that means it
slipped through the cracks and I need to get it committed, etc.

rick
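To make the NFSv4.0 sequence-id remark in the reply a little more concrete, here is a minimal Python sketch of the basic per-state-owner rule, assuming the simplified behaviour described in the NFSv4.0 specification: a request whose seqid equals the last one seen is treated as a retransmission, last + 1 is a new request, and anything else earns NFS4ERR_BAD_SEQID. It is not taken from FreeBSD's nfsd or the Linux client; the names `StateOwner`, `check` and `SeqResult` are hypothetical, and it leaves out OPEN_CONFIRM, reply caching and lock-owner details entirely.

```python
from enum import Enum


class SeqResult(Enum):
    NEW = "new"          # seqid == last + 1: execute as a new request
    REPLAY = "replay"    # seqid == last: retransmission, resend the cached reply
    BAD_SEQID = "bad"    # anything else: server replies NFS4ERR_BAD_SEQID


class StateOwner:
    """Hypothetical per-state-owner (open-owner or lock-owner) record."""

    def __init__(self, last_seqid: int) -> None:
        self.last_seqid = last_seqid

    def check(self, seqid: int) -> SeqResult:
        if seqid == self.last_seqid:
            return SeqResult.REPLAY
        # seqid is an unsigned 32-bit counter, so the increment wraps mod 2**32
        if seqid == (self.last_seqid + 1) % 2**32:
            self.last_seqid = seqid
            return SeqResult.NEW
        # A skipped or stale seqid: the client and server have lost sync
        return SeqResult.BAD_SEQID


if __name__ == "__main__":
    owner = StateOwner(last_seqid=41)
    print([owner.check(s).name for s in (42, 42, 44, 43)])
    # ['NEW', 'REPLAY', 'BAD_SEQID', 'NEW']
```

Because this counter lives in server state, it survives an nfsd daemon restart that only drops the TCP connection. In NFSv4.1, the SEQUENCE operation's per-slot sequence IDs within a session take over this replay-detection role, which is why NFSv4.1 mounts are expected to avoid this class of error.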