Date:      Tue, 27 Feb 2018 22:54:01 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Ruben <mail@osfux.nl>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Cc:        "rmacklem@FreeBSD.org" <rmacklem@FreeBSD.org>
Subject:   Re: Linux NFSv4 clients: bad sequence-id errors.
Message-ID:  <YQBPR0101MB104253B0A8AC0693DD7D6A2FDDC00@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <ad9a60d5-f843-d915-4e91-133705e276e3@osfux.nl>
References:  <ad9a60d5-f843-d915-4e91-133705e276e3@osfux.nl>

Ruben wrote:
>I'm experiencing a strange issue on a machine providing a couple of
>nfsv4 exports. A Linux client that generates a lot of traffic to and
>from the nfs server sometimes starts throwing "bad sequence-id errors":
>
>Feb 27 10:39:42 localhost kernel: [12481477.608103] NFS: v4 server
>returned a bad sequence-id error on an unconfirmed sequence 80f7d0d0!
The handling of sequence-id in NFSv4.0 is complex and I won't even try
to guess why this is happening.
I am surprised that your Linux mounts are using NFSv4.0 and not NFSv4.1?
(Usually Linux uses the most recent version supported by the server.)
I mention this since "sessions" replaced the sequence-id stuff in NFSv4.1
and, as such, shouldn't have such an issue.
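
A quick way to confirm which version the client actually negotiated is to
read the mount options on the Linux side. The following is only a rough
sketch: the helper name nfs_versions is made up for illustration, and it
assumes the usual /proc/mounts layout (device, mount point, fstype, options)
with the version reported either as "vers=4.1" or, on older kernels, as
"vers=4,minorversion=1".

```python
def nfs_versions(mounts_text):
    """Return {mount_point: version} for NFS mounts found in /proc/mounts text.

    nfs_versions is a hypothetical helper, not part of any NFS tooling.
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fstype, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        vers = opts.get("vers") or opts.get("nfsvers") or ""
        # Assumption: older kernels report "vers=4,minorversion=1".
        if vers == "4" and "minorversion" in opts:
            vers = "4." + opts["minorversion"]
        versions[fields[1]] = vers or "unknown"
    return versions
```

On the client, nfs_versions(open("/proc/mounts").read()) should list each NFS
mount point with its version. If it reports 4.0, remounting with the
"vers=4.1" mount option (assuming the client kernel supports it) would be one
way to try the sessions-based protocol.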

>They typically occur after a couple of months of uptime on the nfsd
>machine. Every couple of seconds they are thrown by the client. The
>situation is "remedied" by restarting the nfsd on the server. Although
>functionality on the specific client does not appear to be affected
>(much?), it's a bit disturbing. I've done some digging and found :
The fact that this is fixed by restarting the nfsd suggests a client side
problem.
Why?
Because restarting the nfsd does not reset any server state, so the sequence-id
situation would not be affected by doing this. (To get rid of server side state,
you must unload the nfsd.ko after killing off the nfsd daemon.)

All restarting the nfsd daemon will do is force the client to establish a new
TCP connection. That is at a layer below the NFS state.

>https://lists.freebsd.org/pipermail/freebsd-fs/2015-July/021707.html
>
>and the patch attached by Rick (  nfsv41exch.patch :
>http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20150729/586f776a/attachment.bin
>) .
>
>Since the issue started manifesting itself I have restarted the nfs
>daemon (grabbed a pcap and the corresponding error lines mentioning the
>sequences prior to doing that in case anyone is interested).
If you email me the pcap as an attachment, I can take a look at it in wireshark.

>The nfs server runs FreeBSD 11.1 :
I'm being lazy and not looking, but I am almost sure a 2015 patch will be in 11.1
and probably also in 10.2 and 11.0.

>freebsd-version -uk
>11.1-RELEASE-p1
>11.1-RELEASE-p1
>
>but I have seen it on 10.2 and 11.0 as well. The linux client is (/has
>been) running a version of Debian.
>
>The export lines in /etc/exports :
>
>V4: / -network=192.168.9.0 -mask=255.255.255.0
>
>/data/Sabnzb2015 -maproot=root: -alldirs -network=192.168.9.0 -mask=255.255.255.0
>
>Uptime:
>
>8:27PM  up 196 days, 22:10, 1 users, load averages: 0.21, 0.17, 0.17
>
>Traffic since uptime (guessing NFS / non-NFS ratio of 3 to 1)
>
>          lagg0  in    400.901 KB/s        400.901 KB/s           10.425 TB
>                 out    32.781 KB/s         32.781 KB/s           14.132 TB
>
>
>I'm wondering: can the 2015 patch provided by Rick still be "safely"
>applied or has the nfs code changed too much since then? I've witnessed
>this issue a couple of times now and would very much like to test the
>patch provided.
As above, I'd be surprised if the patch isn't already in your 11.1 kernel,
but you can take a look.
If it isn't, let me know because that means it slipped through the cracks
and I need to get it committed, etc.

rick


