FreeBSD Mail Archives

Date:      Wed, 8 Jul 2015 16:57:18 +0200
From:      Ahmed Kamal <email.ahmedkamal@googlemail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Julian Elischer <julian@freebsd.org>, freebsd-fs@freebsd.org, Xin LI <d@delphij.net>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <CANzjMX6jbO8PrJD1WKVnWL12UqxDZh4jrMEJ0HxbVzDG448QFQ@mail.gmail.com>
In-Reply-To: <CANzjMX7bvh3_%2BEBBRn6A-PeC_1tnh9FOPeOuN0x=Rr6fGCa-SA@mail.gmail.com>
References:  <CANzjMX45QaC8yZx2nHPAohJRvQjmUOHuhMQWP9nX%2BsrJs707Hg@mail.gmail.com> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <CANzjMX5eN1FsnHMf6KGZe_b3vwxxF=dy3fJUHxeGO4BXuNzfPA@mail.gmail.com> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv%2BOyApK2UhJg@mail.gmail.com> <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com> <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> <CANzjMX6EoPOcY9V5EQeu5KO1WhwFxxo7-mYRhccVvKiaDW8nGQ@mail.gmail.com> <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> <CANzjMX4MzqtBD-myifpT6i_HM97FVQ31vWjh7fiMsLJBe7Bh0w@mail.gmail.com> <CANzjMX7bvh3_%2BEBBRn6A-PeC_1tnh9FOPeOuN0x=Rr6fGCa-SA@mail.gmail.com>

I have a test rhel6 box (one that can mount nfs with vers=4.1) .. However
this is an old server with no users on it .. Can you kindly show me how to
stress test this mount to either induce the bad sequence error, or prove
nfs-4.1 is rock solid ?

If upgrading all boxes to rhel-6 and nfs-4.1 is the only way to solve this
.. then so be it .. I just want to be sure it's solid before the upgrade

Thanks folks!

On Wed, Jul 8, 2015 at 4:20 PM, Ahmed Kamal <email.ahmedkamal@googlemail.com
> wrote:

> Another note .. is that the linux boxes when they have hung processes ..
> They have a process (rpciod) taking 10-15% CPU
>
> On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal <
> email.ahmedkamal@googlemail.com> wrote:
>
>> Hi folks,
>>
>> I have tested Xin's patches .. Unfortunately the problem didn't go away
>> :/ Many users are still reporting hung processes. If it would help, can you
>> show me how to dump a network trace that would help you identify the issue ?
>>
>> Also, is it possible in any way to have my trusted nfs3, handle the case
>> where every zfs /home folder is its own dataset ?
>>
>> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem <rmacklem@uoguelph.ca>
>> wrote:
>>
>>> Ahmed Kamal wrote:
>>> > Hi folks,
>>> >
>>> > Just a quick update. I did not test Xin's patches yet .. What I did so
>>> far
>>> > is to increase the tcp highwater tunable and increase nfsd threads to
>>> 60.
>>> > Today (a working day) I noticed I only got one bad sequence error
>>> message!
>>> > Check this:
>>> >
>>> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
>>> >       1 messages:Jul5
>>> >      39 messages.1:Jun28
>>> >      15 messages.1:Jun29
>>> >       4 messages.1:Jun30
>>> >       9 messages.1:Jul1
>>> >      23 messages.1:Jul2
>>> >       1 messages.1:Jul4
>>> >       1 messages.2:Jun28
>>> >
>>> > So there seems to be an improvement! Not sure if the Linux nfs4 client
>>> is
>>> > able to somehow recover from those bad-sequence situations or not .. I
>>> did
>>> > get some user complaints that running "ls -l" is sometimes slow and
>>> takes a
>>> > couple of seconds to finish.
>>> >
>>> > One final question .. Do you folks think nfs4.1 is more reliable in
>>> general
>>> > than nfs4 .. I've always only used nfs3 (I guess it can't work here
>>> with
>>> > /home/* being separate zfs filesystems) .. So should I go through the
>>> pain
>>> > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do
>>> you
>>> > expect the protocol to be more solid ? I know it's a fluffy question,
>>> just
>>> > give me your thoughts. Thanks a lot!
>>> >
>>> All I can say is that the "bad seqid" errors should not occur, since
>>> NFSv4.1
>>> doesn't use the seqid#s to order RPCs.
>>>
>>> Also I would say that a correctly implemented NFSv4.1 protocol should
>>> function
>>> "more correctly" since all RPCs are performed "exactly once". (How much
>>> effect
>>> this will have in practice, I can't say.)
>>>
>>> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over
>>> 500 pages),
>>> so it is hard to say how mature the implementations are.
>>> I think only testing will give you the answer.
>>>
>>> I would suggest that you test Xin Li's patch that allows the "seqid + 2"
>>> case
>>> and see if that makes the "bad seqid" errors go away. (Even though I
>>> think this
>>> would indicate a client bug, adding this in a way that it can be enabled
>>> via a sysctl
>>> seems reasonable.)
>>>
>>> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this,
>>> rick
>>>
>>> >
>>> >
>>> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem <rmacklem@uoguelph.ca>
>>> wrote:
>>> >
>>> > > Ahmed Kamal wrote:
>>> > > > PS: Today (after adjusting tcp.highwater) I didn't get any
>>> screaming
>>> > > > reports from users about hung vnc sessions. So maybe just maybe,
>>> linux
>>> > > > clients are able to somehow recover from these bad sequence
>>> messages. I
>>> > > > could still see the bad sequence error message in logs though
>>> > > >
>>> > > > Why isn't the highwater tunable set to something better by default
>>> ? I
>>> > > mean
>>> > > > this server is certainly not under a high or unusual load (it's
>>> only 40
>>> > > PCs
>>> > > > mounting from it)
>>> > > >
>>> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <
>>> > > email.ahmedkamal@googlemail.com
>>> > > > > wrote:
>>> > > >
>>> > > > > Thanks all .. I understand now we're doing the "right thing" ..
>>> > > Although
>>> > > > > if mounting keeps wedging, I will have to solve it somehow!
>>> Either
>>> > > using
>>> > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
>>> > > > >
>>> > > > > Regarding Xin's patch, is it possible to build the patched nfsd
>>> code,
>>> > > as a
>>> > > > > kernel module ? I'm looking to minimize my delta to upstream.
>>> > > > >
>>> > > Yes, you can build the nfsd as a module. If your kernel config does
>>> not
>>> > > include
>>> > > "options NFSD" the module will get loaded/used. It is also possible
>>> to
>>> > > replace
>>> > > the module without rebooting, but you need to kill off the nfsd
>>> daemon then
>>> > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In
>>> > > /boot/<kernel-name>.)
>>> > >
>>> > > > > Also would adopting Xin's patch and hiding it behind a
>>> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably
>>> not the
>>> > > last
>>> > > > > person on earth to hit this) ?
>>> > > > >
>>> > > If it fixes your problem, I think this is reasonable.
>>> > > I'm also hoping that someone that works on the Linux client reports
>>> > > if/when this
>>> > > was changed.
>>> > >
>>> > > rick
>>> > >
>>> > > > > Thanks a lot for all the help!
>>> > > > >
>>> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <
>>> rmacklem@uoguelph.ca>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> Ahmed Kamal wrote:
>>> > > > >> > Appreciating the fruitful discussion! Can someone please
>>> explain to
>>> > > me,
>>> > > > >> > what would happen in the current situation (linux client
>>> doing this
>>> > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the
>>> effect of
>>> > > that?
>>> > > > >> Well, as you've seen, the Linux client doesn't function
>>> correctly
>>> > > against
>>> > > > >> the FreeBSD server (and probably others that don't support this
>>> > > > >> "skip-by-1"
>>> > > > >> case).
>>> > > > >>
>>> > > > >> > What do users see? Any chances of data loss?
>>> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what
>>> the
>>> > > Linux
>>> > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're
>>> the guy
>>> > > > >> observing
>>> > > > >> it.
>>> > > > >>
>>> > > > >> >
>>> > > > >> > Also, I find it strange that netapp have acknowledged this is
>>> a bug
>>> > > on
>>> > > > >> > their side, which has been fixed since then!
>>> > > > >> Yea, I think Netapp screwed up. For some reason their server
>>> allowed
>>> > > this,
>>> > > > >> then was fixed to not allow it and then someone decided that was
>>> > > broken
>>> > > > >> and
>>> > > > >> reversed it.
>>> > > > >>
>>> > > > >> > I also find it strange that I'm the first to hit this :) Is
>>> no one
>>> > > > >> running
>>> > > > >> > nfs4 yet!
>>> > > > >> >
>>> > > > >> Well, it seems to be slowly catching on. I suspect that the
>>> Linux
>>> > > client
>>> > > > >> mounting a Netapp is the most common use of it. Since it
>>> appears that
>>> > > they
>>> > > > >> flip flopped w.r.t. whose bug this is, it has probably
>>> persisted.
>>> > > > >>
>>> > > > >> It may turn out that the Linux client has been fixed or it may
>>> turn
>>> > > out
>>> > > > >> that most servers allowed this "skip-by-1" even though David
>>> Noveck
>>> > > (one
>>> > > > >> of the main authors of the protocol) seems to agree with me
>>> that it
>>> > > should
>>> > > > >> not be allowed.
>>> > > > >>
>>> > > > >> It is possible that others have bumped into this, but it wasn't
>>> > > isolated
>>> > > > >> (I wouldn't have guessed it, so it was good you pointed to the
>>> RedHat
>>> > > > >> discussion)
>>> > > > >> and they worked around it by reverting to NFSv3 or similar.
>>> > > > >> The protocol is rather complex in this area and changed
>>> completely for
>>> > > > >> NFSv4.1,
>>> > > > >> so many have also probably moved onto NFSv4.1 where this won't
>>> be an
>>> > > > >> issue.
>>> > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and
>>> > > doesn't
>>> > > > >> use
>>> > > > >>  these seqid fields.)
>>> > > > >>
>>> > > > >> This is all just mho, rick
>>> > > > >>
>>> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <
>>> rmacklem@uoguelph.ca>
>>> > > > >> wrote:
>>> > > > >> >
>>> > > > >> > > Julian Elischer wrote:
>>> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>>> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they
>>> say.
>>> > > Please
>>> > > > >> > > > > let me know if Xin Li's patch resolves your problem,
>>> even
>>> > > though I
>>> > > > >> > > > > don't believe it is correct except for the UINT32_MAX
>>> case.
>>> > > Good
>>> > > > >> > > > > luck with it, rick
>>> > > > >> > > > and please keep us all in the loop as to what they say!
>>> > > > >> > > >
>>> > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's
>>> always N+1
>>> > > in a
>>> > > > >> > > > number field that has a
>>> > > > >> > > > bit of slack at wrap time (probably due to some ambiguity
>>> in the
>>> > > > >> > > > original spec).
>>> > > > >> > > >
>>> > > > >> > > Actually, since N is the lock op already done, N + 1 is the
>>> next
>>> > > lock
>>> > > > >> > > operation in order. Since lock ops need to be strictly
>>> ordered,
>>> > > > >> allowing
>>> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes
>>> no
>>> > > sense.
>>> > > > >> > >
>>> > > > >> > > I think the author of the RFC meant that N + 2 or greater
>>> fails,
>>> > > but
>>> > > > >> it
>>> > > > >> > > was poorly worded.
>>> > > > >> > >
>>> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org.
>>> (There is
>>> > > an
>>> > > > >> archive
>>> > > > >> > > of it somewhere, but I can't remember where.;-)
>>> > > > >> > >
>>> > > > >> > > rick
>>> > > > >> > > _______________________________________________
>>> > > > >> > > freebsd-fs@freebsd.org mailing list
>>> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> > > > >> > > To unsubscribe, send any mail to "
>>> > > freebsd-fs-unsubscribe@freebsd.org"
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
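
For reference, the seqid ordering rule discussed in the quoted messages (N is the lock op already done, N + 1 is the next lock operation in order, and N + 2 makes no sense) can be sketched in a few lines of Python. This is only an illustration, not the FreeBSD nfsd code or Xin Li's patch. The function names are hypothetical, the "replay" branch reflects my understanding of how an NFSv4.0 server treats a retransmitted seqid, and the plain modulo-2^32 wrap is a simplifying assumption; the thread notes that the UINT32_MAX case may need special treatment.

```python
UINT32_MAX = 0xFFFFFFFF


def next_seqid(last: int) -> int:
    """Return the seqid expected after `last`.

    Assumption: the counter wraps as a plain 32-bit value. The thread
    suggests the UINT32_MAX case may deserve special handling.
    """
    return (last + 1) & UINT32_MAX


def classify_seqid(last: int, received: int) -> str:
    """Classify a received seqid against the last one the server processed."""
    if received == next_seqid(last):
        return "ok"         # N + 1: the next lock operation, in order
    if received == last:
        return "replay"     # N again: assumed to be a retransmission
    return "bad_seqid"      # anything else, including the "skip-by-1" N + 2
```

With this strict check, a client that sends N + 2 after N is rejected, which corresponds to the NFS4ERR_BAD_SEQID behaviour described in the thread. A sysctl-gated relaxation like the one proposed would amount to accepting one extra value in the first branch.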

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANzjMX6jbO8PrJD1WKVnWL12UqxDZh4jrMEJ0HxbVzDG448QFQ>
