Date: Wed, 8 Jul 2015 19:27:20 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc: Julian Elischer <julian@freebsd.org>, freebsd-fs@freebsd.org, Xin LI <d@delphij.net>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <1274495343.6405799.1436398040440.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CANzjMX6jbO8PrJD1WKVnWL12UqxDZh4jrMEJ0HxbVzDG448QFQ@mail.gmail.com>
References: <CANzjMX45QaC8yZx2nHPAohJRvQjmUOHuhMQWP9nX+srJs707Hg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX6EoPOcY9V5EQeu5KO1WhwFxxo7-mYRhccVvKiaDW8nGQ@mail.gmail.com>
 <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX4MzqtBD-myifpT6i_HM97FVQ31vWjh7fiMsLJBe7Bh0w@mail.gmail.com>
 <CANzjMX7bvh3_+EBBRn6A-PeC_1tnh9FOPeOuN0x=Rr6fGCa-SA@mail.gmail.com>
 <CANzjMX6jbO8PrJD1WKVnWL12UqxDZh4jrMEJ0HxbVzDG448QFQ@mail.gmail.com>

Ahmed Kamal wrote:
> I have a test rhel6 box (one that can mount nfs with vers=4.1) .. However
> this is an old server with no users on it .. Can you kindly show me how to
> stress test this mount to either induce the bad sequence error, or prove
> nfs-4.1 is rock solid ?
>
Don't ask me. You are the one that sees the problem, so all I can suggest is
get this client to do the same stuff as your other clients that exhibit the
problem.

> If upgrading all boxes to rhel-6 and nfs-4.1 is the only way to solve this
> .. then so be it .. I just want to be sure it's solid before the upgrade
>
As I recall, you've never tried Xin Li's patch.

rick

> Thanks folks!
>
> On Wed, Jul 8, 2015 at 4:20 PM, Ahmed Kamal <email.ahmedkamal@googlemail.com>
> wrote:
>
> > Another note .. is that the linux boxes when they have hung processes ..
> > They have a process (rpciod) taking 10-15% CPU
> >
> > On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal
> > <email.ahmedkamal@googlemail.com> wrote:
> >
> >> Hi folks,
> >>
> >> I have tested Xin's patches .. Unfortunately the problem didn't go away
> >> :/ Many users are still reporting hung processes. If it would help, can
> >> you show me how to dump a network trace that would help you identify the
> >> issue ?
> >>
> >> Also, is it possible in any way to have my trusted nfs3 handle the case
> >> where every zfs /home folder is its own dataset ?
> >>
> >> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem <rmacklem@uoguelph.ca>
> >> wrote:
> >>
> >>> Ahmed Kamal wrote:
> >>> > Hi folks,
> >>> >
> >>> > Just a quick update. I did not test Xin's patches yet .. What I did
> >>> > so far is to increase the tcp highwater tunable and increase nfsd
> >>> > threads to 60. Today (a working day) I noticed I only got one bad
> >>> > sequence error message! Check this:
> >>> >
> >>> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
> >>> >   1 messages:Jul5
> >>> >  39 messages.1:Jun28
> >>> >  15 messages.1:Jun29
> >>> >   4 messages.1:Jun30
> >>> >   9 messages.1:Jul1
> >>> >  23 messages.1:Jul2
> >>> >   1 messages.1:Jul4
> >>> >   1 messages.2:Jun28
> >>> >
> >>> > So there seems to be an improvement! Not sure if the Linux nfs4
> >>> > client is able to somehow recover from those bad-sequence situations
> >>> > or not .. I did get some user complaints that running "ls -l" is
> >>> > sometimes slow and takes a couple of seconds to finish.
> >>> >
> >>> > One final question .. Do you folks think nfs4.1 is more reliable in
> >>> > general than nfs4 .. I've always only used nfs3 (I guess it can't
> >>> > work here with /home/* being separate zfs filesystems) .. So should I
> >>> > go through the pain of upgrading a few servers to RHEL-6 to try out
> >>> > nfs4.1 ? Basically do you expect the protocol to be more solid ? I
> >>> > know it's a fluffy question, just give me your thoughts. Thanks a lot!
> >>> >
> >>> All I can say is that the "bad seqid" errors should not occur, since
> >>> NFSv4.1 doesn't use the seqid#s to order RPCs.
> >>>
> >>> Also I would say that a correctly implemented NFSv4.1 protocol should
> >>> function "more correctly" since all RPCs are performed "exactly once".
> >>> (How much effect this will have in practice, I can't say.)
> >>>
> >>> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over
> >>> 500 pages), so it is hard to say how mature the implementations are.
> >>> I think only testing will give you the answer.
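
None of the replies spell out how to gather the trace Ahmed asks for above, or
what such testing might look like. The following is only a sketch; the interface
name, client address, server name, and mount point are placeholders that do not
appear anywhere in the thread:

  # on the FreeBSD server: capture full frames of NFS traffic from the test
  # client (wireshark can later decode the NFSv4 OPEN/CLOSE ops and their seqids)
  tcpdump -i em0 -s 0 -w /tmp/nfs-badseqid.pcap host 192.168.1.50 and port 2049

  # on the RHEL 6 test client: mount with NFSv4.1 for comparison
  # (option spelling varies by nfs-utils version; minorversion=1 is the RHEL 6 form)
  mount -t nfs4 -o minorversion=1 nfs-server:/home /mnt/test

  # crude stress: several concurrent open/write/remove loops; each cycle issues
  # OPEN and CLOSE operations, the calls that carry seqids on a plain NFSv4.0 mount
  for i in 1 2 3 4 5 6 7 8; do
      ( while :; do
            dd if=/dev/zero of=/mnt/test/f$i bs=4k count=1 conv=fsync 2>/dev/null
            rm -f /mnt/test/f$i
        done ) &
  done
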
> >>>
> >>> I would suggest that you test Xin Li's patch that allows the "seqid + 2"
> >>> case and see if that makes the "bad seqid" errors go away. (Even though I
> >>> think this would indicate a client bug, adding this in a way that it can
> >>> be enabled via a sysctl seems reasonable.)
> >>>
> >>> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, rick
> >>>
> >>> >
> >>> >
> >>> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem <rmacklem@uoguelph.ca>
> >>> > wrote:
> >>> >
> >>> > > Ahmed Kamal wrote:
> >>> > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
> >>> > > > reports from users about hung vnc sessions. So maybe just maybe,
> >>> > > > Linux clients are able to somehow recover from these bad sequence
> >>> > > > messages. I could still see the bad sequence error message in logs
> >>> > > > though
> >>> > > >
> >>> > > > Why isn't the highwater tunable set to something better by default ?
> >>> > > > I mean this server is certainly not under a high or unusual load
> >>> > > > (it's only 40 PCs mounting from it)
> >>> > > >
> >>> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> >>> > > > <email.ahmedkamal@googlemail.com> wrote:
> >>> > > >
> >>> > > > > Thanks all .. I understand now we're doing the "right thing" ..
> >>> > > > > Although if mounting keeps wedging, I will have to solve it
> >>> > > > > somehow! Either using Xin's patch .. or Upgrading RHEL to 6.x and
> >>> > > > > using NFS4.1.
> >>> > > > >
> >>> > > > > Regarding Xin's patch, is it possible to build the patched nfsd
> >>> > > > > code, as a kernel module ? I'm looking to minimize my delta to
> >>> > > > > upstream.
> >>> > > > >
> >>> > > Yes, you can build the nfsd as a module. If your kernel config does not
> >>> > > include "options NFSD" the module will get loaded/used. It is also
> >>> > > possible to replace the module without rebooting, but you need to kill
> >>> > > off the nfsd daemon, then kldunload nfsd.ko and replace nfsd.ko with
> >>> > > the new one. (In /boot/<kernel-name>.)
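
Spelled out as shell commands, the module swap Rick describes just above might
look roughly like this; it assumes a stock source tree in /usr/src that matches
the running kernel and the default module directory /boot/kernel, neither of
which is stated in the thread:

  # rebuild the NFS server module from the patched sources
  cd /usr/src/sys/modules/nfsd && make clean && make

  # stop the running server, swap the module, and restart
  service nfsd stop                   # or: service nfsd onestop
  kldunload nfsd                      # fails if "options NFSD" is compiled into the kernel
  cp nfsd.ko /boot/kernel/nfsd.ko     # the default <kernel-name> directory
  kldload nfsd
  service nfsd start
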
> >>> > >
> >>> > > > > Also would adopting Xin's patch and hiding it behind a
> >>> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not
> >>> > > > > the last person on earth to hit this) ?
> >>> > > > >
> >>> > > If it fixes your problem, I think this is reasonable.
> >>> > > I'm also hoping that someone that works on the Linux client reports
> >>> > > if/when this was changed.
> >>> > >
> >>> > > rick
> >>> > >
> >>> > > > > Thanks a lot for all the help!
> >>> > > > >
> >>> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > >> Ahmed Kamal wrote:
> >>> > > > >> > Appreciating the fruitful discussion! Can someone please explain
> >>> > > > >> > to me, what would happen in the current situation (linux client
> >>> > > > >> > doing this skip-by-1 thing, and freebsd not doing it) ? What is
> >>> > > > >> > the effect of that?
> >>> > > > >> Well, as you've seen, the Linux client doesn't function correctly
> >>> > > > >> against the FreeBSD server (and probably others that don't support
> >>> > > > >> this "skip-by-1" case).
> >>> > > > >>
> >>> > > > >> > What do users see? Any chances of data loss?
> >>> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the
> >>> > > > >> Linux client behaviour is after receiving NFS4ERR_BAD_SEQID. You're
> >>> > > > >> the guy observing it.
> >>> > > > >>
> >>> > > > >> > Also, I find it strange that Netapp have acknowledged this is a
> >>> > > > >> > bug on their side, which has been fixed since then!
> >>> > > > >> Yea, I think Netapp screwed up. For some reason their server allowed
> >>> > > > >> this, then was fixed to not allow it and then someone decided that
> >>> > > > >> was broken and reversed it.
> >>> > > > >>
> >>> > > > >> > I also find it strange that I'm the first to hit this :) Is no one
> >>> > > > >> > running nfs4 yet!
> >>> > > > >> >
> >>> > > > >> Well, it seems to be slowly catching on. I suspect that the Linux
> >>> > > > >> client mounting a Netapp is the most common use of it. Since it
> >>> > > > >> appears that they flip-flopped w.r.t. whose bug this is, it has
> >>> > > > >> probably persisted.
> >>> > > > >>
> >>> > > > >> It may turn out that the Linux client has been fixed or it may turn
> >>> > > > >> out that most servers allowed this "skip-by-1", even though David
> >>> > > > >> Noveck (one of the main authors of the protocol) seems to agree with
> >>> > > > >> me that it should not be allowed.
> >>> > > > >>
> >>> > > > >> It is possible that others have bumped into this, but it wasn't
> >>> > > > >> isolated (I wouldn't have guessed it, so it was good you pointed to
> >>> > > > >> the RedHat discussion) and they worked around it by reverting to
> >>> > > > >> NFSv3 or similar.
> >>> > > > >> The protocol is rather complex in this area and changed completely
> >>> > > > >> for NFSv4.1, so many have also probably moved onto NFSv4.1 where
> >>> > > > >> this won't be an issue. (NFSv4.1 uses sessions to provide exactly
> >>> > > > >> once RPC semantics and doesn't use these seqid fields.)
> >>> > > > >>
> >>> > > > >> This is all just mho, rick
> >>> > > > >>
> >>> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
> >>> > > > >> > wrote:
> >>> > > > >> >
> >>> > > > >> > > Julian Elischer wrote:
> >>> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> >>> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say.
> >>> > > > >> > > > > Please let me know if Xin Li's patch resolves your problem,
> >>> > > > >> > > > > even though I don't believe it is correct except for the
> >>> > > > >> > > > > UINT32_MAX case. Good luck with it, rick
> >>> > > > >> > > > and please keep us all in the loop as to what they say!
> >>> > > > >> > > >
> >>> > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always
> >>> > > > >> > > > N+1 in a number field that has a bit of slack at wrap time
> >>> > > > >> > > > (probably due to some ambiguity in the original spec).
> >>> > > > >> > > >
> >>> > > > >> > > Actually, since N is the lock op already done, N + 1 is the next
> >>> > > > >> > > lock operation in order. Since lock ops need to be strictly
Since lock ops need to be strictly > >>> ordered, > >>> > > > >> allowing > >>> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes > >>> no > >>> > > sense. > >>> > > > >> > > > >>> > > > >> > > I think the author of the RFC meant that N + 2 or greater > >>> fails, > >>> > > but > >>> > > > >> it > >>> > > > >> > > was poorly worded. > >>> > > > >> > > > >>> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. > >>> (There is > >>> > > an > >>> > > > >> archive > >>> > > > >> > > of it somewhere, but I can't remember where.;-) > >>> > > > >> > > > >>> > > > >> > > rick > >>> > > > >> > > _______________________________________________ > >>> > > > >> > > freebsd-fs@freebsd.org mailing list > >>> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > > > >> > > To unsubscribe, send any mail to " > >>> > > freebsd-fs-unsubscribe@freebsd.org" > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >> > >> > > >