Date: Wed, 8 Jul 2015 19:27:20 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc: Julian Elischer <julian@freebsd.org>, freebsd-fs@freebsd.org, Xin LI <d@delphij.net>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <1274495343.6405799.1436398040440.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CANzjMX6jbO8PrJD1WKVnWL12UqxDZh4jrMEJ0HxbVzDG448QFQ@mail.gmail.com>
References: <CANzjMX45QaC8yZx2nHPAohJRvQjmUOHuhMQWP9nX+srJs707Hg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX6EoPOcY9V5EQeu5KO1WhwFxxo7-mYRhccVvKiaDW8nGQ@mail.gmail.com>
 <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX4MzqtBD-myifpT6i_HM97FVQ31vWjh7fiMsLJBe7Bh0w@mail.gmail.com>
 <CANzjMX7bvh3_+EBBRn6A-PeC_1tnh9FOPeOuN0x=Rr6fGCa-SA@mail.gmail.com>
 <CANzjMX6jbO8PrJD1WKVnWL12UqxDZh4jrMEJ0HxbVzDG448QFQ@mail.gmail.com>

Ahmed Kamal wrote:
> I have a test rhel6 box (one that can mount nfs with vers=4.1) .. However
> this is an old server with no users on it .. Can you kindly show me how to
> stress test this mount to either induce the bad sequence error, or prove
> nfs-4.1 is rock solid ?
>
Don't ask me. You are the one that sees the problem, so all I can suggest is
get this client to do the same stuff as your other clients that exhibit the
problem.

> If upgrading all boxes to rhel-6 and nfs-4.1 is the only way to solve this
> .. then so be it .. I just want to be sure it's solid before the upgrade
>
As I recall, you've never tried Xin Li's patch.

rick

> Thanks folks!
>
> On Wed, Jul 8, 2015 at 4:20 PM, Ahmed Kamal <email.ahmedkamal@googlemail.com>
> wrote:
>
> > Another note .. is that the linux boxes when they have hung processes ..
> > They have a process (rpciod) taking 10-15% CPU
> >
> > On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal
> > <email.ahmedkamal@googlemail.com> wrote:
> >
> >> Hi folks,
> >>
> >> I have tested Xin's patches .. Unfortunately the problem didn't go away
> >> :/ Many users are still reporting hung processes. If it would help, can
> >> you show me how to dump a network trace that would help you identify the
> >> issue ?
> >>
> >> Also, is it possible in any way to have my trusted nfs3 handle the case
> >> where every zfs /home folder is its own dataset ?
> >>
> >> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem <rmacklem@uoguelph.ca>
> >> wrote:
> >>
> >>> Ahmed Kamal wrote:
> >>> > Hi folks,
> >>> >
> >>> > Just a quick update. I did not test Xin's patches yet .. What I did
> >>> > so far is to increase the tcp highwater tunable and increase nfsd
> >>> > threads to 60. Today (a working day) I noticed I only got one bad
> >>> > sequence error message! Check this:
> >>> >
> >>> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
> >>> >   1 messages:Jul5
> >>> >  39 messages.1:Jun28
> >>> >  15 messages.1:Jun29
> >>> >   4 messages.1:Jun30
> >>> >   9 messages.1:Jul1
> >>> >  23 messages.1:Jul2
> >>> >   1 messages.1:Jul4
> >>> >   1 messages.2:Jun28
> >>> >
> >>> > So there seems to be an improvement! Not sure if the Linux nfs4
> >>> > client is able to somehow recover from those bad-sequence situations
> >>> > or not .. I did get some user complaints that running "ls -l" is
> >>> > sometimes slow and takes a couple of seconds to finish.
> >>> >
> >>> > One final question .. Do you folks think nfs4.1 is more reliable in
> >>> > general than nfs4 .. I've always only used nfs3 (I guess it can't
> >>> > work here with /home/* being separate zfs filesystems) .. So should I
> >>> > go through the pain of upgrading a few servers to RHEL-6 to try out
> >>> > nfs4.1 ? Basically do you expect the protocol to be more solid ? I
> >>> > know it's a fluffy question, just give me your thoughts. Thanks a lot!
> >>> >
> >>> All I can say is that the "bad seqid" errors should not occur, since
> >>> NFSv4.1 doesn't use the seqid#s to order RPCs.
> >>>
> >>> Also I would say that a correctly implemented NFSv4.1 protocol should
> >>> function "more correctly" since all RPCs are performed "exactly once".
> >>> (How much effect this will have in practice, I can't say.)
> >>>
> >>> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over
> >>> 500 pages), so it is hard to say how mature the implementations are.
> >>> I think only testing will give you the answer.
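
None of the replies spell out how to gather the trace Ahmed asks for above, or
what such testing might look like. The following is only a sketch; the interface
name, client address, server name, and mount point are placeholders that do not
appear anywhere in the thread:

  # on the FreeBSD server: capture full frames of NFS traffic from the test
  # client (wireshark can later decode the NFSv4 OPEN/CLOSE ops and their seqids)
  tcpdump -i em0 -s 0 -w /tmp/nfs-badseqid.pcap host 192.168.1.50 and port 2049

  # on the RHEL 6 test client: mount with NFSv4.1 for comparison
  # (option spelling varies by nfs-utils version; minorversion=1 is the RHEL 6 form)
  mount -t nfs4 -o minorversion=1 nfs-server:/home /mnt/test

  # crude stress: several concurrent open/write/remove loops; each cycle issues
  # OPEN and CLOSE operations, the calls that carry seqids on a plain NFSv4.0 mount
  for i in 1 2 3 4 5 6 7 8; do
      ( while :; do
            dd if=/dev/zero of=/mnt/test/f$i bs=4k count=1 conv=fsync 2>/dev/null
            rm -f /mnt/test/f$i
        done ) &
  done
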
> >>>
> >>> I would suggest that you test Xin Li's patch that allows the "seqid + 2"
> >>> case and see if that makes the "bad seqid" errors go away. (Even though I
> >>> think this would indicate a client bug, adding this in a way that it can
> >>> be enabled via a sysctl seems reasonable.)
> >>>
> >>> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, rick
> >>>
> >>> >
> >>> >
> >>> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem <rmacklem@uoguelph.ca>
> >>> > wrote:
> >>> >
> >>> > > Ahmed Kamal wrote:
> >>> > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
> >>> > > > reports from users about hung vnc sessions. So maybe just maybe,
> >>> > > > Linux clients are able to somehow recover from these bad sequence
> >>> > > > messages. I could still see the bad sequence error message in logs
> >>> > > > though
> >>> > > >
> >>> > > > Why isn't the highwater tunable set to something better by default ?
> >>> > > > I mean this server is certainly not under a high or unusual load
> >>> > > > (it's only 40 PCs mounting from it)
> >>> > > >
> >>> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> >>> > > > <email.ahmedkamal@googlemail.com> wrote:
> >>> > > >
> >>> > > > > Thanks all .. I understand now we're doing the "right thing" ..
> >>> > > > > Although if mounting keeps wedging, I will have to solve it
> >>> > > > > somehow! Either using Xin's patch .. or Upgrading RHEL to 6.x and
> >>> > > > > using NFS4.1.
> >>> > > > >
> >>> > > > > Regarding Xin's patch, is it possible to build the patched nfsd
> >>> > > > > code, as a kernel module ? I'm looking to minimize my delta to
> >>> > > > > upstream.
> >>> > > > >
> >>> > > Yes, you can build the nfsd as a module. If your kernel config does not
> >>> > > include "options NFSD" the module will get loaded/used. It is also
> >>> > > possible to replace the module without rebooting, but you need to kill
> >>> > > off the nfsd daemon, then kldunload nfsd.ko and replace nfsd.ko with
> >>> > > the new one. (In /boot/<kernel-name>.)
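
Spelled out as shell commands, the module swap Rick describes just above might
look roughly like this; it assumes a stock source tree in /usr/src that matches
the running kernel and the default module directory /boot/kernel, neither of
which is stated in the thread:

  # rebuild the NFS server module from the patched sources
  cd /usr/src/sys/modules/nfsd && make clean && make

  # stop the running server, swap the module, and restart
  service nfsd stop                   # or: service nfsd onestop
  kldunload nfsd                      # fails if "options NFSD" is compiled into the kernel
  cp nfsd.ko /boot/kernel/nfsd.ko     # the default <kernel-name> directory
  kldload nfsd
  service nfsd start
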
> >>> > >
> >>> > > > > Also would adopting Xin's patch and hiding it behind a
> >>> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not
> >>> > > > > the last person on earth to hit this) ?
> >>> > > > >
> >>> > > If it fixes your problem, I think this is reasonable.
> >>> > > I'm also hoping that someone that works on the Linux client reports
> >>> > > if/when this was changed.
> >>> > >
> >>> > > rick
> >>> > >
> >>> > > > > Thanks a lot for all the help!
> >>> > > > >
> >>> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > >> Ahmed Kamal wrote:
> >>> > > > >> > Appreciating the fruitful discussion! Can someone please explain
> >>> > > > >> > to me, what would happen in the current situation (linux client
> >>> > > > >> > doing this skip-by-1 thing, and freebsd not doing it) ? What is
> >>> > > > >> > the effect of that?
> >>> > > > >> Well, as you've seen, the Linux client doesn't function correctly
> >>> > > > >> against the FreeBSD server (and probably others that don't support
> >>> > > > >> this "skip-by-1" case).
> >>> > > > >>
> >>> > > > >> > What do users see? Any chances of data loss?
> >>> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the
> >>> > > > >> Linux client behaviour is after receiving NFS4ERR_BAD_SEQID. You're
> >>> > > > >> the guy observing it.
> >>> > > > >>
> >>> > > > >> > Also, I find it strange that Netapp have acknowledged this is a
> >>> > > > >> > bug on their side, which has been fixed since then!
> >>> > > > >> Yea, I think Netapp screwed up. For some reason their server allowed
> >>> > > > >> this, then was fixed to not allow it and then someone decided that
> >>> > > > >> was broken and reversed it.
> >>> > > > >>
> >>> > > > >> > I also find it strange that I'm the first to hit this :) Is no one
> >>> > > > >> > running nfs4 yet!
> >>> > > > >> >
> >>> > > > >> Well, it seems to be slowly catching on. I suspect that the Linux
> >>> > > > >> client mounting a Netapp is the most common use of it. Since it
> >>> > > > >> appears that they flip-flopped w.r.t. whose bug this is, it has
> >>> > > > >> probably persisted.
> >>> > > > >>
> >>> > > > >> It may turn out that the Linux client has been fixed or it may turn
> >>> > > > >> out that most servers allowed this "skip-by-1", even though David
> >>> > > > >> Noveck (one of the main authors of the protocol) seems to agree with
> >>> > > > >> me that it should not be allowed.
> >>> > > > >>
> >>> > > > >> It is possible that others have bumped into this, but it wasn't
> >>> > > > >> isolated (I wouldn't have guessed it, so it was good you pointed to
> >>> > > > >> the RedHat discussion) and they worked around it by reverting to
> >>> > > > >> NFSv3 or similar.
> >>> > > > >> The protocol is rather complex in this area and changed completely
> >>> > > > >> for NFSv4.1, so many have also probably moved onto NFSv4.1 where
> >>> > > > >> this won't be an issue. (NFSv4.1 uses sessions to provide exactly
> >>> > > > >> once RPC semantics and doesn't use these seqid fields.)
> >>> > > > >>
> >>> > > > >> This is all just mho, rick
> >>> > > > >>
> >>> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
> >>> > > > >> > wrote:
> >>> > > > >> >
> >>> > > > >> > > Julian Elischer wrote:
> >>> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> >>> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say.
> >>> > > > >> > > > > Please let me know if Xin Li's patch resolves your problem,
> >>> > > > >> > > > > even though I don't believe it is correct except for the
> >>> > > > >> > > > > UINT32_MAX case. Good luck with it, rick
> >>> > > > >> > > > and please keep us all in the loop as to what they say!
> >>> > > > >> > > >
> >>> > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always
> >>> > > > >> > > > N+1 in a number field that has a bit of slack at wrap time
> >>> > > > >> > > > (probably due to some ambiguity in the original spec).
> >>> > > > >> > > >
> >>> > > > >> > > Actually, since N is the lock op already done, N + 1 is the next
> >>> > > > >> > > lock operation in order. Since lock ops need to be strictly
Since lock ops need to be strictly > >>> ordered, > >>> > > > >> allowing > >>> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes > >>> no > >>> > > sense. > >>> > > > >> > > > >>> > > > >> > > I think the author of the RFC meant that N + 2 or greater > >>> fails, > >>> > > but > >>> > > > >> it > >>> > > > >> > > was poorly worded. > >>> > > > >> > > > >>> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. > >>> (There is > >>> > > an > >>> > > > >> archive > >>> > > > >> > > of it somewhere, but I can't remember where.;-) > >>> > > > >> > > > >>> > > > >> > > rick > >>> > > > >> > > _______________________________________________ > >>> > > > >> > > freebsd-fs@freebsd.org mailing list > >>> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > > > >> > > To unsubscribe, send any mail to " > >>> > > freebsd-fs-unsubscribe@freebsd.org" > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >> > >> > > >