From owner-freebsd-fs@freebsd.org Wed Jul 8 14:21:07 2015
From: Ahmed Kamal
Date: Wed, 8 Jul 2015 16:20:45 +0200
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
To: Rick Macklem
Cc: Julian Elischer, freebsd-fs@freebsd.org, Xin LI

Another note .. when the linux boxes have hung processes, they have a process (rpciod) taking 10-15% CPU.

On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal wrote:
> Hi folks,
>
> I have tested Xin's patches .. Unfortunately the problem didn't go away :/
> Many users are still reporting hung processes. If it would help, can you
> show me how to dump a network trace that would help you identify the issue?
>
> Also, is it possible in any way to have my trusted nfs3 handle the case
> where every zfs /home folder is its own dataset?
>
> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem wrote:
>
>> Ahmed Kamal wrote:
>> > Hi folks,
>> >
>> > Just a quick update. I did not test Xin's patches yet .. What I did so far
>> > is to increase the tcp highwater tunable and increase nfsd threads to 60.
>> > Today (a working day) I noticed I only got one bad sequence error message!
>> > Check this:
>> >
>> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
>> > 1 messages:Jul5
>> > 39 messages.1:Jun28
>> > 15 messages.1:Jun29
>> > 4 messages.1:Jun30
>> > 9 messages.1:Jul1
>> > 23 messages.1:Jul2
>> > 1 messages.1:Jul4
>> > 1 messages.2:Jun28
>> >
>> > So there seems to be an improvement! Not sure if the Linux nfs4 client is
>> > able to somehow recover from those bad-sequence situations or not .. I did
>> > get some user complaints that running "ls -l" is sometimes slow and takes a
>> > couple of seconds to finish.
>> >
>> > One final question .. Do you folks think nfs4.1 is more reliable in general
>> > than nfs4 ..
>> > I've always only used nfs3 (I guess it can't work here with
>> > /home/* being separate zfs filesystems) .. So should I go through the pain
>> > of upgrading a few servers to RHEL-6 to try out nfs4.1? Basically, do you
>> > expect the protocol to be more solid? I know it's a fluffy question, just
>> > give me your thoughts. Thanks a lot!
>> >
>> All I can say is that the "bad seqid" errors should not occur, since NFSv4.1
>> doesn't use the seqid#s to order RPCs.
>>
>> Also I would say that a correctly implemented NFSv4.1 protocol should function
>> "more correctly", since all RPCs are performed "exactly once". (How much effect
>> this will have in practice, I can't say.)
>>
>> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over 500 pages),
>> so it is hard to say how mature the implementations are.
>> I think only testing will give you the answer.
>>
>> I would suggest that you test Xin Li's patch that allows the "seqid + 2" case
>> and see if that makes the "bad seqid" errors go away. (Even though I think this
>> would indicate a client bug, adding this in a way that it can be enabled via a
>> sysctl seems reasonable.)
>>
>> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this,
>> rick
>>
>> >
>> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem wrote:
>> >
>> > > Ahmed Kamal wrote:
>> > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
>> > > > reports from users about hung vnc sessions. So maybe, just maybe, linux
>> > > > clients are able to somehow recover from these bad sequence messages. I
>> > > > could still see the bad sequence error message in logs, though.
>> > > >
>> > > > Why isn't the highwater tunable set to something better by default?
>> > > > I mean, this server is certainly not under a high or unusual load (it's
>> > > > only 40 PCs mounting from it).
>> > > >
>> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com>
>> > > > wrote:
>> > > >
>> > > > > Thanks all .. I understand now we're doing the "right thing" .. Although
>> > > > > if mounting keeps wedging, I will have to solve it somehow! Either using
>> > > > > Xin's patch .. or upgrading RHEL to 6.x and using NFS4.1.
>> > > > >
>> > > > > Regarding Xin's patch, is it possible to build the patched nfsd code as a
>> > > > > kernel module? I'm looking to minimize my delta to upstream.
>> > > > >
>> > > Yes, you can build the nfsd as a module. If your kernel config does not
>> > > include "options NFSD", the module will get loaded/used. It is also possible
>> > > to replace the module without rebooting, but you need to kill off the nfsd
>> > > daemon, then kldunload nfsd.ko and replace nfsd.ko with the new one. (In
>> > > /boot/.)
>> > >
>> > > > > Also, would adopting Xin's patch and hiding it behind a
>> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the last
>> > > > > person on earth to hit this)?
>> > > > >
>> > > If it fixes your problem, I think this is reasonable.
>> > > I'm also hoping that someone who works on the Linux client reports if/when
>> > > this was changed.
>> > >
>> > > rick
>> > >
>> > > > > Thanks a lot for all the help!
>> > > > >
>> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
>> > > > > wrote:
>> > > > >
>> > > > >> Ahmed Kamal wrote:
>> > > > >> > Appreciating the fruitful discussion! Can someone please explain to me
>> > > > >> > what would happen in the current situation (linux client doing this
>> > > > >> > skip-by-1 thing, and freebsd not doing it)? What is the effect of that?
>> > > > >> Well, as you've seen, the Linux client doesn't function correctly against
>> > > > >> the FreeBSD server (and probably others that don't support this
>> > > > >> "skip-by-1" case).
>> > > > >>
>> > > > >> > What do users see? Any chances of data loss?
>> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
>> > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy
>> > > > >> observing it.
>> > > > >>
>> > > > >> > Also, I find it strange that netapp have acknowledged this is a bug on
>> > > > >> > their side, which has been fixed since then!
>> > > > >> Yea, I think Netapp screwed up. For some reason their server allowed this,
>> > > > >> then was fixed to not allow it, and then someone decided that was broken
>> > > > >> and reversed it.
>> > > > >>
>> > > > >> > I also find it strange that I'm the first to hit this :) Is no one
>> > > > >> > running nfs4 yet?
>> > > > >> >
>> > > > >> Well, it seems to be slowly catching on. I suspect that the Linux client
>> > > > >> mounting a Netapp is the most common use of it. Since it appears that they
>> > > > >> flip-flopped w.r.t. whose bug this is, it has probably persisted.
>> > > > >>
>> > > > >> It may turn out that the Linux client has been fixed, or it may turn out
>> > > > >> that most servers allowed this "skip-by-1" even though David Noveck (one
>> > > > >> of the main authors of the protocol) seems to agree with me that it should
>> > > > >> not be allowed.
>> > > > >>
>> > > > >> It is possible that others have bumped into this, but it wasn't isolated
>> > > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat
>> > > > >> discussion) and they worked around it by reverting to NFSv3 or similar.
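For readers following the seqid argument, here is a minimal Python sketch of how an NFSv4.0 server might classify an incoming open/lock-owner seqid against the last one it processed (N). It is not the FreeBSD nfsd code and not Xin Li's patch. N + 1 is the next request; N is treated as a retransmission of the last request (a simplification of the spec's replay handling); anything else gets NFS4ERR_BAD_SEQID. The `allow_skip_by_one` flag is a hypothetical stand-in for the `kern.nfs.allow_linux_broken_client` knob proposed in this thread and additionally accepts N + 2. The modulo-2**32 wraparound is an assumption of the sketch, since the thread does not spell out how the UINT32_MAX case is handled.

```python
from enum import Enum

SEQID_MOD = 2 ** 32  # seqid is an unsigned 32-bit counter


class SeqidResult(Enum):
    NEXT = "process the request"        # seqid == last + 1
    REPLAY = "resend the cached reply"  # seqid == last (a retransmission)
    BAD_SEQID = "NFS4ERR_BAD_SEQID"     # anything else


def check_seqid(last_seqid: int, seqid: int,
                allow_skip_by_one: bool = False) -> SeqidResult:
    """Classify an incoming NFSv4.0 open/lock-owner seqid (sketch only).

    allow_skip_by_one is a hypothetical knob modelled on the sysctl proposed
    in the thread; when set, last + 2 is also accepted. Wraparound is plain
    modulo-2**32 arithmetic, which is an assumption of this sketch.
    """
    if seqid == last_seqid:
        return SeqidResult.REPLAY
    if seqid == (last_seqid + 1) % SEQID_MOD:
        return SeqidResult.NEXT
    if allow_skip_by_one and seqid == (last_seqid + 2) % SEQID_MOD:
        return SeqidResult.NEXT
    return SeqidResult.BAD_SEQID


if __name__ == "__main__":
    print(check_seqid(41, 42).name)                          # NEXT
    print(check_seqid(41, 43).name)                          # BAD_SEQID
    print(check_seqid(41, 43, allow_skip_by_one=True).name)  # NEXT
    print(check_seqid(41, 41).name)                          # REPLAY
```

With the default strict check, a client that jumps from seqid 41 to 43 is refused, which is the "skip-by-1" situation described above; with the hypothetical flag set, the same request is accepted, at the cost of no longer enforcing strict N + 1 ordering.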
>> > > > >> The protocol is rather complex in this area and changed completely for
>> > > > >> NFSv4.1, so many have also probably moved onto NFSv4.1, where this won't
>> > > > >> be an issue. (NFSv4.1 uses sessions to provide exactly-once RPC semantics
>> > > > >> and doesn't use these seqid fields.)
>> > > > >>
>> > > > >> This is all just mho,
>> > > > >> rick
>> > > > >>
>> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > Julian Elischer wrote:
>> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. Please
>> > > > >> > > > > let me know if Xin Li's patch resolves your problem, even though I
>> > > > >> > > > > don't believe it is correct except for the UINT32_MAX case. Good
>> > > > >> > > > > luck with it, rick
>> > > > >> > > > and please keep us all in the loop as to what they say!
>> > > > >> > > >
>> > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1 in a
>> > > > >> > > > number field that has a bit of slack at wrap time (probably due to
>> > > > >> > > > some ambiguity in the original spec).
>> > > > >> > > >
>> > > > >> > > Actually, since N is the lock op already done, N + 1 is the next lock
>> > > > >> > > operation in order. Since lock ops need to be strictly ordered, allowing
>> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
>> > > > >> > >
>> > > > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it
>> > > > >> > > was poorly worded.
>> > > > >> > >
>> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org.
>> > > > >> > > (There is an archive of it somewhere, but I can't remember where. ;-)
>> > > > >> > >
>> > > > >> > > rick
>> > > > >> > > _______________________________________________
>> > > > >> > > freebsd-fs@freebsd.org mailing list
>> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > > >> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
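Rick's point in this thread that NFSv4.1 drops these seqids in favour of sessions with exactly-once semantics can be illustrated with a simplified Python model of the per-slot check done by the NFSv4.1 SEQUENCE operation: a sequence ID one greater than the slot's last value is a new request and is executed once; an equal value is a retry answered from the slot's cached reply without re-executing; anything else is rejected with NFS4ERR_SEQ_MISORDERED. This is a sketch, not a real server. The `Slot` and `Session` classes are hypothetical, slots are created lazily instead of being sized at session creation, slot sequence IDs are assumed to start at 0 so the first request uses 1, and details such as the `cachethis` flag are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

SEQ_MOD = 2 ** 32  # slot sequence IDs are 32-bit; wraparound is assumed
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


@dataclass
class Slot:
    seqid: int = 0                      # assumption: first request uses seqid 1
    cached_reply: Optional[str] = None  # reply kept for retransmissions


@dataclass
class Session:
    slots: Dict[int, Slot] = field(default_factory=dict)

    def sequence(self, slotid: int, seqid: int,
                 execute: Callable[[], str]) -> str:
        """Simplified SEQUENCE handling for one request on one slot."""
        slot = self.slots.setdefault(slotid, Slot())  # hypothetical lazy slot table
        if seqid == (slot.seqid + 1) % SEQ_MOD:
            # New request: execute it exactly once and cache the reply.
            slot.cached_reply = execute()
            slot.seqid = seqid
            return slot.cached_reply
        if seqid == slot.seqid and slot.cached_reply is not None:
            # Retry of the last request: replay the cached reply, no re-execution.
            return slot.cached_reply
        return NFS4ERR_SEQ_MISORDERED


if __name__ == "__main__":
    calls = []

    def do_open() -> str:
        calls.append("OPEN")
        return f"OPEN reply #{len(calls)}"

    s = Session()
    print(s.sequence(0, 1, do_open))  # OPEN reply #1
    print(s.sequence(0, 1, do_open))  # OPEN reply #1 (replayed from the cache)
    print(s.sequence(0, 3, do_open))  # NFS4ERR_SEQ_MISORDERED
    print(len(calls))                 # 1
```

Because ordering is tracked per slot rather than per open/lock owner, a retransmitted request is answered from the cache instead of being run twice, which is the "exactly once" property Rick refers to.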