Date:      Thu, 23 Jul 2015 09:37:19 -0700
From:      Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To:        Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>,
           Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID:  <CAOgwaMsF+O+0ObAR6L52kjB6vucaAM9aaHHmjUazQ+74XKES9w@mail.gmail.com>
In-Reply-To: <CANzjMX6z39CSZgo7ag+MhWqF-m2=nC85XqwunBXMYDRYG92qPw@mail.gmail.com>
References:  <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv+OyApK2UhJg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <20150716235022.GF32479@physics.umn.edu>
 <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com>
 <CANzjMX6e4dZF_pzvujwXB6u8scfzh6Z1nQ8OPLYUmc28hbZvkg@mail.gmail.com>
 <CANzjMX5RA4eSQ8sk1n5hG0AaeThDJqw4x7iJu6kQEV_3+QAXpQ@mail.gmail.com>
 <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX6z39CSZgo7ag+MhWqF-m2=nC85XqwunBXMYDRYG92qPw@mail.gmail.com>
On Thu, Jul 23, 2015 at 8:26 AM, Ahmed Kamal via freebsd-fs <
freebsd-fs@freebsd.org> wrote:

> Well .. The problem is now gone, so I guess I can't collect more data till
> it happens (or hopefully doesn't :) happen again .. So as I described, I
> had to restart the FreeBSD NFS server box first .. maybe this caused linux
> clients to give up after 5 mins, and attempt to destroy the session ? When
> the NFS server was back up .. It was being bombarded (50Mbps traffic) with
> rpc traffic, probably saying this "destroy session" message.
>
> What I don't understand however is, why doesn't this end. What does
> FreeBSD reply with? Shouldn't it say, Okay, I don't know anything about
> this session, so consider it destroyed .. suit yourself linux .. or does
> it refuse to destroy, causing Linux to keep on retrying like crazy ?
>

My opinion is that there is a problem in the latest Linux NFS client: it
spends too much time communicating with the Linux server. For that reason
I have switched back to Fedora 18 as a client, because the "fighting" with
the server was visible from the switch lights, and a response arrived only
after a long burst of activity; that much traffic to get one response is
meaningless. The server is Fedora 19.

Mehmet Erol Sanliturk


> On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
>
> > Ahmed Kamal wrote:
> > > rhel6 servers logs were flooded with errors like:
> > > http://paste2.org/EwLGcGF6
> > > The Freebsd box was being pounded with 40Mbps of nfs traffic ..
> > > probably Linux was retrying too hard ?! I had to reboot all PCs and
> > > after the last one, nfsd CPU usage dropped immediately to zero
> > >
> > Btw, it would be interesting to know what triggers these things
> > (overload of the nfs server resulting in very slow response or ???).
> > Basically Destroy_session isn't an operation that a client would
> > normally do. I have no idea why the Linux client would do it. (A
> > session is what achieves the "exactly once" semantics for the RPCs. It
> > should really be in the RPC layer, but the NFSv4 working group put it
> > in NFSv4.1 because they didn't want to replace Sun RPC. I can't think
> > of a reason to destroy a session except on dismount. Maybe if the
> > client thinks the session is broken for some reason??)
> >
> > Maybe run something like "vmstat -m", "vmstat -z" and "nfsstat -s -e"
> > repeatedly (once/sec with timestamps via "date" or similar) so that you
> > can see what was happening just before the meltdowns.
> >
> > A raw packet trace of just when the meltdown starts would be useful,
> > but I can't think of how you'd get one of reasonable size. Maybe having
> > "tcpdump -s 0 -w <file>.pcap <client-host>" run for 1sec and then
> > kill/restart it repeatedly with different file names, so you might get
> > a useful 1sec capture at the critical time?
> >
> > Anyhow, good luck with it, rick
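[For anyone wanting to script the polling Rick describes, a rough sketch;
the one-second interval, the log path, and the capture file names are
assumptions for illustration, not something from the thread:

    #!/bin/sh
    # Sample the NFS server's state once per second, timestamped, so the
    # samples taken just before a meltdown can be inspected afterwards.
    while true; do
        date                    # timestamp, for correlating with syslog
        vmstat -m               # kernel malloc statistics
        vmstat -z               # uma zone statistics
        nfsstat -s -e           # extended server-side NFSv4 statistics
        sleep 1
    done >> /var/tmp/nfs-stats.log 2>&1

For the rotating short captures, tcpdump's -G option (rotate the dump file
every N seconds when the -w name contains strftime escapes) avoids the
kill/restart loop, assuming the installed tcpdump is new enough to have it:

    tcpdump -s 0 -G 1 -w '/var/tmp/melt-%H%M%S.pcap' host <client-host>
]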
> > > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > > email.ahmedkamal@googlemail.com> wrote:
> > >
> > > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just
> > > > did the dtrace with:
> > > >
> > > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > > >
> > > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > > >
> > > > Since rebooting the nfs server didn't fix it .. I imagine I'd have
> > > > to reboot all NFS clients .. This would be really sad .. Any advice
> > > > is most appreciated .. Thanks
> > > >
> > > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > > email.ahmedkamal@googlemail.com> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on
> > > >> it to see what happens.
> > > >>
> > > >> During the process, I made the (I guess) mistake of zfs send | recv
> > > >> to a locally attached usb disk for backup purposes .. long story
> > > >> short, the sharenfs property on the received filesystem was causing
> > > >> some nfs/mountd errors in logs .. I wasn't too happy with what I
> > > >> got .. I destroyed the backup datasets and the whole pool
> > > >> eventually .. and then rebooted the whole nas box .. After reboot
> > > >> my logs are still flooded with
> > > >>
> > > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > >> Jul 21 05:13:07 nas last message repeated 7536 times
> > > >> Jul 21 05:15:08 nas last message repeated 29664 times
> > > >>
> > > >> Not sure what that means .. or how it can be stopped .. Anyway,
> > > >> will keep you posted on progress.
> > > >>
> > > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > >> wrote:
> > > >>
> > > >>> Graham Allan wrote:
> > > >>> > I'm curious how things are going for you with this?
> > > >>> >
> > > >>> > Reading your thread did pique my interest since we have a lot of
> > > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers.
> > > >>> > I meant to glance through our logs for signs of the same issue,
> > > >>> > but today I started investigating a machine which appeared to
> > > >>> > have hung processes, high rpciod load, and high traffic to the
> > > >>> > NFS server. Of course it is exactly this issue.
> > > >>> >
> > > >>> > The affected machine is running SL5, though most of our server
> > > >>> > nodes are now SL6. I can see errors from most of them, but the
> > > >>> > SL6 systems appear less affected - I see a stream of the
> > > >>> > sequence-id errors in their logs but things in general keep
> > > >>> > working. The one SL5 machine I'm looking at has a single
> > > >>> > sequence-id error in today's logs, but then goes into a stream
> > > >>> > of "state recovery failed" then "Lock reclaim failed". It's
> > > >>> > probably partly related to the particular workload on this
> > > >>> > machine.
> > > >>> >
> > > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the
> > > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers
> > > >>> > (is it in 10.1?).
> > > >>> >
> > > >>> Btw, I've done some testing against a fairly recent Fedora and
> > > >>> haven't seen the problem. If either of you guys could load a
> > > >>> recent Fedora on a test client box, it would be interesting to see
> > > >>> if it suffers from this. (My experience is that the Fedora distros
> > > >>> have more up to date Linux NFS clients.)
> > > >>>
> > > >>> rick
> > > >>>
> > > >>> > At the NFS servers, most of the sysctl settings are already
> > > >>> > tuned from defaults, e.g. tcp.highwater=100000,
> > > >>> > vfs.nfsd.tcpcachetimeo=300, and 128-256 nfs kernel threads.
> > > >>> >
> > > >>> > Graham
> > > >>> >
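[For reference, the tuning Graham lists would normally go into
/etc/sysctl.conf along these lines. The full oid name behind the
"tcp.highwater" shorthand is vfs.nfsd.tcphighwater on stock FreeBSD, and
the thread oids are my best reading; verify the spellings with sysctl -a
on your release:

    # /etc/sysctl.conf (sketch; values are Graham's, names need checking)
    vfs.nfsd.tcphighwater=100000    # DRC highwater mark for TCP
    vfs.nfsd.tcpcachetimeo=300      # DRC cache timeout, seconds
    vfs.nfsd.minthreads=128         # nfsd kernel thread pool bounds
    vfs.nfsd.maxthreads=256
]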
> > > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via
> > > >>> > freebsd-fs wrote:
> > > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any
> > > >>> > > screaming reports from users about hung vnc sessions. So
> > > >>> > > maybe, just maybe, linux clients are able to somehow recover
> > > >>> > > from these bad sequence messages. I could still see the bad
> > > >>> > > sequence error message in logs though
> > > >>> > >
> > > >>> > > Why isn't the highwater tunable set to something better by
> > > >>> > > default ? I mean this server is certainly not under a high or
> > > >>> > > unusual load (it's only 40 PCs mounting from it)
> > > >>> > >
> > > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> > > >>> > > <email.ahmedkamal@googlemail.com> wrote:
> > > >>> > >
> > > >>> > > > Thanks all .. I understand now we're doing the "right thing"
> > > >>> > > > .. Although if mounting keeps wedging, I will have to solve
> > > >>> > > > it somehow! Either using Xin's patch .. or Upgrading RHEL to
> > > >>> > > > 6.x and using NFS4.1.
> > > >>> > > >
> > > >>> > > > Regarding Xin's patch, is it possible to build the patched
> > > >>> > > > nfsd code as a kernel module ? I'm looking to minimize my
> > > >>> > > > delta to upstream.
> > > >>> > > >
> > > >>> > > > Also, would adopting Xin's patch and hiding it behind a
> > > >>> > > > kern.nfs.allow_linux_broken_client sysctl be an option (I'm
> > > >>> > > > probably not the last person on earth to hit this) ?
> > > >>> > > >
> > > >>> > > > Thanks a lot for all the help!
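[Purely as an illustration of what Ahmed's proposed knob could look like;
the name and variable are hypothetical, taken from his suggestion rather
than from any actual patch, and FreeBSD's existing nfsd oids live under
vfs.nfsd rather than kern.nfs, so the sketch hangs it there:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    SYSCTL_DECL(_vfs_nfsd);

    /* Hypothetical knob, default off so stock behaviour is unchanged. */
    static int nfsrv_allow_linux_broken_client = 0;
    SYSCTL_INT(_vfs_nfsd, OID_AUTO, allow_linux_broken_client, CTLFLAG_RW,
        &nfsrv_allow_linux_broken_client, 0,
        "Tolerate the Linux client's out-of-spec NFSv4.0 seqid skip");

The seqid check would then consult the flag before returning
NFS4ERR_BAD_SEQID, keeping the delta to upstream small.]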
> > > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem
> > > >>> > > > <rmacklem@uoguelph.ca> wrote:
> > > >>> > > >
> > > >>> > > >> Ahmed Kamal wrote:
> > > >>> > > >> > Appreciating the fruitful discussion! Can someone please
> > > >>> > > >> > explain to me what would happen in the current situation
> > > >>> > > >> > (linux client doing this skip-by-1 thing, and freebsd not
> > > >>> > > >> > doing it) ? What is the effect of that?
> > > >>> > > >> Well, as you've seen, the Linux client doesn't function
> > > >>> > > >> correctly against the FreeBSD server (and probably others
> > > >>> > > >> that don't support this "skip-by-1" case).
> > > >>> > > >>
> > > >>> > > >> > What do users see? Any chances of data loss?
> > > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess
> > > >>> > > >> what the Linux client behaviour is after receiving
> > > >>> > > >> NFS4ERR_BAD_SEQID. You're the guy observing it.
> > > >>> > > >>
> > > >>> > > >> > Also, I find it strange that netapp have acknowledged
> > > >>> > > >> > this is a bug on their side, which has been fixed since
> > > >>> > > >> > then!
> > > >>> > > >> Yea, I think Netapp screwed up. For some reason their
> > > >>> > > >> server allowed this, then it was fixed to not allow it, and
> > > >>> > > >> then someone decided that was broken and reversed it.
> > > >>> > > >>
> > > >>> > > >> > I also find it strange that I'm the first to hit this :)
> > > >>> > > >> > Is no one running nfs4 yet!
> > > >>> > > >> >
> > > >>> > > >> Well, it seems to be slowly catching on. I suspect that the
> > > >>> > > >> Linux client mounting a Netapp is the most common use of
> > > >>> > > >> it. Since it appears that they flip-flopped w.r.t. whose
> > > >>> > > >> bug this is, it has probably persisted.
> > > >>> > > >>
> > > >>> > > >> It may turn out that the Linux client has been fixed, or it
> > > >>> > > >> may turn out that most servers allowed this "skip-by-1",
> > > >>> > > >> even though David Noveck (one of the main authors of the
> > > >>> > > >> protocol) seems to agree with me that it should not be
> > > >>> > > >> allowed.
> > > >>> > > >>
> > > >>> > > >> It is possible that others have bumped into this, but it
> > > >>> > > >> wasn't isolated (I wouldn't have guessed it, so it was good
> > > >>> > > >> you pointed to the RedHat discussion) and they worked
> > > >>> > > >> around it by reverting to NFSv3 or similar. The protocol is
> > > >>> > > >> rather complex in this area and changed completely for
> > > >>> > > >> NFSv4.1, so many have also probably moved on to NFSv4.1,
> > > >>> > > >> where this won't be an issue. (NFSv4.1 uses sessions to
> > > >>> > > >> provide exactly-once RPC semantics and doesn't use these
> > > >>> > > >> seqid fields.)
> > > >>> > > >>
> > > >>> > > >> This is all just mho, rick
> > > >>> > > >>
> > > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem
> > > >>> > > >> > <rmacklem@uoguelph.ca> wrote:
> > > >>> > > >> >
> > > >>> > > >> > > Julian Elischer wrote:
> > > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what
> > > >>> > > >> > > > > they say. Please let me know if Xin Li's patch
> > > >>> > > >> > > > > resolves your problem, even though I don't believe
> > > >>> > > >> > > > > it is correct except for the UINT32_MAX case. Good
> > > >>> > > >> > > > > luck with it, rick
> > > >>> > > >> > > > and please keep us all in the loop as to what they
> > > >>> > > >> > > > say!
> > > >>> > > >> > > >
> > > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. its
> > > >>> > > >> > > > always N+1 in a number field that has a bit of slack
> > > >>> > > >> > > > at wrap time (probably due to some ambiguity in the
> > > >>> > > >> > > > original spec).
> > > >>> > > >> > > >
> > > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is
> > > >>> > > >> > > the next lock operation in order. Since lock ops need
> > > >>> > > >> > > to be strictly ordered, allowing N + 2 (which would
> > > >>> > > >> > > mean N + 2 is done before N + 1) makes no sense.
> > > >>> > > >> > >
> > > >>> > > >> > > I think the author of the RFC meant that N + 2 or
> > > >>> > > >> > > greater fails, but it was poorly worded.
> > > >>> > > >> > >
> > > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org.
> > > >>> > > >> > > (There is an archive of it somewhere, but I can't
> > > >>> > > >> > > remember where. ;-)
> > > >>> > > >> > >
> > > >>> > > >> > > rick
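[To make the ordering rule concrete, a sketch of the check under debate;
this is illustrative only, not the actual FreeBSD nfsd code:

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * An NFSv4.0 open/lock seqid must advance strictly by one, so the
     * only value the server accepts for the next op is N + 1; N + 2
     * would mean an op was reordered or lost.  What the correct
     * successor of UINT32_MAX is (0, or 1 with 0 skipped) is exactly
     * the wrap-time ambiguity discussed above, the one case Rick says
     * Xin Li's patch gets right.
     */
    static bool
    seqid_in_order(uint32_t last, uint32_t next)
    {
            /* Unsigned arithmetic wraps UINT32_MAX + 1 to 0. */
            return (next == last + 1);
    }
]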
> > > >>> >
> > > >>> > --
> > > >>> > -----------------------------------------------------------------
> > > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
> > > >>> > School of Physics and Astronomy - University of Minnesota
> > > >>> > -----------------------------------------------------------------

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"