Date: Thu, 23 Jul 2015 17:26:50 +0200
From: Ahmed Kamal <email.ahmedkamal@googlemail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Graham Allan <allan@physics.umn.edu>, Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <CANzjMX6z39CSZgo7ag+MhWqF-m2=nC85XqwunBXMYDRYG92qPw@mail.gmail.com>
In-Reply-To: <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv+OyApK2UhJg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <20150716235022.GF32479@physics.umn.edu>
 <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com>
 <CANzjMX6e4dZF_pzvujwXB6u8scfzh6Z1nQ8OPLYUmc28hbZvkg@mail.gmail.com>
 <CANzjMX5RA4eSQ8sk1n5hG0AaeThDJqw4x7iJu6kQEV_3+QAXpQ@mail.gmail.com>
 <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca>
Well .. the problem is now gone, so I guess I can't collect more data till it
happens (or hopefully doesn't :) happen again. As I described, I had to restart
the FreeBSD NFS server box first .. maybe this caused the Linux clients to give
up after 5 minutes and attempt to destroy the session? When the NFS server came
back up, it was being bombarded with RPC traffic (50Mbps), probably carrying
this "destroy session" message. What I don't understand, however, is why this
never ends. What does FreeBSD reply with? Shouldn't it say "okay, I don't know
anything about this session, so consider it destroyed .. suit yourself, Linux"?
Or does it refuse to destroy the session, causing Linux to keep retrying like
crazy?

(To be better prepared next time, I've put a few sketches of the monitoring and
capture loops Rick suggested, plus some related notes, at the bottom below the
quoted thread.)

On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Ahmed Kamal wrote:
> > rhel6 servers' logs were flooded with errors like: http://paste2.org/EwLGcGF6
> > The FreeBSD box was being pounded with 40Mbps of nfs traffic .. probably
> > Linux was retrying too hard ?! I had to reboot all PCs and after the last
> > one, nfsd CPU usage dropped immediately to zero
> >
> Btw, it would be interesting to know what triggers these things (overload of
> the nfs server resulting in very slow response or ???). Basically
> Destroy_session isn't an operation that a client would normally do. I have no
> idea why the Linux client would do it. (A session is what achieves the
> "exactly once" semantics for the RPCs. It should really be in the RPC layer,
> but the NFSv4 working group put it in NFSv4.1 because they didn't want to
> replace Sun RPC. I can't think of a reason to destroy a session except on
> dismount. Maybe if the client thinks the session is broken for some reason??)
>
> Maybe something like "vmstat -m", "vmstat -z" and "nfsstat -s -e" running
> repeatedly (once/sec with timestamps via "date" or similar) so that you can
> see what was happening just before the meltdowns.
>
> A raw packet trace of just when the meltdown starts would be useful, but I
> can't think of how you'd get one of reasonable size. Maybe having
> "tcpdump -s 0 -w <file>.pcap <client-host>" run for 1sec and then
> kill/restart it repeatedly with different file names, so you might get a
> useful 1sec capture at the critical time?
>
> Anyhow, good luck with it, rick
>
> > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > email.ahmedkamal@googlemail.com> wrote:
> >
> > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> > > the dtrace with:
> > >
> > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > >
> > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > >
> > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > > reboot all NFS clients .. This would be really sad .. Any advice is most
> > > appreciated .. Thanks
> > >
> > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > email.ahmedkamal@googlemail.com> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it
> > >> to see what happens.
> > >>
> > >> During the process, I made the (I guess) mistake of doing a
> > >> zfs send | recv to a locally attached USB disk for backup purposes ..
> > >> long story short, the sharenfs property on the received filesystem was
> > >> causing some nfs/mountd errors in the logs .. I wasn't too happy with
> > >> what I got .. I destroyed the backup datasets and eventually the whole
> > >> pool .. and then rebooted the whole nas box ..
> > >> After the reboot my logs are still flooded with
> > >>
> > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > >> Jul 21 05:13:07 nas last message repeated 7536 times
> > >> Jul 21 05:15:08 nas last message repeated 29664 times
> > >>
> > >> Not sure what that means .. or how it can be stopped .. Anyway, will
> > >> keep you posted on progress.
> > >>
> > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > >> wrote:
> > >>
> > >>> Graham Allan wrote:
> > >>> > I'm curious how things are going for you with this?
> > >>> >
> > >>> > Reading your thread did pique my interest since we have a lot of
> > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I
> > >>> > meant to glance through our logs for signs of the same issue, but
> > >>> > today I started investigating a machine which appeared to have hung
> > >>> > processes, high rpciod load, and high traffic to the NFS server. Of
> > >>> > course it is exactly this issue.
> > >>> >
> > >>> > The affected machine is running SL5, though most of our server nodes
> > >>> > are now SL6. I can see errors from most of them, but the SL6 systems
> > >>> > appear less affected - I see a stream of the sequence-id errors in
> > >>> > their logs but things in general keep working. The one SL5 machine
> > >>> > I'm looking at has a single sequence-id error in today's logs, but
> > >>> > then goes into a stream of "state recovery failed" then "Lock
> > >>> > reclaim failed". It's probably partly related to the particular
> > >>> > workload on this machine.
> > >>> >
> > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the
> > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it
> > >>> > in 10.1?).
> > >>> >
> > >>> Btw, I've done some testing against a fairly recent Fedora and haven't
> > >>> seen the problem. If either of you guys could load a recent Fedora on
> > >>> a test client box, it would be interesting to see if it suffers from
> > >>> this. (My experience is that the Fedora distros have more up-to-date
> > >>> Linux NFS clients.)
> > >>>
> > >>> rick
> > >>>
> > >>> > At the NFS servers, most of the sysctl settings are already tuned
> > >>> > from defaults, e.g. tcp.highwater=100000,
> > >>> > vfs.nfsd.tcpcachetimeo=300, 128-256 nfs kernel threads.
> > >>> >
> > >>> > Graham
> > >>> >
> > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs
> > >>> > wrote:
> > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any
> > >>> > > screaming reports from users about hung vnc sessions. So maybe,
> > >>> > > just maybe, Linux clients are able to somehow recover from these
> > >>> > > bad sequence messages. I could still see the bad sequence error
> > >>> > > message in the logs though.
> > >>> > >
> > >>> > > Why isn't the highwater tunable set to something better by
> > >>> > > default? I mean, this server is certainly not under a high or
> > >>> > > unusual load (it's only 40 PCs mounting from it).
> > >>> > >
> > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal
> > >>> > > <email.ahmedkamal@googlemail.com> wrote:
> > >>> > >
> > >>> > > > Thanks all .. I understand now we're doing the "right thing" ..
> > >>> > > > Although if mounting keeps wedging, I will have to solve it
> > >>> > > > somehow! Either using Xin's patch .. or upgrading RHEL to 6.x
> > >>> > > > and using NFS4.1.
> > >>> > > >
> > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd
> > >>> > > > code as a kernel module? I'm looking to minimize my delta to
> > >>> > > > upstream.
> > >>> > > >
> > >>> > > > Also, would adopting Xin's patch and hiding it behind a
> > >>> > > > kern.nfs.allow_linux_broken_client sysctl be an option (I'm
> > >>> > > > probably not the last person on earth to hit this)?
> > >>> > > >
> > >>> > > > Thanks a lot for all the help!
> > >>> > > >
> > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem
> > >>> > > > <rmacklem@uoguelph.ca> wrote:
> > >>> > > >
> > >>> > > >> Ahmed Kamal wrote:
> > >>> > > >> > Appreciating the fruitful discussion! Can someone please
> > >>> > > >> > explain to me what would happen in the current situation
> > >>> > > >> > (linux client doing this skip-by-1 thing, and freebsd not
> > >>> > > >> > doing it)? What is the effect of that?
> > >>> > > >> Well, as you've seen, the Linux client doesn't function
> > >>> > > >> correctly against the FreeBSD server (and probably others that
> > >>> > > >> don't support this "skip-by-1" case).
> > >>> > > >>
> > >>> > > >> > What do users see? Any chances of data loss?
> > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what
> > >>> > > >> the Linux client behaviour is after receiving
> > >>> > > >> NFS4ERR_BAD_SEQID. You're the guy observing it.
> > >>> > > >>
> > >>> > > >> > Also, I find it strange that netapp have acknowledged this is
> > >>> > > >> > a bug on their side, which has been fixed since then!
> > >>> > > >> Yea, I think Netapp screwed up. For some reason their server
> > >>> > > >> allowed this, then was fixed to not allow it, and then someone
> > >>> > > >> decided that was broken and reversed it.
> > >>> > > >>
> > >>> > > >> > I also find it strange that I'm the first to hit this :) Is
> > >>> > > >> > no one running nfs4 yet!
> > >>> > > >> Well, it seems to be slowly catching on. I suspect that the
> > >>> > > >> Linux client mounting a Netapp is the most common use of it.
> > >>> > > >> Since it appears that they flip-flopped w.r.t. whose bug this
> > >>> > > >> is, it has probably persisted.
> > >>> > > >>
> > >>> > > >> It may turn out that the Linux client has been fixed, or it may
> > >>> > > >> turn out that most servers allowed this "skip-by-1", even
> > >>> > > >> though David Noveck (one of the main authors of the protocol)
> > >>> > > >> seems to agree with me that it should not be allowed.
> > >>> > > >>
> > >>> > > >> It is possible that others have bumped into this, but it wasn't
> > >>> > > >> isolated (I wouldn't have guessed it, so it was good you
> > >>> > > >> pointed to the RedHat discussion) and they worked around it by
> > >>> > > >> reverting to NFSv3 or similar. The protocol is rather complex
> > >>> > > >> in this area and changed completely for NFSv4.1, so many have
> > >>> > > >> also probably moved on to NFSv4.1, where this won't be an
> > >>> > > >> issue. (NFSv4.1 uses sessions to provide exactly-once RPC
> > >>> > > >> semantics and doesn't use these seqid fields.)
> > >>> > > >>
> > >>> > > >> This is all just mho, rick
> > >>> > > >>
> > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem
> > >>> > > >> > <rmacklem@uoguelph.ca> wrote:
> > >>> > > >> >
> > >>> > > >> > > Julian Elischer wrote:
> > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they
> > >>> > > >> > > > > say. Please let me know if Xin Li's patch resolves your
> > >>> > > >> > > > > problem, even though I don't believe it is correct
> > >>> > > >> > > > > except for the UINT32_MAX case. Good luck with it, rick
> > >>> > > >> > > > and please keep us all in the loop as to what they say!
> > >>> > > >> > > >
> > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. it's
> > >>> > > >> > > > always N+1 in a number field that has a bit of slack at
> > >>> > > >> > > > wrap time (probably due to some ambiguity in the
> > >>> > > >> > > > original spec).
> > >>> > > >> > > >
> > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the
> > >>> > > >> > > next lock operation in order. Since lock ops need to be
> > >>> > > >> > > strictly ordered, allowing N + 2 (which would mean N + 2
> > >>> > > >> > > gets done before N + 1) makes no sense.
> > >>> > > >> > >
> > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater
> > >>> > > >> > > fails, but it was poorly worded.
> > >>> > > >> > >
> > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org.
> > >>> > > >> > > (There is an archive of it somewhere, but I can't remember
> > >>> > > >> > > where. ;-)
> > >>> > > >> > >
> > >>> > > >> > > rick
> > >>> >
> > >>> > -------------------------------------------------------------------------
> > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
> > >>> > School of Physics and Astronomy - University of Minnesota
> > >>> > -------------------------------------------------------------------------
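
PS: So that I'm ready the next time this hits, here is roughly how I plan to
script Rick's once-per-second stats suggestion. Just a minimal sketch; the log
path and the run-forever loop are my own choices, not anything Rick specified:

    #!/bin/sh
    # Append a timestamped snapshot of kernel malloc/zone stats and the
    # extended NFS server stats once per second, so the seconds just
    # before a meltdown end up on record. The log path is only an example.
    LOG=/var/tmp/nfs-stats.log
    while :; do
        date >> "$LOG"
        vmstat -m >> "$LOG"
        vmstat -z >> "$LOG"
        nfsstat -s -e >> "$LOG"
        sleep 1
    done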
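
And the kill/restart packet capture idea, assuming the client of interest is
at 192.168.1.50 (an example address; substitute the real one) and that the
1-second pcap files can pile up under /var/tmp:

    #!/bin/sh
    # Start tcpdump, let it capture for one second, kill it, then repeat
    # with a new file name, so one small file should cover the critical
    # moment when the meltdown starts.
    i=0
    while :; do
        tcpdump -s 0 -w "/var/tmp/nfs-$i.pcap" host 192.168.1.50 &
        pid=$!
        sleep 1
        kill "$pid"
        i=$((i + 1))
    done

If the installed tcpdump is new enough, I think "tcpdump -G 1 -w
/var/tmp/nfs-%s.pcap" would do this rotation natively, but I haven't checked
what ships on a 9.3 box.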
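
For reference, the tuning Graham describes would look something like this in
/etc/sysctl.conf (the highwater knob is spelled vfs.nfsd.tcphighwater as far
as I can tell; the values are just the ones quoted in this thread):

    # NFS server duplicate request cache tuning mentioned in the thread
    vfs.nfsd.tcphighwater=100000
    vfs.nfsd.tcpcachetimeo=300

and the 128-256 kernel threads would come from /etc/rc.conf, e.g.
nfs_server_flags="-u -t -n 128".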
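
Regarding the suggestions to move to NFSv4.1: on a RHEL6-era client I believe
the mount would look like the line below ("minorversion" being the older
spelling; newer kernels also accept vers=4.1). nas:/export and /mnt are
placeholders, and of course the server has to offer 4.1 in the first place:

    mount -t nfs4 -o minorversion=1 nas:/export /mnt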
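
Finally, to make the "skip-by-1" debate concrete for myself: as I read Rick's
position, the server should accept only seqid N+1, with 32-bit wraparound at
UINT32_MAX, and anything else earns NFS4ERR_BAD_SEQID. A toy illustration of
that check (my own paraphrase of the rule, not code from anybody's server):

    #!/bin/sh
    # seqid_ok LAST GOT -- succeed only if GOT == LAST+1 modulo 2^32.
    seqid_ok() {
        expect=$(( ($1 + 1) % 4294967296 ))
        [ "$2" -eq "$expect" ]
    }

    seqid_ok 7 8          && echo "N+1: accepted"
    seqid_ok 7 9          || echo "N+2 (the skip-by-1 case): NFS4ERR_BAD_SEQID"
    seqid_ok 4294967295 0 && echo "wrap at UINT32_MAX: accepted"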