From: Ahmed Kamal
Date: Tue, 21 Jul 2015 05:52:19 +0200
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
To: Rick Macklem
Cc: Graham Allan, Ahmed Kamal via freebsd-fs

More info .. Just noticed nfsd is spinning the cpu at 500% :(

I just did the dtrace with:

dtrace -n 'profile-1001 { @[stack()] = count(); }'

The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)

Since rebooting the nfs server didn't fix it .. I imagine I'd have to
reboot all NFS clients .. This would be really sad .. Any advice is most
appreciated ..

Thanks

On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal
<email.ahmedkamal@googlemail.com> wrote:

> Hi folks,
>
> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
> see what happens.
>
> During the process, I made the (I guess mistake) of zfs send | recv to a
> locally attached usb disk for backup purposes .. long story short, sharenfs
> property on the received filesystem was causing some nfs/mountd errors in
> logs .. I wasn't too happy with what I got .. I destroyed the backup
> datasets and the whole pool eventually .. and then rebooted the whole nas
> box .. After reboot my logs are still flooded with
>
> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> Jul 21 05:13:07 nas last message repeated 7536 times
> Jul 21 05:15:08 nas last message repeated 29664 times
>
> Not sure what that means .. or how it can be stopped .. Anyway, will keep
> you posted on progress.
>
> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem wrote:
>
>> Graham Allan wrote:
>> > I'm curious how things are going for you with this?
>> >
>> > Reading your thread did pique my interest since we have a lot of
>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant
>> > to glance through our logs for signs of the same issue, but today I
>> > started investigating a machine which appeared to have hung processes,
>> > high rpciod load, and high traffic to the NFS server. Of course it is
>> > exactly this issue.
>> >
>> > The affected machine is running SL5 though most of our server nodes are
>> > now SL6. I can see errors from most of them but the SL6 systems appear
>> > less affected - I see a stream of the sequence-id errors in their logs
>> > but things in general keep working. The one SL5 machine I'm looking at
>> > has a single sequence-id error in today's logs, but then goes into a
>> > stream of "state recovery failed" then "Lock reclaim failed". It's
>> > probably partly related to the particular workload on this machine.
>> >
>> > I would try switching our SL6 machines to NFS 4.1 to see if the
>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in
>> > 10.1?).
>> >
>> Btw, I've done some testing against a fairly recent Fedora and haven't
>> seen the problem. If either of you guys could load a recent Fedora on a
>> test client box, it would be interesting to see if it suffers from this.
>> (My experience is that the Fedora distros have more up to date Linux NFS
>> clients.)
>>
>> rick
>>
>> > At the NFS servers, most of the sysctl settings are already tuned
>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
>> > 128-256 nfs kernel threads.
>> >
>> > Graham
>> >
>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
>> > > reports from users about hung vnc sessions. So maybe just maybe, linux
>> > > clients are able to somehow recover from these bad sequence messages.
>> > > I could still see the bad sequence error message in logs though
>> > >
>> > > Why isn't the highwater tunable set to something better by default ?
>> > > I mean this server is certainly not under a high or unusual load (it's
>> > > only 40 PCs mounting from it)
>> > >
>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal wrote:
>> > >
>> > > > Thanks all .. I understand now we're doing the "right thing" ..
>> > > > Although if mounting keeps wedging, I will have to solve it somehow!
>> > > > Either using Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
>> > > >
>> > > > Regarding Xin's patch, is it possible to build the patched nfsd code,
>> > > > as a kernel module ? I'm looking to minimize my delta to upstream.
>> > > >
>> > > > Also would adopting Xin's patch and hiding it behind a
>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the
>> > > > last person on earth to hit this) ?
>> > > >
>> > > > Thanks a lot for all the help!
>> > > >
>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem wrote:
>> > > >
>> > > >> Ahmed Kamal wrote:
>> > > >> > Appreciating the fruitful discussion! Can someone please explain
>> > > >> > to me, what would happen in the current situation (linux client
>> > > >> > doing this skip-by-1 thing, and freebsd not doing it) ? What is
>> > > >> > the effect of that?
>> > > >> Well, as you've seen, the Linux client doesn't function correctly
>> > > >> against the FreeBSD server (and probably others that don't support
>> > > >> this "skip-by-1" case).
>> > > >>
>> > > >> > What do users see? Any chances of data loss?
>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the
>> > > >> Linux client behaviour is after receiving NFS4ERR_BAD_SEQID. You're
>> > > >> the guy observing it.
>> > > >>
>> > > >> >
>> > > >> > Also, I find it strange that netapp have acknowledged this is a
>> > > >> > bug on their side, which has been fixed since then!
>> > > >> Yea, I think Netapp screwed up. For some reason their server allowed
>> > > >> this, then was fixed to not allow it and then someone decided that
>> > > >> was broken and reversed it.
>> > > >>
>> > > >> > I also find it strange that I'm the first to hit this :) Is no one
>> > > >> > running nfs4 yet!
>> > > >> >
>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux
>> > > >> client mounting a Netapp is the most common use of it. Since it
>> > > >> appears that they flip flopped w.r.t. whose bug this is, it has
>> > > >> probably persisted.
>> > > >>
>> > > >> It may turn out that the Linux client has been fixed or it may turn
>> > > >> out that most servers allowed this "skip-by-1" even though David
>> > > >> Noveck (one of the main authors of the protocol) seems to agree with
>> > > >> me that it should not be allowed.
>> > > >>
>> > > >> It is possible that others have bumped into this, but it wasn't
>> > > >> isolated (I wouldn't have guessed it, so it was good you pointed to
>> > > >> the RedHat discussion) and they worked around it by reverting to
>> > > >> NFSv3 or similar.
>> > > >> The protocol is rather complex in this area and changed completely
>> > > >> for NFSv4.1, so many have also probably moved onto NFSv4.1 where this
>> > > >> won't be an issue.
>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and
>> > > >> doesn't use these seqid fields.)
>> > > >>
>> > > >> This is all just mho, rick
>> > > >>
>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> > > >> >
>> > > >> > > Julian Elischer wrote:
>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say.
>> > > >> > > > > Please let me know if Xin Li's patch resolves your problem,
>> > > >> > > > > even though I don't believe it is correct except for the
>> > > >> > > > > UINT32_MAX case. Good luck with it, rick
>> > > >> > > > and please keep us all in the loop as to what they say!
>> > > >> > > >
>> > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always
>> > > >> > > > N+1 in a number field that has a bit of slack at wrap time
>> > > >> > > > (probably due to some ambiguity in the original spec).
>> > > >> > > >
>> > > >> > > Actually, since N is the lock op already done, N + 1 is the next
>> > > >> > > lock operation in order. Since lock ops need to be strictly
>> > > >> > > ordered, allowing N + 2 (which means N + 2 would be done before
>> > > >> > > N + 1) makes no sense.
>> > > >> > >
>> > > >> > > I think the author of the RFC meant that N + 2 or greater fails,
>> > > >> > > but it was poorly worded.
>> > > >> > >
>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org.
>> > > >> > > (There is an archive of it somewhere, but I can't remember
>> > > >> > > where.;-)
>> > > >> > >
>> > > >> > > rick
>> > > >> > > _______________________________________________
>> > > >> > > freebsd-fs@freebsd.org mailing list
>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > >> > > To unsubscribe, send any mail to
>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org"
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> >
>> > --
>> > -------------------------------------------------------------------------
>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
>> > School of Physics and Astronomy - University of Minnesota
>> > -------------------------------------------------------------------------
>> >
>>
>
>
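
For anyone following the seqid discussion quoted in this thread, here is a minimal sketch in C of the ordering rule as Rick describes it: after op N, only N + 1 is accepted as the next op, so a "skip-by-1" N + 2 ends up as NFS4ERR_BAD_SEQID. The names check_seqid and enum seqid_check are made up for illustration and are not FreeBSD nfsd identifiers. Treating a repeated N as a retransmission, and letting N + 1 wrap with plain 32-bit unsigned arithmetic, are both assumptions of this sketch; the thread only mentions a UINT32_MAX case without saying how it is handled.

```c
#include <stdint.h>

/* Hypothetical outcome names for this sketch; not FreeBSD identifiers. */
enum seqid_check {
	SEQID_NEXT,	/* N + 1: the next op in order, execute it */
	SEQID_REPLAY,	/* N: assumed to be a retransmission of the last op */
	SEQID_BAD	/* anything else: would map to NFS4ERR_BAD_SEQID */
};

/*
 * last is N, the seqid of the most recent op completed for this owner.
 * got is the seqid carried by the incoming request.
 */
static enum seqid_check
check_seqid(uint32_t last, uint32_t got)
{
	/* Assumption: N + 1 wraps as ordinary unsigned 32-bit arithmetic. */
	if (got == (uint32_t)(last + 1))
		return (SEQID_NEXT);
	if (got == last)
		return (SEQID_REPLAY);
	/* Includes the "skip-by-1" case, where got is N + 2. */
	return (SEQID_BAD);
}
```

With last = 5, a request carrying 6 is accepted and one carrying 7 is rejected, which is the behaviour the Linux clients in this thread appear to be running into.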