From: Ahmed Kamal
Date: Tue, 21 Jul 2015 06:51:03 +0200
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
To: Rick Macklem
Cc: Graham Allan, Ahmed Kamal via freebsd-fs

The RHEL6 servers' logs were flooded with errors like:
http://paste2.org/EwLGcGF6

The FreeBSD box was being pounded with 40 Mbps of NFS traffic; probably Linux
was retrying too hard?! I had to reboot all the PCs, and after the last one,
nfsd CPU usage dropped immediately to zero.

On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
> More info: I just noticed nfsd is spinning the CPU at 500% :( I ran
> DTrace with:
>
> dtrace -n 'profile-1001 { @[stack()] = count(); }'
>
> The result is at http://paste2.org/vb8ZdvF2 (scroll to the bottom).
>
> Since rebooting the NFS server didn't fix it, I imagine I'd have to
> reboot all the NFS clients. That would be really sad. Any advice is most
> appreciated. Thanks
>
>
> On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
>
>> Hi folks,
>>
>> I've upgraded a test client to RHEL6 today, and I'll keep an eye on it to
>> see what happens.
>>
>> During the process, I made the (I guess) mistake of doing a zfs send | recv
>> to a locally attached USB disk for backup purposes. Long story short, the
>> sharenfs property on the received filesystem was causing some nfs/mountd
>> errors in the logs. I wasn't too happy with what I got, so I destroyed the
>> backup datasets and eventually the whole pool, and then rebooted the whole
>> NAS box.
>> After the reboot, my logs are still flooded with:
>>
>> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
>> Jul 21 05:13:07 nas last message repeated 7536 times
>> Jul 21 05:15:08 nas last message repeated 29664 times
>>
>> Not sure what that means or how it can be stopped. Anyway, I will keep
>> you posted on progress.
>>
>> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem wrote:
>>
>>> Graham Allan wrote:
>>> > I'm curious how things are going for you with this?
>>> >
>>> > Reading your thread did pique my interest since we have a lot of
>>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant
>>> > to glance through our logs for signs of the same issue, but today I
>>> > started investigating a machine which appeared to have hung processes,
>>> > high rpciod load, and high traffic to the NFS server. Of course it is
>>> > exactly this issue.
>>> >
>>> > The affected machine is running SL5, though most of our server nodes are
>>> > now SL6. I can see errors from most of them, but the SL6 systems appear
>>> > less affected: I see a stream of the sequence-id errors in their logs,
>>> > but things in general keep working. The one SL5 machine I'm looking at
>>> > has a single sequence-id error in today's logs, but then goes into a
>>> > stream of "state recovery failed" then "Lock reclaim failed". It's
>>> > probably partly related to the particular workload on this machine.
>>> >
>>> > I would try switching our SL6 machines to NFS 4.1 to see if the
>>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in
>>> > 10.1?).
>>> >
>>> Btw, I've done some testing against a fairly recent Fedora and haven't
>>> seen the problem. If either of you guys could load a recent Fedora on a
>>> test client box, it would be interesting to see if it suffers from this.
>>> (My experience is that the Fedora distros have more up-to-date Linux NFS
>>> clients.)
>>>
>>> rick
>>>
>>> > At the NFS servers, most of the sysctl settings are already tuned
>>> > from defaults, e.g. tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
>>> > 128-256 nfs kernel threads.
>>> >
>>> > Graham
>>> >
>>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
>>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
>>> > > reports from users about hung VNC sessions. So maybe, just maybe, Linux
>>> > > clients are able to somehow recover from these bad sequence messages.
>>> > > I could still see the bad sequence error message in the logs, though.
>>> > >
>>> > > Why isn't the highwater tunable set to something better by default? I
>>> > > mean, this server is certainly not under a high or unusual load (it's
>>> > > only 40 PCs mounting from it).
>>> > >
>>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal wrote:
>>> > >
>>> > > > Thanks all. I understand now we're doing the "right thing". Although,
>>> > > > if mounting keeps wedging, I will have to solve it somehow! Either by
>>> > > > using Xin's patch, or by upgrading RHEL to 6.x and using NFSv4.1.
>>> > > >
>>> > > > Regarding Xin's patch, is it possible to build the patched nfsd code
>>> > > > as a kernel module? I'm looking to minimize my delta to upstream.
>>> > > >
>>> > > > Also, would adopting Xin's patch and hiding it behind a
>>> > > > kern.nfs.allow_linux_broken_client be an option? (I'm probably not
>>> > > > the last person on earth to hit this.)
>>> > > >
>>> > > > Thanks a lot for all the help!
>>> > > >
>>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
>>> > > > wrote:
>>> > > >
>>> > > >> Ahmed Kamal wrote:
>>> > > >> > Appreciating the fruitful discussion!
>>> > > >> > Can someone please explain to me what would happen in the current
>>> > > >> > situation (Linux client doing this skip-by-1 thing, and FreeBSD not
>>> > > >> > doing it)? What is the effect of that?
>>> > > >> Well, as you've seen, the Linux client doesn't function correctly
>>> > > >> against the FreeBSD server (and probably others that don't support
>>> > > >> this "skip-by-1" case).
>>> > > >>
>>> > > >> > What do users see? Any chances of data loss?
>>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the
>>> > > >> Linux client behaviour is after receiving NFS4ERR_BAD_SEQID. You're
>>> > > >> the guy observing it.
>>> > > >>
>>> > > >> >
>>> > > >> > Also, I find it strange that NetApp have acknowledged this is a bug
>>> > > >> > on their side, which has been fixed since then!
>>> > > >> Yea, I think NetApp screwed up. For some reason their server allowed
>>> > > >> this, then was fixed to not allow it, and then someone decided that
>>> > > >> was broken and reversed it.
>>> > > >>
>>> > > >> > I also find it strange that I'm the first to hit this :) Is no one
>>> > > >> > running NFSv4 yet?
>>> > > >> >
>>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux
>>> > > >> client mounting a NetApp is the most common use of it. Since it
>>> > > >> appears that they flip-flopped w.r.t. whose bug this is, it has
>>> > > >> probably persisted.
>>> > > >>
>>> > > >> It may turn out that the Linux client has been fixed, or it may turn
>>> > > >> out that most servers allowed this "skip-by-1" even though David
>>> > > >> Noveck (one of the main authors of the protocol) seems to agree with
>>> > > >> me that it should not be allowed.
>>> > > >>
>>> > > >> It is possible that others have bumped into this, but it wasn't
>>> > > >> isolated (I wouldn't have guessed it, so it was good you pointed to
>>> > > >> the Red Hat discussion) and they worked around it by reverting to
>>> > > >> NFSv3 or similar. The protocol is rather complex in this area and
>>> > > >> changed completely for NFSv4.1, so many have also probably moved onto
>>> > > >> NFSv4.1, where this won't be an issue. (NFSv4.1 uses sessions to
>>> > > >> provide exactly-once RPC semantics and doesn't use these seqid
>>> > > >> fields.)
>>> > > >>
>>> > > >> This is all just mho, rick
>>> > > >>
>>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca>
>>> > > >> > wrote:
>>> > > >> >
>>> > > >> > > Julian Elischer wrote:
>>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say.
>>> > > >> > > > > Please let me know if Xin Li's patch resolves your problem,
>>> > > >> > > > > even though I don't believe it is correct except for the
>>> > > >> > > > > UINT32_MAX case. Good luck with it, rick
>>> > > >> > > > and please keep us all in the loop as to what they say!
>>> > > >> > > >
>>> > > >> > > > The general N+2 bit sounds like bullshit to me. It's always N+1
>>> > > >> > > > in a number field that has a bit of slack at wrap time
>>> > > >> > > > (probably due to some ambiguity in the original spec).
>>> > > >> > > >
>>> > > >> > > Actually, since N is the lock op already done, N + 1 is the next
>>> > > >> > > lock operation in order. Since lock ops need to be strictly
>>> > > >> > > ordered, allowing N + 2 (which means N + 2 would be done before
>>> > > >> > > N + 1) makes no sense.
>>> > > >> > >
>>> > > >> > > I think the author of the RFC meant that N + 2 or greater fails,
>>> > > >> > > but it was poorly worded.
>>> > > >> > >
>>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is
>>> > > >> > > an archive of it somewhere, but I can't remember where. ;-)
>>> > > >> > >
>>> > > >> > > rick
>>> > > >> > > _______________________________________________
>>> > > >> > > freebsd-fs@freebsd.org mailing list
>>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> > > >> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>> > > >> >
>>> > > >>
>>> > > >
>>> > >
>>> >
>>> > --
>>> > -------------------------------------------------------------------------
>>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
>>> > School of Physics and Astronomy - University of Minnesota
>>> > -------------------------------------------------------------------------
>>>
>>
>