From owner-freebsd-fs@freebsd.org  Tue Jul 21 04:51:25 2015
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9D7919A72E5
 for <freebsd-fs@mailman.ysv.freebsd.org>; Tue, 21 Jul 2015 04:51:25 +0000 (UTC)
 (envelope-from email.ahmedkamal@googlemail.com)
Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com
 [IPv6:2a00:1450:400c:c05::22b])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1D7491AA7
 for <freebsd-fs@freebsd.org>; Tue, 21 Jul 2015 04:51:25 +0000 (UTC)
 (envelope-from email.ahmedkamal@googlemail.com)
Received: by wicgb10 with SMTP id gb10so43877841wic.1
 for <freebsd-fs@freebsd.org>; Mon, 20 Jul 2015 21:51:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=googlemail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=7QOnTTc4nGbGxfpBgNP6hi1kgFQiMY9F9FQBOE/p1IM=;
 b=oAgUiTrOJAj0Ie3ajoBZZm4IuNUsd9+7C4vXCbS1QpxQ1e+KpIFCSgshZmnW1YlYyQ
 FITt2ZSeiXV33BlL2BgEGGf0h5KM6rQPp+NXW1Mfg/JLVeoFCCzORnU4rWNOEb8Pjh3e
 E/TUc66Bp45r7jMUsv7aQL4Po2zJUelWGKG+3i4ya+sf9kCJ7cc5n2QHIdBtaWExBwUG
 aKVzxOEOIuSufioG5e+yOG/wnSb8vGSMDY1FDqklWHOwobLPDdyr0kUe63IBWOaan+48
 uZ6RUq2bAasp3Rnhtuowpjk8H+S/9JUShHuS3z6YhpgB1L3AnPi8fFb+gtuMeYUnmcMG
 AWcQ==
X-Received: by 10.194.59.98 with SMTP id y2mr63959633wjq.42.1437454283437;
 Mon, 20 Jul 2015 21:51:23 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.28.6.143 with HTTP; Mon, 20 Jul 2015 21:51:03 -0700 (PDT)
In-Reply-To: <CANzjMX6e4dZF_pzvujwXB6u8scfzh6Z1nQ8OPLYUmc28hbZvkg@mail.gmail.com>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <5594B008.10202@freebsd.org>
 <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX5eN1FsnHMf6KGZe_b3vwxxF=dy3fJUHxeGO4BXuNzfPA@mail.gmail.com>
 <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv+OyApK2UhJg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <20150716235022.GF32479@physics.umn.edu>
 <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com>
 <CANzjMX6e4dZF_pzvujwXB6u8scfzh6Z1nQ8OPLYUmc28hbZvkg@mail.gmail.com>
From: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Date: Tue, 21 Jul 2015 06:51:03 +0200
Message-ID: <CANzjMX5RA4eSQ8sk1n5hG0AaeThDJqw4x7iJu6kQEV_3+QAXpQ@mail.gmail.com>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Graham Allan <allan@physics.umn.edu>, 
 Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Jul 2015 04:51:25 -0000

The rhel6 servers' logs were flooded with errors like: http://paste2.org/EwLGcGF6
The FreeBSD box was being pounded with 40Mbps of NFS traffic .. probably the
Linux clients were retrying too hard?! I had to reboot all the PCs, and after
the last one, nfsd CPU usage dropped immediately to zero.

On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:

> More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> the dtrace with:
>
> dtrace -n 'profile-1001 { @[stack()] = count(); }'
> The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
>
> Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> reboot all NFS clients .. This would be really sad .. Any advice is most
> appreciated .. Thanks
>
>
> On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
>
>> Hi folks,
>>
>> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
>> see what happens.
>>
>> During the process, I made the mistake (I guess) of doing a zfs send | recv
>> to a locally attached USB disk for backup purposes .. long story short, the
>> sharenfs property on the received filesystem was causing some nfs/mountd
>> errors in the logs .. I wasn't too happy with what I got .. I destroyed the
>> backup datasets and eventually the whole pool .. and then rebooted the whole
>> NAS box .. After the reboot my logs are still flooded with
>>
>> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
>> Jul 21 05:13:07 nas last message repeated 7536 times
>> Jul 21 05:15:08 nas last message repeated 29664 times
>>
>> Not sure what that means .. or how it can be stopped .. Anyway, will keep
>> you posted on progress.
>>
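To put the "last message repeated" counts quoted above in perspective, the
rate of "no session" messages can be estimated from the timestamps. This is
only a rough sketch: it assumes the repeats are spread evenly between
consecutive log lines, the year is taken from the mail's Date header, and the
function name repeat_rates is made up for illustration.

```python
from datetime import datetime

# The three syslog lines quoted above, copied verbatim.
LOG = """\
Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
Jul 21 05:13:07 nas last message repeated 7536 times
Jul 21 05:15:08 nas last message repeated 29664 times
"""


def repeat_rates(log_text, year=2015):
    """Return messages/second for each 'last message repeated N times' line.

    Assumption: the N repeats happened evenly between the previous line's
    timestamp and this line's timestamp.
    """
    rates = []
    prev = None
    for line in log_text.splitlines():
        fields = line.split()
        # syslog omits the year, so supply it explicitly when parsing.
        stamp = datetime.strptime(
            "{} {}".format(year, " ".join(fields[:3])), "%Y %b %d %H:%M:%S"
        )
        if "repeated" in fields and prev is not None:
            count = int(fields[fields.index("repeated") + 1])
            rates.append(count / (stamp - prev).total_seconds())
        prev = stamp
    return rates


rates = repeat_rates(LOG)
for rate in rates:
    print("about {:.0f} messages/second".format(rate))
```

Both intervals come out at a few hundred messages per second, which suggests
a steady retry loop rather than a one-off burst.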
>> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Graham Allan wrote:
>>> > I'm curious how things are going for you with this?
>>> >
>>> > Reading your thread did pique my interest since we have a lot of
>>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant
>>> > to glance through our logs for signs of the same issue, but today I
>>> > started investigating a machine which appeared to have hung processes,
>>> > high rpciod load, and high traffic to the NFS server. Of course it is
>>> > exactly this issue.
>>> >
>>> > The affected machine is running SL5 though most of our server nodes are
>>> > now SL6. I can see errors from most of them but the SL6 systems appear
>>> > less affected - I see a stream of the sequence-id errors in their logs but
>>> > things in general keep working. The one SL5 machine I'm looking at
>>> > has a single sequence-id error in today's logs, but then goes into a
>>> > stream of "state recovery failed" then "Lock reclaim failed". It's
>>> > probably partly related to the particular workload on this machine.
>>> >
>>> > I would try switching our SL6 machines to NFS 4.1 to see if the
>>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in
>>> > 10.1?).
>>> >
>>> Btw, I've done some testing against a fairly recent Fedora and haven't seen
>>> the problem. If either of you guys could load a recent Fedora on a test client
>>> box, it would be interesting to see if it suffers from this. (My experience is
>>> that the Fedora distros have more up to date Linux NFS clients.)
>>>
>>> rick
>>>
>>> > At the NFS servers, most of the sysctl settings are already tuned
>>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
>>> > 128-256 nfs kernel threads.
>>> >
>>> > Graham
>>> >
>>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
>>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
>>> > > reports from users about hung vnc sessions. So maybe just maybe, linux
>>> > > clients are able to somehow recover from these bad sequence messages. I
>>> > > could still see the bad sequence error message in logs though
>>> > >
>>> > > Why isn't the highwater tunable set to something better by default ? I mean
>>> > > this server is certainly not under a high or unusual load (it's only 40 PCs
>>> > > mounting from it)
>>> > >
>>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
>>> > >
>>> > > > Thanks all .. I understand now we're doing the "right thing" .. Although
>>> > > > if mounting keeps wedging, I will have to solve it somehow! Either using
>>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
>>> > > >
>>> > > > Regarding Xin's patch, is it possible to build the patched nfsd code, as a
>>> > > > kernel module ? I'm looking to minimize my delta to upstream.
>>> > > >
>>> > > > Also would adopting Xin's patch and hiding it behind a
>>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the last
>>> > > > person on earth to hit this) ?
>>> > > >
>>> > > > Thanks a lot for all the help!
>>> > > >
>>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>> > > >
>>> > > >> Ahmed Kamal wrote:
>>> > > >> > Appreciating the fruitful discussion! Can someone please explain to me,
>>> > > >> > what would happen in the current situation (linux client doing this
>>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of that?
>>> > > >> Well, as you've seen, the Linux client doesn't function correctly against
>>> > > >> the FreeBSD server (and probably others that don't support this "skip-by-1"
>>> > > >> case).
>>> > > >>
>>> > > >> > What do users see? Any chances of data loss?
>>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
>>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy
>>> > > >> observing it.
>>> > > >>
>>> > > >> >
>>> > > >> > Also, I find it strange that netapp have acknowledged this is a bug on
>>> > > >> > their side, which has been fixed since then!
>>> > > >> Yea, I think Netapp screwed up. For some reason their server allowed this,
>>> > > >> then was fixed to not allow it and then someone decided that was broken and
>>> > > >> reversed it.
>>> > > >>
>>> > > >> > I also find it strange that I'm the first to hit this :) Is no one running
>>> > > >> > nfs4 yet!
>>> > > >> >
>>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux client
>>> > > >> mounting a Netapp is the most common use of it. Since it appears that they
>>> > > >> flip flopped w.r.t. whose bug this is, it has probably persisted.
>>> > > >>
>>> > > >> It may turn out that the Linux client has been fixed or it may turn out
>>> > > >> that most servers allowed this "skip-by-1" even though David Noveck (one
>>> > > >> of the main authors of the protocol) seems to agree with me that it should
>>> > > >> not be allowed.
>>> > > >>
>>> > > >> It is possible that others have bumped into this, but it wasn't isolated
>>> > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat
>>> > > >> discussion) and they worked around it by reverting to NFSv3 or similar.
>>> > > >> The protocol is rather complex in this area and changed completely for
>>> > > >> NFSv4.1, so many have also probably moved onto NFSv4.1 where this won't be
>>> > > >> an issue. (NFSv4.1 uses sessions to provide exactly once RPC semantics and
>>> > > >> doesn't use these seqid fields.)
>>> > > >>
>>> > > >> This is all just mho, rick
>>> > > >>
>>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>> > > >> >
>>> > > >> > > Julian Elischer wrote:
>>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
>>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. Please
>>> > > >> > > > > let me know if Xin Li's patch resolves your problem, even though I
>>> > > >> > > > > don't believe it is correct except for the UINT32_MAX case. Good
>>> > > >> > > > > luck with it, rick
>>> > > >> > > > and please keep us all in the loop as to what they say!
>>> > > >> > > >
>>> > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1 in a
>>> > > >> > > > number field that has a bit of slack at wrap time (probably due to
>>> > > >> > > > some ambiguity in the original spec).
>>> > > >> > > >
>>> > > >> > > Actually, since N is the lock op already done, N + 1 is the next lock
>>> > > >> > > operation in order. Since lock ops need to be strictly ordered, allowing
>>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
>>> > > >> > >
>>> > > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it
>>> > > >> > > was poorly worded.
>>> > > >> > >
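As a rough illustration of the ordering rule described just above (this is not
the actual FreeBSD nfsd code), the server-side decision can be sketched as:
N + 1 is the next operation in order, N + 2 or greater is rejected with
NFS4ERR_BAD_SEQID. Two things here are assumptions rather than statements from
this thread: treating a repeat of N as a retransmission of the last request
(which is how I read the NFSv4.0 RFC), and a plain wrap modulo 2^32 at
UINT32_MAX. The thread only says the UINT32_MAX case is special without
spelling out the details. The function name and the result constants are made
up for the example.

```python
# Hypothetical result labels; only NFS4ERR_BAD_SEQID is a name from the thread.
NFS4_OK = "NFS4_OK"
REPLAY = "REPLAY"
NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"

UINT32_MASK = 0xFFFFFFFF  # seqid is a 32-bit unsigned field


def check_seqid(last_seqid, request_seqid):
    """Classify a request's seqid against the last one the server processed.

    last_seqid    -- N, the seqid of the lock/open op already done
    request_seqid -- the seqid carried by the incoming request
    """
    # N + 1: the next operation in strict order (assumed to wrap modulo 2**32).
    if request_seqid == (last_seqid + 1) & UINT32_MASK:
        return NFS4_OK
    # N again: assumed to be a retransmission of the previous request.
    if request_seqid == last_seqid:
        return REPLAY
    # Anything else, including the "skip-by-1" N + 2 case, is out of order.
    return NFS4ERR_BAD_SEQID


print(check_seqid(5, 6))  # in order
print(check_seqid(5, 5))  # retransmission
print(check_seqid(5, 7))  # the skip-by-1 case discussed in this thread
```

With this strict check, a client that skips from N to N + 2 gets
NFS4ERR_BAD_SEQID on every subsequent attempt until its state is recovered.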
>>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is an archive
>>> > > >> > > of it somewhere, but I can't remember where.;-)
>>> > > >> > >
>>> > > >> > > rick
>>> > > >> > > _______________________________________________
>>> > > >> > > freebsd-fs@freebsd.org mailing list
>>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> > > >> > > To unsubscribe, send any mail to
>>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org"
>>> > > >> > >
>>> > > >> >
>>> > > >>
>>> > > >
>>> > > >
>>> >
>>> > --
>>> > -------------------------------------------------------------------------
>>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
>>> > School of Physics and Astronomy - University of Minnesota
>>> > -------------------------------------------------------------------------
>>> >
>>>
>>
>>
>