From owner-freebsd-fs@freebsd.org  Wed Jul 29 11:51:28 2015
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id CCB2A9AB29A
 for <freebsd-fs@mailman.ysv.freebsd.org>; Wed, 29 Jul 2015 11:51:28 +0000 (UTC)
 (envelope-from email.ahmedkamal@googlemail.com)
Received: from mail-wi0-x22a.google.com (mail-wi0-x22a.google.com
 [IPv6:2a00:1450:400c:c05::22a])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5784FD46
 for <freebsd-fs@freebsd.org>; Wed, 29 Jul 2015 11:51:28 +0000 (UTC)
 (envelope-from email.ahmedkamal@googlemail.com)
Received: by wibud3 with SMTP id ud3so217222357wib.1
 for <freebsd-fs@freebsd.org>; Wed, 29 Jul 2015 04:51:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=googlemail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=JMLZBAfVLiDXHqRVzOZfEfwjWo7gg8HuMj1kdB3c/2A=;
 b=H/pZToaIT8A2c2nyHh4HJHln5bWSMGAzYyRcb7tSUwEyJSNS7EvAQAB1GeoWEgvOYS
 KEdzDHbwN5fNnYb5gWahxn+ig9jzp2gv6AjJZcpgClgpg1/iZOyVALiSZTEbjMNuosUP
 s5Qnsl5kPkyO3/0srEDNaR6y21yhB+RbQcvsFwnsOs1elj3ecLf5uXLxvmHErv9MmuV4
 9Jsgu8h5Jse8UUdQJzs8WkloJNZOkAdMKU44PGSuNfuuKL+trmSi3kq7C0qmii9aN2u6
 7e94HzHUieHGctMAZU2zU+ZmbtVs8wTh69tV5HT05IPIzG2OxleBCwf3u1ie7KPtqUPW
 aGrQ==
X-Received: by 10.180.21.244 with SMTP id y20mr16751363wie.65.1438170686789;
 Wed, 29 Jul 2015 04:51:26 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.28.6.143 with HTTP; Wed, 29 Jul 2015 04:51:07 -0700 (PDT)
In-Reply-To: <1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <55B12EB7.6030607@physics.umn.edu>
 <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX48F1gAVwqq64q=yALfTBNEc7iMbKAK1zi6aUfoF3WpOw@mail.gmail.com>
 <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX5Q4TNLBxrAm6R2F6oUdfgRD8dX1LRZiniJA4M4HTN_=w@mail.gmail.com>
 <1089316279.4709692.1438126760802.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX7J7wm4f8nCXqjGe7F+ZgwYFBtFtKTFOzrPJN-MeXJ5EA@mail.gmail.com>
 <1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca>
From: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Date: Wed, 29 Jul 2015 13:51:07 +0200
Message-ID: <CANzjMX5N_NiQpnTMXSbaDk4e1XFm4gA8h_UgEv+C9VdPtMMTew@mail.gmail.com>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Graham Allan <allan@physics.umn.edu>, 
 Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jul 2015 11:51:29 -0000

hmm Thanks Rick ..

You mentioned the error appears when nfsrvboottime != clientid.lval[0] .. I
understand nfsrvboottime is the number of seconds since the epoch (1970) at
the moment nfsd started .. Can you please explain what clientid.lval[0] is,
and (if it comes from the client?) what guarantees it should equal
nfsrvboottime ?

Apart from trying to understand the problem, can you send me a small C
program that runs the same code that computes nfsrvboottime and writes the
value to the terminal window? I would like to avoid testing a kernel patch
on this system since it runs in production. And last time I rebooted the nfs
server, I ended up having to reboot all clients (every single workstation)
so that was painful .. So if we just want to know if the number is bigger
than 4 billion or not, I think this small app can help us get this value,
right ?

On Wed, Jul 29, 2015 at 1:35 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Ahmed Kamal wrote:
> > hmm, if I understand you correctly, this time_seconds value is the number
> > of seconds till the box booted ?
> No, it is the number of seconds when the box booted. Once, it was supposed
> to be the number of seconds since Jan. 1, 1970, but I don't know if that is
> still the case. (For my i386, it is about 1.4billion when it boots, so I'd
> guess they still use Jan. 1, 1970. 3600*24*365*45 = 1419120000. Yea, I
> didn't bother with leap years, etc.)
>
> Now, I don't know if the clock could somehow set it to a value > 4billion
> when the nfsd starts (it copies time_seconds to nfsrvboottime as it starts
> up), but the current clock code is pretty convoluted stuff, so??
>
> rick
> ps: From the NFSv4 server's point of view, it only needs a number that is
>     unique and changes every time the server reboots. As such, using the
>     low order 32bits of it would be sufficient, even if it exceeds 4billion.
>     However, the code incorrectly assumes it won't exceed
>     4*1024*1024*1024 - 1 unless you apply the patch.
>
> > If so, I guess this is not really the
> > cause of what we're seeing as the box is only up for 8 days
> >
> > bsd# uptime
> > 11:28AM  up 8 days,  6:20, 6 users, load averages: 0.94, 0.91, 0.84
> >
> > The NFS client box's uptime is
> > linux# uptime
> >  11:31:39 up 8 days,  5:51, 11 users,  load average: 87.74, 87.43, 87.35
> >
> > and yes the huge load is most likely due to this NFS bug
> >
> > On Wed, Jul 29, 2015 at 1:39 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >
> > > Ahmed Kamal wrote:
> > > > Hi again Rick,
> > > >
> > > > Seems that I'm still being unlucky with nfs :/ I caught one of the
> > > > newly installed RHEL6 boxes having high CPU usage, and bombarding the
> > > > BSD NFS box with 10Mbps traffic .. I caught a tcpdump as you mentioned
> > > > .. You can download it here:
> > > >
> > > > https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2
> > > >
> > > Ok, the packet trace suggests that the NFSv4 server is broken (it is
> > > replying with NFS4ERR_STALE_CLIENTID for a recently generated ClientID).
> > > Now, I can't be sure, but the only explanation I can come up with is...
> > > - For some arches (I only have i386, so I wouldn't have seen this during
> > >   testing), time_t is 64bits (uint64_t).
> > >   --> If time_seconds somehow doesn't fit in the low order 32bits, then
> > >       the code would be busted for these arches because nfsrvboottime is
> > >       set to time_seconds when the server is started and then there are
> > >       comparisons like:
> > >       if (nfsrvboottime != clientid.lval[0])
> > >            return (NFSERR_STALECLIENTID);
> > >        /* where clientid.lval[0] is a uint32_t */
> > > Anyhow, if this is what is happening, the attached simple patch should
> > > fix it.
> > > (I don't know how time_seconds would exceed 4billion, but the clock code
> > >  is pretty convoluted, so I can't say if it can possibly happen?)
> > >
> > > rick
> > > ps: Hmm, on i386 time_seconds ends up at 1438126486, so maybe it can
> > >     exceed 4*1024*1024*1024 - 1 on amd64?
> > >
> > > > I didn't restart the client yet .. so if you catch me in the next few
> > > > hours and want me to run any diagnostics, let me know. Thanks a lot all
> > > > for helping
> > > >
> > > > On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > >
> > > > > Ahmed Kamal wrote:
> > > > > > Can you please let me know the ultimate packet trace command I'd
> > > > > > need to run in case of any nfs4 troubles .. I guess this should be
> > > > > > comprehensive even at the expense of a larger output size (which we
> > > > > > can trim later).. Thanks a lot for the help!
> > > > > >
> > > > > tcpdump -s 0 -w <file>.pcap host <client-host-name>
> > > > > (<file> refers to a file name you choose and <client-host-name> refers
> > > > >  to the host name of a client generating traffic.)
> > > > > --> But you won't be able to allow this to run for long during the
> > > > >     storm or the file will be huge.
> > > > >
> > > > > Then you look at <file>.pcap in wireshark, which knows NFS.
> > > > >
> > > > > rick
> > > > >
> > > > > > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > > > >
> > > > > > > Graham Allan wrote:
> > > > > > > > For our part, the user whose code triggered the pathological
> > > > > > > > behaviour on SL5 reran it on SL6 without incident - I still see
> > > > > > > > lots of sequence-id errors in the logs, but nothing bad happened.
> > > > > > > >
> > > > > > > > I'd still like to ask them to rerun again on SL5 to see if the
> > > > > > > > "accept skipped seqid" patch had any effect, though I think we
> > > > > > > > expect not. Maybe it would be nice if I could get set up to
> > > > > > > > capture rolling tcpdumps of the nfs traffic before they run that
> > > > > > > > though...
> > > > > > > >
> > > > > > > > Graham
> > > > > > > >
> > > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > I've upgraded a test client to rhel6 today, and I'll keep an
> > > > > > > > > eye on it to see what happens.
> > > > > > > > >
> > > > > > > > > During the process, I made the (I guess) mistake of zfs send |
> > > > > > > > > recv to a locally attached usb disk for backup purposes .. long
> > > > > > > > > story short, the sharenfs property on the received filesystem
> > > > > > > > > was causing some nfs/mountd errors in logs .. I wasn't too happy
> > > > > > > > > with what I got .. I destroyed the backup datasets and the whole
> > > > > > > > > pool eventually .. and then rebooted the whole nas box .. After
> > > > > > > > > reboot my logs are still flooded with
> > > > > > > > >
> > > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > > > > > >
> > > > > > > > > Not sure what that means .. or how it can be stopped .. Anyway,
> > > > > > > > > will keep you posted on progress.
> > > > > > > >
> > > > > > > Oh, I didn't see the part about "reboot" before. Unfortunately, it
> > > > > > > sounds like the client isn't recovering after the session is lost.
> > > > > > > When the server reboots, the client(s) will get NFS4ERR_BAD_SESSION
> > > > > > > errors back because the server reboot has deleted all sessions. The
> > > > > > > NFS4ERR_BAD_SESSION should trigger state recovery on the client.
> > > > > > > (It doesn't sound like the clients went into recovery, starting
> > > > > > >  with a Create_session operation, but without a packet trace, I
> > > > > > >  can't be sure?)
> > > > > > >
> > > > > > > rick
> > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -------------------------------------------------------------------------
> > > > > > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > > > > > School of Physics and Astronomy - University of Minnesota
> > > > > > > > -------------------------------------------------------------------------