Date: Wed, 29 Jul 2015 18:47:42 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc: Graham Allan <allan@physics.umn.edu>,
    Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <399737523.5288680.1438210062715.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CANzjMX5N_NiQpnTMXSbaDk4e1XFm4gA8h_UgEv+C9VdPtMMTew@mail.gmail.com>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
	<CANzjMX48F1gAVwqq64q=yALfTBNEc7iMbKAK1zi6aUfoF3WpOw@mail.gmail.com>
	<576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca>
	<CANzjMX5Q4TNLBxrAm6R2F6oUdfgRD8dX1LRZiniJA4M4HTN_=w@mail.gmail.com>
	<1089316279.4709692.1438126760802.JavaMail.zimbra@uoguelph.ca>
	<CANzjMX7J7wm4f8nCXqjGe7F+ZgwYFBtFtKTFOzrPJN-MeXJ5EA@mail.gmail.com>
	<1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca>
	<CANzjMX5N_NiQpnTMXSbaDk4e1XFm4gA8h_UgEv+C9VdPtMMTew@mail.gmail.com>
------=_Part_5288678_109992561.1438210062713
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Ahmed Kamal wrote:
> hmm Thanks Rick ..
>
> You mentioned the error appears when nfsrvboottime != clientid.lval[0] ..
> I understand nfsrvboottime is the number of seconds since the epoch (1970)
> .. Can you please explain what clientid.lval[0] is, and (if it comes from
> the client?) what guarantees it should equal nfsrvboottime ?
>
> Apart from trying to understand the problem, can you send me a small C
> program that runs the same code that computes nfsrvboottime and writes
> that value to the terminal window? I would like to avoid testing a kernel
> patch on this system since it runs in production. And last time I rebooted
> the nfs server, I ended up having to reboot all clients (every single
> workstation), so that was painful .. So if we just want to know if the
> number is bigger than 4 billion or not, I think this small app can help us
> get this value, right ?
>
Ok, I took a closer look at the packet trace (thanks for creating it) and I
saw that the clientid value being returned by the server was bogus. (The
field that shouldn't change except when the server is rebooted was
changing.)

Given the above, I think I did find the bug and the attached patch should
fix it. (This is NFSv4.1 specific and has nothing to do with the NFSv4.0
seqid issue.)

The patch actually fixes 3 things, although I don't think the other 2 would
affect you in practice:
1 - When a confirmed clientid already exists, nfsrv_setclient() wasn't
    setting the clientidp argument, so the reply included garbage off the
    stack. I think this is what caused your problem.
2 - If the high order 32 bits of nfsrvboottime are non-zero, the
    comparisons with clientid.lval[0] would fail. I don't think this can
    actually happen until the year 2106, but it is fixed in the patch.
3 - It was possible to leave an unconfirmed duplicate clientid structure in
    the list when the match in nfsrv_setclient() found it in the last hash
    list. I don't think this would have caused any problem, since the new
    one would be at the head of the list. The old one would eventually have
    been scavenged and cleared out, although it would have occupied storage
    until then.

The attached patch fixes all of these and I think will fix your problem.
Maybe you can create a patched kernel and then find a time to reboot the
server someday?

If any other reader is using the NFSv4 server, please test this patch if
possible.

Thanks for creating the packet trace and sorry about this bug causing you
grief, rick
ps: I'd guess you are one of the first serious users of the NFSv4.1 server,
but hopefully it will behave ok for you with this patch.

> On Wed, Jul 29, 2015 at 1:35 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> > Ahmed Kamal wrote:
> > > hmm, if I understand you correctly, this time_seconds value is the
> > > number of seconds until the box booted ?
> > No, it is the number of seconds when the box booted. Once, it was
> > supposed to be the number of seconds since Jan. 1, 1970, but I don't
> > know if that is still the case. (For my i386, it is about 1.4 billion
> > when it boots, so I'd guess they still use Jan. 1, 1970.
> > 3600*24*365*45 = 1419120000. Yea, I didn't bother with leap years, etc.)
> >
> > Now, I don't know if the clock could somehow set it to a value > 4 billion
> > when the nfsd starts (it copies time_seconds to nfsrvboottime as it
> > starts up), but the current clock code is pretty convoluted stuff, so??
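
[As an aside, the check Ahmed asks for above does not need any kernel code.
A minimal userland sketch, assuming only that time(3) returns the same
seconds-since-1970 count that time_seconds is initialized from; it prints
the current value rather than the one latched when the nfsd started, which
is enough to tell whether the count is anywhere near 4 billion:

#include <stdio.h>
#include <stdint.h>
#include <time.h>

int
main(void)
{
	time_t now = time(NULL);	/* seconds since Jan. 1, 1970 */
	uint32_t low32 = (uint32_t)now;	/* what clientid.lval[0] would hold */

	printf("seconds since epoch: %jd\n", (intmax_t)now);
	printf("low order 32 bits:   %u\n", (unsigned)low32);
	printf("exceeds 2^32 - 1:    %s\n",
	    (uintmax_t)now > UINT32_MAX ? "yes" : "no");
	return (0);
}

On any system where this prints "no", the 32-bit truncation cannot be the
cause of the stale-clientid replies.]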
> >
> > rick
> > ps: From the NFSv4 server's point of view, it only needs a number that
> > is unique and changes every time the server reboots. As such, using the
> > low order 32 bits of it would be sufficient, even if it exceeds
> > 4 billion. However, the code incorrectly assumes it won't exceed
> > 4*1024*1024*1024 - 1 unless you apply the patch.
> >
> > > If so, I guess this is not really the cause of what we're seeing, as
> > > the box is only up for 8 days
> > >
> > > bsd# uptime
> > > 11:28AM up 8 days, 6:20, 6 users, load averages: 0.94, 0.91, 0.84
> > >
> > > The NFS client box's uptime is
> > > linux# uptime
> > > 11:31:39 up 8 days, 5:51, 11 users, load average: 87.74, 87.43, 87.35
> > >
> > > and yes, the huge load is most likely due to this NFS bug
> > >
> > > On Wed, Jul 29, 2015 at 1:39 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > >
> > > > Ahmed Kamal wrote:
> > > > > Hi again Rick,
> > > > >
> > > > > Seems that I'm still being unlucky with nfs :/ I caught one of the
> > > > > newly installed RHEL6 boxes having high CPU usage, and bombarding
> > > > > the BSD NFS box with 10Mbps traffic .. I caught a tcpdump as you
> > > > > mentioned .. You can download it here:
> > > > >
> > > > > https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2
> > > > >
> > > > Ok, the packet trace suggests that the NFSv4 server is broken (it is
> > > > replying with NFS4ERR_STALE_CLIENTID for a recently generated
> > > > ClientID).
> > > > Now, I can't be sure, but the only explanation I can come up with
> > > > is...
> > > > - For some arches (I only have i386, so I wouldn't have seen this
> > > >   during testing), time_t is 64 bits (uint64_t).
> > > > --> If time_seconds somehow doesn't fit in the low order 32 bits,
> > > >     then the code would be busted for these arches, because
> > > >     nfsrvboottime is set to time_seconds when the server is started
> > > >     and then there are comparisons like:
> > > >         if (nfsrvboottime != clientid.lval[0])
> > > >                 return (NFSERR_STALECLIENTID);
> > > >         /* where clientid.lval[0] is a uint32_t */
> > > > Anyhow, if this is what is happening, the attached simple patch
> > > > should fix it. (I don't know how time_seconds would exceed
> > > > 4 billion, but the clock code is pretty convoluted, so I can't say
> > > > if it can possibly happen?)
> > > >
> > > > rick
> > > > ps: Hmm, on i386 time_seconds ends up at 1438126486, so maybe it can
> > > > exceed 4*1024*1024*1024 - 1 on amd64?
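
[The comparison problem described above is easy to demonstrate outside the
kernel. A standalone sketch, with variable names that mirror the kernel's
but a hypothetical boot-time value, showing why the unpatched test rejects
a freshly issued clientid once the boot time no longer fits in 32 bits, and
why the (uint32_t) cast in the patch cures it:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* A hypothetical 64-bit boot time with a high order bit set. */
	uint64_t nfsrvboottime = (1ULL << 32) + 1438126486ULL;
	/* The client hands back only the low order 32 bits. */
	uint32_t lval0 = (uint32_t)nfsrvboottime;

	/* Unpatched: lval0 is widened to 64 bits, the high order bits
	 * differ, and a perfectly fresh clientid looks stale. */
	printf("unpatched: %s\n",
	    nfsrvboottime != lval0 ? "NFSERR_STALECLIENTID" : "ok");

	/* Patched: truncate the boot time before comparing. */
	printf("patched:   %s\n",
	    (uint32_t)nfsrvboottime != lval0 ? "NFSERR_STALECLIENTID" : "ok");
	return (0);
}

This prints "NFSERR_STALECLIENTID" for the unpatched comparison and "ok"
for the patched one.]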
> > > > >
> > > > > I didn't restart the client yet .. so if you catch me in the next
> > > > > few hours and want me to run any diagnostics, let me know. Thanks
> > > > > a lot all for helping
> > > > >
> > > > > On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > > >
> > > > > > Ahmed Kamal wrote:
> > > > > > > Can you please let me know the ultimate packet trace command
> > > > > > > I'd need to run in case of any nfs4 troubles .. I guess this
> > > > > > > should be comprehensive, even at the expense of a larger
> > > > > > > output size (which we can trim later) .. Thanks a lot for the
> > > > > > > help!
> > > > > > >
> > > > > > tcpdump -s 0 -w <file>.pcap host <client-host-name>
> > > > > > (<file> refers to a file name you choose and <client-host-name>
> > > > > > refers to the host name of a client generating traffic.)
> > > > > > --> But you won't be able to allow this to run for long during
> > > > > >     the storm or the file will be huge.
> > > > > >
> > > > > > Then you look at <file>.pcap in wireshark, which knows NFS.
> > > > > >
> > > > > > rick
> > > > > >
> > > > > > > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > > > > >
> > > > > > > > Graham Allan wrote:
> > > > > > > > > For our part, the user whose code triggered the
> > > > > > > > > pathological behaviour on SL5 reran it on SL6 without
> > > > > > > > > incident - I still see lots of sequence-id errors in the
> > > > > > > > > logs, but nothing bad happened.
> > > > > > > > >
> > > > > > > > > I'd still like to ask them to rerun again on SL5 to see if
> > > > > > > > > the "accept skipped seqid" patch had any effect, though I
> > > > > > > > > think we expect not. Maybe it would be nice if I could get
> > > > > > > > > set up to capture rolling tcpdumps of the nfs traffic
> > > > > > > > > before they run that though...
> > > > > > > > >
> > > > > > > > > Graham
> > > > > > > > >
> > > > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > > > > > > Hi folks,
> > > > > > > > > >
> > > > > > > > > > I've upgraded a test client to rhel6 today, and I'll
> > > > > > > > > > keep an eye on it to see what happens.
> > > > > > > > > >
> > > > > > > > > > During the process, I made the (I guess mistake) of
> > > > > > > > > > zfs send | recv to a locally attached usb disk for
> > > > > > > > > > backup purposes .. long story short, the sharenfs
> > > > > > > > > > property on the received filesystem was causing some
> > > > > > > > > > nfs/mountd errors in logs .. I wasn't too happy with
> > > > > > > > > > what I got .. I destroyed the backup datasets and the
> > > > > > > > > > whole pool eventually .. and then rebooted the whole
> > > > > > > > > > nas box .. After reboot my logs are still flooded with
> > > > > > > > > >
> > > > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > > > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > > > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > > > > > > >
> > > > > > > > > > Not sure what that means .. or how it can be stopped ..
> > > > > > > > > > Anyway, will keep you posted on progress.
> > > > > > > > > >
> > > > > > > > Oh, I didn't see the part about "reboot" before.
> > > > > > > > Unfortunately, it sounds like the client isn't recovering
> > > > > > > > after the session is lost. When the server reboots, the
> > > > > > > > client(s) will get NFS4ERR_BAD_SESSION errors back because
> > > > > > > > the server reboot has deleted all sessions. The
> > > > > > > > NFS4ERR_BAD_SESSION should trigger state recovery on the
> > > > > > > > client. (It doesn't sound like the clients went into
> > > > > > > > recovery, starting with a Create_session operation, but
> > > > > > > > without a packet trace, I can't be sure?)
> > > > > > > >
> > > > > > > > rick
> > > > > > > >
> > > > > > > > > --
> > > > > > > > > -------------------------------------------------------------------------
> > > > > > > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > > > > > > School of Physics and Astronomy - University of Minnesota
> > > > > > > > > -------------------------------------------------------------------------

------=_Part_5288678_109992561.1438210062713
Content-Type: text/x-patch; name=nfsv41exch.patch
Content-Disposition: attachment; filename=nfsv41exch.patch
Content-Transfer-Encoding: 7bit

--- fs/nfsserver/nfs_nfsdstate.c.sav	2015-07-28 18:54:06.561454000 -0400
+++ fs/nfsserver/nfs_nfsdstate.c	2015-07-29 18:07:53.000000000 -0400
@@ -220,7 +220,8 @@ nfsrv_setclient(struct nfsrv_descript *n
 			break;
 		}
 	    }
-	    i++;
+	    if (gotit == 0)
+		i++;
 	}
 	if (!gotit ||
 	    (clp->lc_flags & (LCL_NEEDSCONFIRM | LCL_ADMINREVOKED))) {
@@ -400,9 +401,12 @@ nfsrv_setclient(struct nfsrv_descript *n
 	}
 
 	/* For NFSv4.1, mark that we found a confirmed clientid. */
-	if ((nd->nd_flag & ND_NFSV41) != 0)
+	if ((nd->nd_flag & ND_NFSV41) != 0) {
+		clientidp->lval[0] = clp->lc_clientid.lval[0];
+		clientidp->lval[1] = clp->lc_clientid.lval[1];
+		confirmp->lval[0] = 0;	/* Ignored by client */
 		confirmp->lval[1] = 1;
-	else {
+	} else {
 		/*
 		 * id and verifier match, so update the net address info
 		 * and get rid of any existing callback authentication
@@ -487,7 +491,7 @@ nfsrv_getclient(nfsquad_t clientid, int 
 	if (clpp)
 		*clpp = NULL;
 	if ((nd == NULL || (nd->nd_flag & ND_NFSV41) == 0 ||
-	    opflags != CLOPS_RENEW) && nfsrvboottime != clientid.lval[0]) {
+	    opflags != CLOPS_RENEW) && (uint32_t)nfsrvboottime != clientid.lval[0]) {
 		error = NFSERR_STALECLIENTID;
 		goto out;
 	}
@@ -683,7 +687,7 @@ nfsrv_destroyclient(nfsquad_t clientid, 
 	struct nfsclienthashhead *hp;
 	int error = 0, i, igotlock;
 
-	if (nfsrvboottime != clientid.lval[0]) {
+	if ((uint32_t)nfsrvboottime != clientid.lval[0]) {
 		error = NFSERR_STALECLIENTID;
 		goto out;
 	}
@@ -3996,11 +4000,11 @@ nfsrv_checkrestart(nfsquad_t clientid, u
 	 */
 	if (flags &
 	    (NFSLCK_OPEN | NFSLCK_TEST | NFSLCK_RELEASE | NFSLCK_DELEGPURGE)) {
-		if (clientid.lval[0] != nfsrvboottime) {
+		if (clientid.lval[0] != (uint32_t)nfsrvboottime) {
 			ret = NFSERR_STALECLIENTID;
 			goto out;
 		}
-	} else if (stateidp->other[0] != nfsrvboottime &&
+	} else if (stateidp->other[0] != (uint32_t)nfsrvboottime &&
 		specialid == 0) {
 		ret = NFSERR_STALESTATEID;
 		goto out;

------=_Part_5288678_109992561.1438210062713--
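
[The first hunk of the attached patch is the fix for item 3 in Rick's
list. Stripped of the kernel types (the real code walks its clientid hash
lists with LIST_FOREACH), the loop pattern looks like the hypothetical
sketch below; find_chain, table, and the sizes are made up for
illustration. It shows why an unconditional i++ left i pointing one hash
chain past a match:

#include <stdio.h>

#define NHASH	4	/* stand-in for the number of hash lists */
#define NENT	3

/*
 * Scan the hash chains for a value. On a match, i must still index the
 * chain that holds it; the unpatched loop incremented i unconditionally,
 * so a hit left i one chain past the match (and a hit in the last chain
 * left i == NHASH, off the end of the table).
 */
static int
find_chain(int table[NHASH][NENT], int want)
{
	int gotit = 0, i = 0, j;

	while (i < NHASH && gotit == 0) {
		for (j = 0; j < NENT; j++) {
			if (table[i][j] == want) {
				gotit = 1;
				break;
			}
		}
		if (gotit == 0)		/* the test the patch adds */
			i++;
	}
	return (gotit ? i : -1);
}

int
main(void)
{
	int table[NHASH][NENT] = {
		{ 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 }, { 10, 11, 12 }
	};

	printf("11 is in chain %d\n", find_chain(table, 11));	/* 3 */
	return (0);
}
]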