Date: Wed, 29 Jul 2015 13:51:07 +0200
From: Ahmed Kamal <email.ahmedkamal@googlemail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Graham Allan <allan@physics.umn.edu>, Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <CANzjMX5N_NiQpnTMXSbaDk4e1XFm4gA8h_UgEv+C9VdPtMMTew@mail.gmail.com>
In-Reply-To: <1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <55B12EB7.6030607@physics.umn.edu>
 <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX48F1gAVwqq64q=yALfTBNEc7iMbKAK1zi6aUfoF3WpOw@mail.gmail.com>
 <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX5Q4TNLBxrAm6R2F6oUdfgRD8dX1LRZiniJA4M4HTN_=w@mail.gmail.com>
 <1089316279.4709692.1438126760802.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX7J7wm4f8nCXqjGe7F+ZgwYFBtFtKTFOzrPJN-MeXJ5EA@mail.gmail.com>
 <1603742210.4824721.1438169724361.JavaMail.zimbra@uoguelph.ca>
hmm Thanks Rick .. You mentioned the error appears when nfsrvboottime != clientid.lval[0] .. I understand nfsrvboottime is the number of seconds since the epoch (1970) .. Can you please explain what clientid.lval[0] is, and (if it comes from the client?) what guarantees that it should equal nfsrvboottime?

Apart from trying to understand the problem: can you send me a small C program that runs the same code that computes nfsrvboottime and writes the value to the terminal? I would like to avoid testing a kernel patch on this system since it runs in production, and the last time I rebooted the NFS server I ended up having to reboot all clients (every single workstation), which was painful. So if we just want to know whether the number is bigger than 4 billion or not, a small app like the sketch below should let us check that value, right?
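A minimal userland sketch of such a check. It does not run the kernel code itself; it only assumes (as Rick describes below) that nfsrvboottime is a copy of the kernel's time_seconds, i.e. wall-clock seconds since the epoch, taken when nfsd starts, so reading the realtime clock from userland shows the same quantity:

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int
    main(void)
    {
            time_t now = time(NULL);

            /* time_t is 64 bits on amd64; print it portably. */
            printf("seconds since the epoch: %jd\n", (intmax_t)now);
            printf("sizeof(time_t): %zu bytes\n", sizeof(time_t));
            printf("fits in 32 bits: %s\n",
                (uintmax_t)now <= UINT32_MAX ? "yes" : "no");
            return (0);
    }

Compile with "cc boottime.c -o boottime" and run it on the server; if it prints a value anywhere near 4 billion, the overflow discussed below is plausible.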
On Wed, Jul 29, 2015 at 1:35 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Ahmed Kamal wrote:
> > hmm, if I understand you correctly, this time_seconds value is the number
> > of seconds till the box booted ?
> No, it is the number of seconds when the box booted. Once, it was supposed
> to be the number of seconds since Jan. 1, 1970, but I don't know if that is
> still the case. (For my i386, it is about 1.4 billion when it boots, so I'd
> guess they still use Jan. 1, 1970. 3600*24*365*45 = 1419120000. Yea, I
> didn't bother with leap years, etc.)
>
> Now, I don't know if the clock could somehow set it to a value > 4 billion
> when the nfsd starts (it copies time_seconds to nfsrvboottime as it starts
> up), but the current clock code is pretty convoluted stuff, so??
>
> rick
> ps: From the NFSv4 server's point of view, it only needs a number that is
>     unique and changes every time the server reboots. As such, using the
>     low order 32 bits of it would be sufficient, even if it exceeds
>     4 billion. However, the code incorrectly assumes it won't exceed
>     4*1024*1024*1024 - 1 unless you apply the patch.
>
> > If so, I guess this is not really the cause of what we're seeing, as the
> > box is only up for 8 days:
> >
> > bsd# uptime
> > 11:28AM up 8 days, 6:20, 6 users, load averages: 0.94, 0.91, 0.84
> >
> > The NFS client box's uptime is:
> >
> > linux# uptime
> > 11:31:39 up 8 days, 5:51, 11 users, load average: 87.74, 87.43, 87.35
> >
> > and yes, the huge load is most likely due to this NFS bug.
> >
> > On Wed, Jul 29, 2015 at 1:39 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >
> > > Ahmed Kamal wrote:
> > > > Hi again Rick,
> > > >
> > > > Seems that I'm still being unlucky with nfs :/ I caught one of the
> > > > newly installed RHEL6 boxes having high CPU usage, and bombarding
> > > > the BSD NFS box with 10Mbps traffic .. I caught a tcpdump as you
> > > > mentioned .. You can download it here:
> > > >
> > > > https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2
> > > >
> > > Ok, the packet trace suggests that the NFSv4 server is broken (it is
> > > replying with NFS4ERR_STALE_CLIENTID for a recently generated
> > > ClientID). Now, I can't be sure, but the only explanation I can come
> > > up with is...
> > > - For some arches (I only have i386, so I wouldn't have seen this
> > >   during testing), time_t is 64 bits (uint64_t).
> > > --> If time_seconds somehow doesn't fit in the low order 32 bits,
> > >     then the code would be busted for these arches, because
> > >     nfsrvboottime is set to time_seconds when the server is started
> > >     and then there are comparisons like:
> > >         if (nfsrvboottime != clientid.lval[0])
> > >             return (NFSERR_STALECLIENTID);
> > >     /* where clientid.lval[0] is a uint32_t */
> > > Anyhow, if this is what is happening, the attached simple patch
> > > should fix it. (I don't know how time_seconds would exceed 4 billion,
> > > but the clock code is pretty convoluted, so I can't say if it can
> > > possibly happen?)
> > >
> > > rick
> > > ps: Hmm, on i386 time_seconds ends up at 1438126486, so maybe it can
> > >     exceed 4*1024*1024*1024 - 1 on amd64?
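For reference, a hypothetical standalone demonstration of the overflow described above; the variable names mirror the kernel ones, but this is a userland sketch, not Rick's attached patch:

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            /* Pretend the server booted with time_seconds > 4 billion. */
            uint64_t nfsrvboottime = 4294967296ULL + 12345;
            /* The clientid only carries the low-order 32 bits. */
            uint32_t lval0 = (uint32_t)nfsrvboottime;

            /* Unpatched comparison: spuriously "stale" once the value
             * no longer fits in 32 bits. */
            printf("full compare:   %s\n",
                (nfsrvboottime != lval0) ? "NFSERR_STALECLIENTID" : "ok");

            /* The idea behind the fix: compare only the low-order
             * 32 bits, which is all the uniqueness the server needs. */
            printf("masked compare: %s\n",
                ((nfsrvboottime & 0xffffffffULL) != lval0) ?
                "NFSERR_STALECLIENTID" : "ok");
            return (0);
    }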
> > > > I didn't restart the client yet .. so if you catch me in the next
> > > > few hours and want me to run any diagnostics, let me know. Thanks a
> > > > lot all for helping.
> > > >
> > > > On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > >
> > > > > Ahmed Kamal wrote:
> > > > > > Can you please let me know the ultimate packet trace command
> > > > > > I'd need to run in case of any nfs4 troubles .. I guess this
> > > > > > should be comprehensive even at the expense of a larger output
> > > > > > size (which we can trim later) .. Thanks a lot for the help!
> > > > > >
> > > > > tcpdump -s 0 -w <file>.pcap host <client-host-name>
> > > > > (<file> refers to a file name you choose and <client-host-name>
> > > > > refers to the host name of a client generating traffic.)
> > > > > --> But you won't be able to let this run for long during the
> > > > >     storm, or the file will be huge.
> > > > >
> > > > > Then you look at <file>.pcap in wireshark, which knows NFS.
> > > > >
> > > > > rick
> > > > >
> > > > > > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > > > > >
> > > > > > > Graham Allan wrote:
> > > > > > > > For our part, the user whose code triggered the
> > > > > > > > pathological behaviour on SL5 reran it on SL6 without
> > > > > > > > incident - I still see lots of sequence-id errors in the
> > > > > > > > logs, but nothing bad happened.
> > > > > > > >
> > > > > > > > I'd still like to ask them to rerun again on SL5 to see if
> > > > > > > > the "accept skipped seqid" patch had any effect, though I
> > > > > > > > think we expect not. Maybe it would be nice if I could get
> > > > > > > > set up to capture rolling tcpdumps of the nfs traffic
> > > > > > > > before they run that, though...
> > > > > > > >
> > > > > > > > Graham
> > > > > > > >
> > > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > I've upgraded a test client to rhel6 today, and I'll keep
> > > > > > > > > an eye on it to see what happens.
> > > > > > > > >
> > > > > > > > > During the process, I made the (I guess) mistake of zfs
> > > > > > > > > send | recv to a locally attached usb disk for backup
> > > > > > > > > purposes .. long story short, the sharenfs property on
> > > > > > > > > the received filesystem was causing some nfs/mountd
> > > > > > > > > errors in the logs .. I wasn't too happy with what I
> > > > > > > > > got .. I destroyed the backup datasets and eventually the
> > > > > > > > > whole pool .. and then rebooted the whole nas box ..
> > > > > > > > > After the reboot my logs are still flooded with:
> > > > > > > > >
> > > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > > > > > >
> > > > > > > > > Not sure what that means .. or how it can be stopped ..
> > > > > > > > > Anyway, I will keep you posted on progress.
> > > > > > > > >
> > > > > > > Oh, I didn't see the part about "reboot" before.
> > > > > > > Unfortunately, it sounds like the client isn't recovering
> > > > > > > after the session is lost. When the server reboots, the
> > > > > > > client(s) will get NFS4ERR_BAD_SESSION errors back because
> > > > > > > the server reboot has deleted all sessions. The
> > > > > > > NFS4ERR_BAD_SESSION should trigger state recovery on the
> > > > > > > client. (It doesn't sound like the clients went into
> > > > > > > recovery, starting with a Create_session operation, but
> > > > > > > without a packet trace, I can't be sure?)
> > > > > > >
> > > > > > > rick
> > > > > > >
> > > > > > > > --
> > > > > > > > -------------------------------------------------------------------------
> > > > > > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > > > > > School of Physics and Astronomy - University of Minnesota
> > > > > > > > -------------------------------------------------------------------------
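Graham mentions wanting rolling tcpdumps above. A minimal sketch of such a capture, building on Rick's command with tcpdump's ring-buffer options (-C rotates at a file size in millions of bytes, -W caps the number of files kept); <client-host-name> is again a placeholder:

    tcpdump -s 0 -C 100 -W 10 -w nfs-roll.pcap host <client-host-name>

This rotates through nfs-roll.pcap0 .. nfs-roll.pcap9, overwriting the oldest file once all ten exist, so the capture can run through a whole storm without filling the disk.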