Date: Tue, 28 Jul 2015 14:47:35 +0200
From: Ahmed Kamal <email.ahmedkamal@googlemail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Graham Allan <allan@physics.umn.edu>, Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <CANzjMX5Q4TNLBxrAm6R2F6oUdfgRD8dX1LRZiniJA4M4HTN_=w@mail.gmail.com>
In-Reply-To: <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <CANzjMX4NmxBErtEu=e5yEGJ6gAJBF4_ar_aPdNDO2-tUcePqTQ@mail.gmail.com> <55B12EB7.6030607@physics.umn.edu> <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca> <CANzjMX48F1gAVwqq64q=yALfTBNEc7iMbKAK1zi6aUfoF3WpOw@mail.gmail.com> <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca>
Hi again Rick,

Seems that I'm still being unlucky with nfs :/ I caught one of the newly
installed RHEL6 boxes having high CPU usage, and bombarding the BSD NFS box
with 10Mbps traffic .. I caught a tcpdump as you mentioned .. You can
download it here:

https://dl.dropboxusercontent.com/u/51939288/nfs41-high-client-cpu.pcap.bz2

I didn't restart the client yet .. so if you catch me in the next few hours
and want me to run any diagnostics, let me know.

Thanks a lot all for helping

On Thu, Jul 23, 2015 at 11:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Ahmed Kamal wrote:
> > Can you please let me know the ultimate packet trace command I'd need to
> > run in case of any nfs4 troubles .. I guess this should be comprehensive
> > even at the expense of a larger output size (which we can trim later)..
> > Thanks a lot for the help!
> >
> tcpdump -s 0 -w <file>.pcap host <client-host-name>
> (<file> refers to a file name you choose and <client-host-name> refers to
> the host name of a client generating traffic.)
> --> But you won't be able to allow this to run for long during the storm
> or the file will be huge.
>
> Then you look at <file>.pcap in wireshark, which knows NFS.
>
> rick
>
> > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > wrote:
> >
> > > Graham Allan wrote:
> > > > For our part, the user whose code triggered the pathological behaviour
> > > > on SL5 reran it on SL6 without incident - I still see lots of
> > > > sequence-id errors in the logs, but nothing bad happened.
> > > >
> > > > I'd still like to ask them to rerun again on SL5 to see if the "accept
> > > > skipped seqid" patch had any effect, though I think we expect not. Maybe
> > > > it would be nice if I could get set up to capture rolling tcpdumps of
> > > > the nfs traffic before they run that though...
> > > >
> > > > Graham
> > > >
> > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote:
> > > > > Hi folks,
> > > > >
> > > > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it
> > > > > to see what happens.
> > > > >
> > > > > During the process, I made the (I guess mistake) of zfs send | recv to a
> > > > > locally attached usb disk for backup purposes .. long story short,
> > > > > sharenfs property on the received filesystem was causing some nfs/mountd
> > > > > errors in logs .. I wasn't too happy with what I got .. I destroyed the
> > > > > backup datasets and the whole pool eventually .. and then rebooted the
> > > > > whole nas box .. After reboot my logs are still flooded with
> > > > >
> > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > > > Jul 21 05:13:07 nas last message repeated 7536 times
> > > > > Jul 21 05:15:08 nas last message repeated 29664 times
> > > > >
> > > > > Not sure what that means .. or how it can be stopped .. Anyway, will
> > > > > keep you posted on progress.
> > > >
> > > Oh, I didn't see the part about "reboot" before. Unfortunately, it sounds
> > > like the client isn't recovering after the session is lost. When the server
> > > reboots, the client(s) will get NFS4ERR_BAD_SESSION errors back because the
> > > server reboot has deleted all sessions. The NFS4ERR_BAD_SESSION should
> > > trigger state recovery on the client.
> > > (It doesn't sound like the clients went into recovery, starting with a
> > > Create_session operation, but without a packet trace, I can't be sure?)
> > >
> > > rick
> > >
> > > >
> > > > --
> > > > -------------------------------------------------------------------------
> > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu
> > > > School of Physics and Astronomy - University of Minnesota
> > > > -------------------------------------------------------------------------
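
The thread raises two related points: a full capture cannot run for long during a storm because the file becomes huge, and a rolling capture would be useful before rerunning the problem job. One possible way to get both is tcpdump's own file rotation: -C rotates the output file after a given size (in millions of bytes) and -W caps the number of files, overwriting the oldest. The sketch below only builds and prints such a command line. The host name, file name, and size limits are placeholders chosen for illustration, not values anyone in the thread proposed, and the helper function is hypothetical.

```python
import shlex


def rolling_capture_command(client_host, pcap_path, file_mb=100, file_count=20):
    """Build a tcpdump argument list for a size-bounded ring of capture files.

    file_mb and file_count are illustrative defaults (assumptions), giving an
    upper bound of roughly file_mb * file_count million bytes on disk.
    """
    if file_mb <= 0 or file_count <= 0:
        raise ValueError("file_mb and file_count must be positive")
    return [
        "tcpdump",
        "-s", "0",               # capture full packets, as in the command quoted above
        "-C", str(file_mb),      # start a new file after about file_mb million bytes
        "-W", str(file_count),   # keep at most file_count files, then overwrite the oldest
        "-w", pcap_path,         # base name; tcpdump appends a number to each file
        "host", client_host,     # only traffic to or from this client
    ]


if __name__ == "__main__":
    # Placeholder host and file name; substitute real values before running.
    cmd = rolling_capture_command("client.example.org", "nfs-storm.pcap")
    print(shlex.join(cmd))
```

The printed command would normally be run as root on the server or client. Once the problem has been reproduced, stopping tcpdump leaves the most recent files on disk to open in wireshark.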