Date: Fri, 17 Jul 2015 11:05:34 -0500
From: Graham Allan <allan@physics.umn.edu>
To: Ahmed Kamal <email.ahmedkamal@googlemail.com>
Cc: Ahmed Kamal via freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
Message-ID: <55A927CE.5010505@physics.umn.edu>
In-Reply-To: <CANzjMX43dsKkdvnnBaX5qsb2XbHpRKftRKyZ8QrZkAaR2wFVVg@mail.gmail.com>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX7xKBvnzJhQhB_ZrUnyE2m_FJXXy4fm_RFnuZfBDyDm2A@mail.gmail.com>
 <55947C6E.5060409@delphij.net>
 <1491630362.2785531.1435799383802.JavaMail.zimbra@uoguelph.ca>
 <5594B008.10202@freebsd.org>
 <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX5eN1FsnHMf6KGZe_b3vwxxF=dy3fJUHxeGO4BXuNzfPA@mail.gmail.com>
 <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca>
 <CANzjMX427XNQJ1o6Wh2CVy1LF1ivspGcfNeRCmv%2BOyApK2UhJg@mail.gmail.com>
 <CANzjMX5xyUz6OkMKS4O-MrV2w58YT9ricOPLJWVtAR5Ci-LMew@mail.gmail.com>
 <20150716235022.GF32479@physics.umn.edu>
 <CANzjMX43dsKkdvnnBaX5qsb2XbHpRKftRKyZ8QrZkAaR2wFVVg@mail.gmail.com>

I have maintenance scheduled this weekend, so maybe I will try to add Xin
Li's patch on one of our 9.3 servers and see whether the sequence-id
messages diminish (even though it didn't help for you - possibly SL6 will
behave differently).

As for SL6/NFS4 being more tolerant, I suspect the problem is dependent on
the specific job. This is the first time I have seen it at all (that is,
with the stuck processes and high rpciod load), and I only see one person
running this code. Looking back ~60 days in logs, though, I can see the
sequence-id messages occurring all over the place from other machines,
apparently without incident.

For the more intense users who are running on 200 servers at once, I
wonder if they are not hitting the NFS server in the same way - possibly
they are mostly writing somewhere else like Hadoop and only reading from
NFS. However, our compute farm conversions to SL6 and NFSv4 are fairly
recent, so something may yet show up.

I wonder if we have any avenue to file a bug with Red Hat. I have a very
basic subscription which only lets me look at their KB, but I could
upgrade it - but then, as I'm running a clone product, I probably don't
have a viable report.

Graham

On 7/17/2015 6:21 AM, Ahmed Kamal wrote:
> Hi Graham,
>
> So my RHEL5 boxes certainly have trouble with nfs4 .. I'm running about
> 20 boxes and almost all of them develop a choking process every day or
> two. I'm now in the process of upgrading our RHEL boxes to v6.x .. This
> is motivated to migrate to NFS4.1, although now that you say NFS4 is
> more tolerant on EL6, I might just remain on that. So far I did one week
> of basic testing of a VM on el6 with nfs4.1 vs my FreeBSD 10.1, so far I
> didn't hit problems (although the testing was light). Next week, I'll
> probably upgrade one of our production machines to el6 and see how it fares.
>
> PS: I had upgraded our el5 box with elrepo kernel (v3.2) .. which I
> thought would be way much newer (even newer than el6) .. But I still had
> trouble with it .. so I reverted to stock el5 kernel! Not sure if this
> means Linux is not the only component at fault ?!
>

-- 
-------------------------------------------------------------------------
Graham Allan - gta@umn.edu - allan@physics.umn.edu
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------