Date:      Thu, 30 Oct 2014 21:49:47 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Garrett Wollman <wollman@csail.mit.edu>
Cc:        freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Definite NFS  bug
Message-ID:  <928219131.2682604.1414720187244.JavaMail.root@uoguelph.ca>
In-Reply-To: <1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca>

I wrote:
> Garrett Wollman wrote:
> > Like many other users, I upgrade my FreeBSD servers by NFS-mounting
> > /usr/src and /usr/obj from a shared build server.[1]  Since I
> > upgraded
> > the build server to 9.3, clients running 9.3 kernels have been
> > randomly erroring out during installkernel and installworld.  Today
> > I
> > had some time to look more closely into this and found that the
> > error
> > is definitely coming from the server: at some point, it just
> > randomly
> > starts returning errors to client ACCESS and GETATTR operations.
> >  The
> > errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is
> > nothing
> > on the server to indicate any kind of error, and restarting the
> > operation on the client causes it to fail in a different place.
> > With enough patience and restarts, it's possible to complete the
> > installation in just four or five passes.
> > 
> > Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2
> > clients don't see this issue at all; it's only 9.3 clients that
> > break.
> > 
> > It's easy to reproduce: just 'cd /usr/src && find . -type f >/dev/null'.
> > It does not seem to depend on the client NFS version (3 or 4) or
> > implementation ("old" or "new").  I haven't tried the "old" server
> > yet
> > -- I'll need to figure out how to do that first.
> > 
Oh, and it wasn't clear to me if you are seeing this on a 9.3 server
only? (If you get the same outcome testing against an older server,
then it seems it is a client side issue.)

If it does look like a client side issue, I'd suggest you try a stable/9
kernel from just before r261056 (one of the changes was r261056, not
r261057).

At a closer look, most of the kernel rpc changes are for the server side.
(Most of the client side commits just change the copyright, but there are
 a couple of client side changes beyond that.)
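
To compare a 9.3 server against an older one, the 'find' reproduction
quoted above could be turned into something that records which path fails
and with what errno (NFS3ERR_IO shows up on the client as EIO and
NFS3ERR_ACCES as EACCES). What follows is only a rough sketch, assuming
Python is installed on the client; walk_and_stat is a made-up helper name,
not an existing tool, and unlike 'find . -type f' it counts every
non-directory entry rather than only regular files.

```python
import errno
import os


def walk_and_stat(root):
    """Walk root and lstat every non-directory entry.

    Returns (entries_ok, failures), where failures is a list of
    (path, errno_name) tuples such as ("/usr/src/foo", "EIO").
    """
    failures = []
    entries_ok = 0

    def errno_name(exc):
        # Map the numeric errno to its symbolic name where one is known.
        return errno.errorcode.get(exc.errno, str(exc.errno))

    def on_walk_error(exc):
        # os.walk calls this when it cannot list a directory.
        failures.append((exc.filename, errno_name(exc)))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Over NFS this normally needs the file's attributes from
                # the server, unless the client already has them cached.
                os.lstat(path)
                entries_ok += 1
            except OSError as exc:
                failures.append((path, errno_name(exc)))
    return entries_ok, failures
```

Calling walk_and_stat("/usr/src") on the client against each server and
printing the returned failures should show whether the errors appear with
only the 9.3 server or with both.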

> Well, I took a quick look and, if I got it correct, there is one
> single
> line change in the "old" client between 9.2 and 9.3, which defined
> an otherwise unused mount flag called NFSMNT_NONCONTIGWR. (It is
> only used by the new client when "nocontigwr" is specified.)
> 
> However, there were some fairly extensive changes done (mostly by
> mav@)
> to the kernel rpc (sys/rpc), which is used by both clients and both
> servers.
> Most of these changes were committed to stable/9 as r261057, r261058.
> If you could build a kernel from stable/9 just prior to r261057 and
> see
> if that client runs into the problem, it could help determine if
> these
> changes are causing the problem.
> Alternately, running the 9.3 system with a 9.2 sys/rpc (if it
> links/runs) could also help show whether the kernel rpc is the
> culprit. (You can load the kernel rpc as a module, but it's linked
> into most kernels.)
> 
> If it doesn't turn out to be in the kernel rpc, my next guess would
> be changes to the net device driver (to check for this you could use
> a different type of hardware device or the 9.2 driver on the 9.3
> system, maybe?).
> 
> The "new" client has some changes 9.2->9.3, but since nothing changed
> for the "old" client and you see the problem with the "old" one, I
> think the NFS client is not the culprit.
> 
> rick
> 
> > If anyone is willing to help debug this, I can share a packet
> > trace,
> > but I don't think it's very informative.  Also, if anyone has a
> > good
> > dtrace script that I could run on the server that would report
> > what's
> > going on when that first NFS3ERR_IO is returned, that would be
> > great.
> > 
> > -GAWollman
> > 
> > [1] I'd run my own freebsd-update server but unfortunately it is
> > too
> > tied to building things that look like official FreeBSD security
> > updates, and isn't really designed for (e.g.) updating kernels when
> > we
> > change a configuration option.  It also doesn't have any obvious
> > knobs
> > for building with anything other than a default {make,src}.conf.
> > And with a pkg-able base just around the corner I don't really want
> > to
> > put much effort into making freebsd-update do what I want.  NFS, on
> > the other hand, is a big deal and so I need to track down and fix
> > these bugs.
> > 
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"
> 