Date: Thu, 30 Oct 2014 21:49:47 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman <wollman@csail.mit.edu>
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Definite NFS bug
Message-ID: <928219131.2682604.1414720187244.JavaMail.root@uoguelph.ca>
In-Reply-To: <1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca>

I wrote:
> Garrett Wollman wrote:
> > Like many other users, I upgrade my FreeBSD servers by NFS-mounting
> > /usr/src and /usr/obj from a shared build server.[1]  Since I upgraded
> > the build server to 9.3, clients running 9.3 kernels have been
> > randomly erroring out during installkernel and installworld.  Today I
> > had some time to look more closely into this and found that the error
> > is definitely coming from the server: at some point, it just randomly
> > starts returning errors to client ACCESS and GETATTR operations.  The
> > errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is nothing
> > on the server to indicate any kind of error, and restarting the
> > operation on the client causes it to fail in a different place.  With
> > enough patience and restarts, it's possible to complete the
> > installation in just four or five passes.
> >
> > Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2
> > clients don't see this issue at all; it's only 9.3 clients that break.
> >
> > It's easy to reproduce, just 'cd /usr/src && find . -type f >/dev/null'.
> > It does not seem to depend on the client NFS version (3 or 4) or
> > implementation ("old" or "new").  I haven't tried the "old" server yet
> > -- I'll need to figure out how to do that first.
> >

Oh, and it wasn't clear to me if you are seeing this on a 9.3 server only?
(If you get the same outcome testing against an older server, then it seems
it is a client side issue.)
If that is the case, I'd suggest you try a pre-r261056 (one of the changes
was r261056, not r261057) stable/9 kernel.
At a closer look, most of the kernel rpc changes are for the server side.
(Most of the client side commits just change the copyright, but there are a
couple of client side changes beyond that.)

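When comparing kernels on either side of a given revision, it can help to have a pass/fail check that also records which path failed and with what error. The following is only a sketch of the 'find' reproduction quoted above, assuming Python is available on the client; scan_tree is a hypothetical helper name, and the expectation that NFS3ERR_IO and NFS3ERR_ACCES show up on the client as EIO and EACCES is an assumption about the usual errno mapping, not something confirmed in this thread.

```python
import errno
import os


def scan_tree(root):
    """Walk root, lstat() every file entry, and return a list of
    (path, errno_name) tuples for each operation that failed."""
    failures = []

    def on_walk_error(exc):
        # os.walk reports directory-listing failures through this callback.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        failures.append((exc.filename, name))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                # Assumption: on an NFS mount this attribute lookup is what
                # would hit a failing GETATTR/ACCESS, unless it is served
                # from the client's attribute cache.
                os.lstat(path)
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((path, name))
    return failures
```

Calling scan_tree("/usr/src") on a client and checking whether the returned list is empty gives the same pass/fail signal as the 'find' run, while keeping the failing paths and errno names for comparison between runs.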
> Well, I took a quick look and, if I got it correct, there is one single
> line change in the "old" client between 9.2 and 9.3, which defined
> an otherwise unused mount flag called NFSMNT_NONCONTIGWR.  (It is
> only used by the new client when "nocontigwr" is specified.)
>
> However, there were some fairly extensive changes done (mostly by mav@)
> to the kernel rpc (sys/rpc), which is used by both clients and both
> servers.
> Most of these changes were committed to stable/9 as r261057, r261058.
> If you could build a kernel from stable/9 just prior to r261057 and see
> if that client runs into the problem, it could help determine if these
> changes are causing the problem.
> Alternately, running the 9.3 system with a 9.2 sys/rpc (if it
> links/runs) could also help determine whether the kernel rpc is the
> culprit.  (You can load the kernel rpc as a module, but it's linked
> into most kernels.)
>
> If it doesn't turn out to be in the kernel rpc, my next guess would
> be changes to the net device driver (to check for this you could use
> a different type of hardware device or the 9.2 driver on the 9.3
> system, maybe?).
>
> The "new" client has some changes 9.2->9.3, but since nothing changed
> for the "old" client and you see the problem with the "old" one, I
> think the NFS client is not the culprit.
>
> rick
>
> > If anyone is willing to help debug this, I can share a packet trace,
> > but I don't think it's very informative.  Also, if anyone has a good
> > dtrace script that I could run on the server that would report what's
> > going on when that first NFS3ERR_IO is returned, that would be great.
> >
> > -GAWollman
> >
> > [1] I'd run my own freebsd-update server but unfortunately it is too
> > tied to building things that look like official FreeBSD security
> > updates, and isn't really designed for (e.g.) updating kernels when we
> > change a configuration option.
> > It also doesn't have any obvious knobs
> > for building with anything other than a default {make,src}.conf.
> > And with a pkg-able base just around the corner I don't really want to
> > put much effort into making freebsd-update do what I want.  NFS, on
> > the other hand, is a big deal and so I need to track down and fix
> > these bugs.
> >
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"
>