Date:      Thu, 16 Apr 2015 11:49:47 -0400
From:      J David <j.david.lists@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: FreeBSD 10.1 can't "make -j5 buildworld" over NFS?
Message-ID:  <CABXB=RQCmp2MF-Awf-Qi1MBnU3kKKDSqmZnj_qG33f1qK8st3w@mail.gmail.com>
In-Reply-To: <718753704.19327489.1429107495125.JavaMail.root@uoguelph.ca>
References:  <CABXB=RQFtKYcogL9w9U0_UNuvSN_DMHz-b5=hH_1MJxbYtasTw@mail.gmail.com> <718753704.19327489.1429107495125.JavaMail.root@uoguelph.ca>

On Wed, Apr 15, 2015 at 10:18 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Well, the NFS client is almost identical in the two systems. (A couple
> of NFSv4 specific changes and a removal of a redundant check for creation
> of a hard link across mount points are the only ones I can see.)
>
> As such, I'd suspect userland differences. There is a different "make"
> in 10 (which I don't think is in 9.3?), so this would be a good starting
> point.

That may be, but this problem only occurs over NFS.  It does not
happen with local UFS or ZFS.  So perhaps the new make is exercising
the NFS client differently than the old one, revealing the problem.

> Btw, "stale NFS file handle" means that the file has been deleted on the
> server.

Yes, it does.  And the make always dies during cleandir, when things
are being aggressively deleted.

That does seem to be the *only* stage that has problems.  I.e., if
"make cleanworld" is run before "make -j5 buildworld", then the
parallel build will succeed.  Hopefully that means it will be
relatively easy to narrow down / reproduce the problem behavior.

However, in my experience, stale NFS file handles usually occur when
one client deletes things out from under another client (and/or after
a server reboot, which is not the case here).  In this case, this is
the only client that can even mount the relevant partition
read-write, let alone write to it.  It's as if the 10.1 client keeps
caching the existence of files even after it removes them, leading to
errors from the server when it tries to access them again.  It's
pretty unusual (again, in my experience) for a single client to trip
over *itself* when deleting things.

Thanks!


