Date: Tue, 2 Feb 2016 17:41:31 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Don Lewis <truckman@FreeBSD.org>
Cc: spork@bway.net, freebsd-fs@freebsd.org, vivek@khera.org,
    freebsd-questions@freebsd.org
Subject: Re: NFS unstable with high load on server
Message-ID: <1270648257.999240.1454452891099.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <201602021848.u12ImDES067799@gw.catspoiler.org>
References: <201602021848.u12ImDES067799@gw.catspoiler.org>
[-- Attachment #1 --]

Don Lewis wrote:
> On 2 Feb, Charles Sprickman wrote:
> > On Feb 2, 2016, at 1:10 AM, Ben Woods <woodsb02@gmail.com> wrote:
> >>
> >> On Monday, 1 February 2016, Vick Khera <vivek@khera.org> wrote:
> >>
> >>> I have a handful of servers at my data center all running FreeBSD
> >>> 10.2. On one of them I have a copy of the FreeBSD sources shared via
> >>> NFS. When this server is running a large poudriere run re-building
> >>> all the ports I need, the clients' NFS mounts become unstable. That
> >>> is, the clients keep getting read failures. The interactive
> >>> performance of the NFS server is just fine, however. The local file
> >>> system is a ZFS mirror.
> >>>
> >>> What could be causing NFS to be unstable in this situation?
> >>>
> >>> Specifics:
> >>>
> >>> Server "lorax" FreeBSD 10.2-RELEASE-p7 kernel locally compiled, with
> >>> NFS server and ZFS as dynamic kernel modules. 16GB RAM, Xeon 3.1GHz
> >>> quad processor.
> >>>
> >>> The directory /u/lorax1 is a ZFS dataset on a mirrored pool, and is
> >>> NFS exported via the ZFS exports file. I put the FreeBSD sources on
> >>> this dataset and symlink to /usr/src.
> >>>
> >>> Client "bluefish" FreeBSD 10.2-RELEASE-p5 kernel locally compiled,
> >>> NFS client built in to kernel. 32GB RAM, Xeon 3.1GHz quad processor
> >>> (basically the same hardware but more RAM).
> >>>
> >>> The directory /n/lorax1 is NFS mounted from lorax via autofs. The
> >>> NFS options are "intr,nolockd". /usr/src is symlinked to the sources
> >>> in that NFS mount.
> >>>
> >>> What I observe:
> >>>
> >>> [lorax]~% cd /usr/src
> >>> [lorax]src% svn status
> >>> [lorax]src% w
> >>> 9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
> >>> USER   TTY    FROM                  LOGIN@  IDLE WHAT
> >>> vivek  pts/0  vick.int.kcilink.com  8:44AM     - tmux: client (/tmp/
> >>> vivek  pts/1  tmux(19747).%0        8:44AM    19 sed y%*+%pp%;s%[^_a
> >>> vivek  pts/2  tmux(19747).%1        8:56AM     - w
> >>> vivek  pts/3  tmux(19747).%2        8:56AM     - slogin bluefish-prv
> >>> [lorax]src% pwd
> >>> /u/lorax1/usr10/src
> >>>
> >>> So right now the load average is more than 1 per processor on lorax.
> >>> I can quite easily run "svn status" on the source directory, and the
> >>> interactive performance is pretty snappy for editing local files and
> >>> navigating around the file system.
> >>>
> >>> On the client:
> >>>
> >>> [bluefish]~% cd /usr/src
> >>> [bluefish]src% pwd
> >>> /n/lorax1/usr10/src
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
> >>> Partial results are valid but processing is incomplete
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
> >>> Partial results are valid but processing is incomplete
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory
> >>> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results
> >>> are valid but processing is incomplete
> >>> [bluefish]src% w
> >>> 9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
> >>> USER   TTY    FROM                   LOGIN@  IDLE WHAT
> >>> vivek  pts/0  lorax-prv.kcilink.com  8:56AM     - w
> >>> [bluefish]src% df .
> >>> Filesystem          1K-blocks    Used     Avail Capacity  Mounted on
> >>> lorax-prv:/u/lorax1 932845181 6090910 926754271     1%    /n/lorax1
> >>>
> >>> What I see is more or less random failures to read the NFS volume.
> >>> When the server is not so busy running poudriere builds, the client
> >>> never has any failures.
> >>>
> >>> I also observe this kind of failure doing buildworld or installworld
> >>> on the client when the server is busy -- I get strange random
> >>> failures reading the files, causing the build or install to fail.
> >>>
> >>> My workaround is to not do builds/installs on client machines when
> >>> the NFS server is busy doing large jobs like building all packages,
> >>> but there is definitely something wrong here I'd like to fix. I
> >>> observe this on all the local NFS clients. I rebooted the server
> >>> before to try to clear this up, but it did not fix it.
> >>>
> >>> Any help would be appreciated.
> >>
> >> I just wanted to point out that I am experiencing this exact same
> >> issue in my home setup.
> >>
> >> Performing an installworld from an NFS mount works perfectly, until I
> >> start running poudriere on the NFS server. Then I start getting NFS
> >> timeouts and the installworld fails.
> >>
> >> The NFS server is also using ZFS, but the NFS export in my case is
> >> being done via the ZFS property "sharenfs" (I am not using the
> >> /etc/exports file).
> >
> > Me three. I’m actually updating a small group of servers now and
> > started blowing up my installworlds by trying to do some poudriere
> > builds at the same time. Very repeatable. Of note, I’m on 9.3, and saw
> > this on 8.4 as well. If I track down the client-side failures, it’s
> > always “permission denied”.
>
> That sort of sounds like the problem that was fixed in HEAD with r241561
> and r241568. It was merged to 9-STABLE before 9.3-RELEASE. Try adding
> the -S option to mountd_flags. I have no idea why that isn't the
> default.
>
It isn't the default because...
- The first time I proposed it, the consensus was that it wasn't the
  correct fix and it shouldn't go into FreeBSD.
- About 2 years later, folks agreed that it was ok as an interim
  solution, so I made it a non-default option.
  --> This avoids it being considered a POLA violation.
Maybe in a couple more years it can become the default? (For anyone who
wants to try it now, there is an rc.conf sketch near the end of this
message.)

> When poudriere is running, it frequently mounts and unmounts
> filesystems. When this happens, mount(8) and umount(8) notify mountd to
> update the exports list. This is not done atomically, so NFS
> transactions can fail while mountd updates the export list. The fix
> mentioned above pauses the nfsd threads while the export list update is
> in progress to prevent the problem.
>
> I don't know how this works with ZFS sharenfs, though.
>
I think it should be fine either way. (ZFS sharenfs is an alternate way
to set up ZFS exports, but I believe the result is just adding the
entries to /etc/exports.) If it doesn't work for some reason, just put
lines in /etc/exports for the ZFS volumes instead of using ZFS sharenfs.
(A sample entry is near the end of this message.)

I recently had a report that "-S" would get stuck for a long time before
performing an update of the exports when the server is under heavy load.
I don't think this affects many people, but the attached 2-line patch
(not yet in head) fixes the problem for the guy who reported it.
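The promised rc.conf sketch for trying "-S": it is just another mountd
flag, so something like the following on the server should enable it.
(This is a sketch; the "-r" flag shown is only a common default, so keep
whatever mountd flags you already use and just append -S.)

    # /etc/rc.conf on the NFS server
    nfs_server_enable="YES"
    mountd_enable="YES"
    # -S suspends the nfsd threads while mountd reloads the exports
    mountd_flags="-r -S"

Then restart mountd ("service mountd restart") or reboot.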
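And if sharenfs turns out not to play well with "-S", the replacement
lines in /etc/exports are ordinary exports(5) entries. A hypothetical one
for the dataset in this thread (the network and mask are made up, so
substitute your own, plus any -maproot or other options you need):

    # /etc/exports on the server
    /u/lorax1 -network 192.168.1.0 -mask 255.255.255.0

After setting sharenfs=off on the dataset, "service mountd reload" makes
mountd re-read the file.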
rick

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

[-- Attachment #2 --]

--- fs/nfsserver/nfs_nfsdkrpc.c.sav2	2016-01-15 18:42:15.479783000 -0500
+++ fs/nfsserver/nfs_nfsdkrpc.c	2016-01-15 18:45:59.418245000 -0500
@@ -231,10 +231,16 @@ nfssvc_program(struct svc_req *rqst, SVC
 	 * Get a refcnt (shared lock) on nfsd_suspend_lock.
 	 * NFSSVC_SUSPENDNFSD will take an exclusive lock on
 	 * nfsd_suspend_lock to suspend these threads.
+	 * The call to nfsv4_lock() that precedes nfsv4_getref()
+	 * ensures that the acquisition of the exclusive lock
+	 * takes priority over acquisition of the shared lock by
+	 * waiting for any exclusive lock request to complete.
 	 * This must be done here, before the check of
 	 * nfsv4root exports by nfsvno_v4rootexport().
 	 */
 	NFSLOCKV4ROOTMUTEX();
+	nfsv4_lock(&nfsd_suspend_lock, 0, NULL, NFSV4ROOTLOCKMUTEXPTR,
+	    NULL);
 	nfsv4_getref(&nfsd_suspend_lock, NULL, NFSV4ROOTLOCKMUTEXPTR,
 	    NULL);
 	NFSUNLOCKV4ROOTMUTEX();
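In case the reasoning in the patch's new comment isn't obvious: without
the nfsv4_lock() call, nfsd threads that keep arriving can hold shared
references continuously, so the exclusive lock that mountd needs in order
to suspend them may never be granted; that is the long stall reported
under heavy load. Below is a simplified userland illustration of the same
writer-priority idea using POSIX threads. The names are hypothetical and
this is not the actual kernel code, just a sketch of the technique:

#include <pthread.h>

/*
 * Writer-priority reader/writer lock sketch. Readers stand in for the
 * nfsd threads taking a shared reference on nfsd_suspend_lock; the
 * writer stands in for the export-update path suspending them.
 */
struct wp_lock {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int	readers;	 /* active shared holders */
	int	writers_waiting; /* pending exclusive requests */
	int	writer_active;	 /* exclusive lock currently held */
};

#define	WP_LOCK_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

void
wp_read_lock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	/*
	 * The step the patch adds: a new reader waits for any pending
	 * exclusive request instead of slipping in ahead of it. Without
	 * this check, overlapping readers can keep "readers" nonzero
	 * forever and the writer starves.
	 */
	while (l->writers_waiting > 0 || l->writer_active)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->readers++;
	pthread_mutex_unlock(&l->mtx);
}

void
wp_read_unlock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	if (--l->readers == 0)
		pthread_cond_broadcast(&l->cv);	/* wake a waiting writer */
	pthread_mutex_unlock(&l->mtx);
}

void
wp_write_lock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	l->writers_waiting++;	/* blocks any new readers immediately */
	while (l->readers > 0 || l->writer_active)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->writers_waiting--;
	l->writer_active = 1;
	pthread_mutex_unlock(&l->mtx);
}

void
wp_write_unlock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	l->writer_active = 0;
	pthread_cond_broadcast(&l->cv);	/* wake readers and writers */
	pthread_mutex_unlock(&l->mtx);
}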
