Date: Tue, 2 Feb 2016 17:41:31 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Don Lewis <truckman@FreeBSD.org>
Cc: spork@bway.net, freebsd-fs@freebsd.org, vivek@khera.org,
    freebsd-questions@freebsd.org
Subject: Re: NFS unstable with high load on server
Message-ID: <1270648257.999240.1454452891099.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <201602021848.u12ImDES067799@gw.catspoiler.org>
References: <201602021848.u12ImDES067799@gw.catspoiler.org>
[-- Attachment #1 --]

Don Lewis wrote:
> On 2 Feb, Charles Sprickman wrote:
> > On Feb 2, 2016, at 1:10 AM, Ben Woods <woodsb02@gmail.com> wrote:
> >>
> >> On Monday, 1 February 2016, Vick Khera <vivek@khera.org> wrote:
> >>
> >>> I have a handful of servers at my data center all running FreeBSD
> >>> 10.2. On one of them I have a copy of the FreeBSD sources shared via
> >>> NFS. When this server is running a large poudriere run re-building
> >>> all the ports I need, the clients' NFS mounts become unstable. That
> >>> is, the clients keep getting read failures. The interactive
> >>> performance of the NFS server is just fine, however. The local file
> >>> system is a ZFS mirror.
> >>>
> >>> What could be causing NFS to be unstable in this situation?
> >>>
> >>> Specifics:
> >>>
> >>> Server "lorax" FreeBSD 10.2-RELEASE-p7 kernel locally compiled, with
> >>> NFS server and ZFS as dynamic kernel modules. 16GB RAM, Xeon 3.1GHz
> >>> quad processor.
> >>>
> >>> The directory /u/lorax1 is a ZFS dataset on a mirrored pool, and is
> >>> NFS exported via the ZFS exports file. I put the FreeBSD sources on
> >>> this dataset and symlink to /usr/src.
> >>>
> >>> Client "bluefish" FreeBSD 10.2-RELEASE-p5 kernel locally compiled,
> >>> NFS client built in to kernel. 32GB RAM, Xeon 3.1GHz quad processor
> >>> (basically the same hardware but more RAM).
> >>>
> >>> The directory /n/lorax1 is NFS mounted from lorax via autofs. The
> >>> NFS options are "intr,nolockd". /usr/src is symlinked to the sources
> >>> in that NFS mount.
> >>>
> >>> What I observe:
> >>>
> >>> [lorax]~% cd /usr/src
> >>> [lorax]src% svn status
> >>> [lorax]src% w
> >>> 9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
> >>> USER   TTY    FROM                  LOGIN@  IDLE WHAT
> >>> vivek  pts/0  vick.int.kcilink.com  8:44AM     - tmux: client (/tmp/
> >>> vivek  pts/1  tmux(19747).%0        8:44AM    19 sed y%*+%pp%;s%[^_a
> >>> vivek  pts/2  tmux(19747).%1        8:56AM     - w
> >>> vivek  pts/3  tmux(19747).%2        8:56AM     - slogin bluefish-prv
> >>> [lorax]src% pwd
> >>> /u/lorax1/usr10/src
> >>>
> >>> So right now the load average is more than 1 per processor on lorax.
> >>> I can quite easily run "svn status" on the source directory, and the
> >>> interactive performance is pretty snappy for editing local files and
> >>> navigating around the file system.
> >>>
> >>> On the client:
> >>>
> >>> [bluefish]~% cd /usr/src
> >>> [bluefish]src% pwd
> >>> /n/lorax1/usr10/src
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
> >>> Partial results are valid but processing is incomplete
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
> >>> Partial results are valid but processing is incomplete
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory
> >>> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results
> >>> are valid but processing is incomplete
> >>> [bluefish]src% w
> >>> 9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
> >>> USER   TTY    FROM                   LOGIN@  IDLE WHAT
> >>> vivek  pts/0  lorax-prv.kcilink.com  8:56AM     - w
> >>> [bluefish]src% df .
> >>> Filesystem          1K-blocks    Used     Avail Capacity  Mounted on
> >>> lorax-prv:/u/lorax1 932845181 6090910 926754271     1%    /n/lorax1
> >>>
> >>> What I see is more or less random failures to read the NFS volume.
> >>> When the server is not so busy running poudriere builds, the client
> >>> never has any failures.
> >>>
> >>> I also observe this kind of failure doing buildworld or installworld
> >>> on the client when the server is busy -- I get strange random
> >>> failures reading the files, causing the build or install to fail.
> >>>
> >>> My workaround is to not do builds/installs on client machines when
> >>> the NFS server is busy doing large jobs like building all packages,
> >>> but there is definitely something wrong here I'd like to fix. I
> >>> observe this on all the local NFS clients. I rebooted the server
> >>> before to try to clear this up, but it did not fix it.
> >>>
> >>> Any help would be appreciated.
> >>
> >> I just wanted to point out that I am experiencing this exact same
> >> issue in my home setup.
> >>
> >> Performing an installworld from an NFS mount works perfectly, until I
> >> start running poudriere on the NFS server. Then I start getting NFS
> >> timeouts and the installworld fails.
> >>
> >> The NFS server is also using ZFS, but the NFS export in my case is
> >> being done via the ZFS property "sharenfs" (I am not using the
> >> /etc/exports file).
> >
> > Me three. I’m actually updating a small group of servers now and
> > started blowing up my installworlds by trying to do some poudriere
> > builds at the same time. Very repeatable. Of note, I’m on 9.3, and saw
> > this on 8.4 as well. If I track down the client-side failures, it’s
> > always “permission denied”.
>
> That sort of sounds like the problem that was fixed in HEAD with r241561
> and r241568. It was merged to 9-STABLE before 9.3-RELEASE. Try adding
> the -S option to mountd_flags. I have no idea why that isn't the
> default.
>
It isn't the default because...
- The first time I proposed it, the consensus was that it wasn't the
  correct fix and it shouldn't go into FreeBSD.
- About 2 years later, folks agreed that it was ok as an interim
  solution, so I made it a non-default option.
  --> This avoids it being considered a POLA violation.
Maybe in a couple more years it can become the default? (For anyone who
wants to try it now, there is an rc.conf sketch near the end of this
message.)

> When poudriere is running, it frequently mounts and unmounts
> filesystems. When this happens, mount(8) and umount(8) notify mountd to
> update the exports list. This is not done atomically, so NFS
> transactions can fail while mountd updates the export list. The fix
> mentioned above pauses the nfsd threads while the export list update is
> in progress to prevent the problem.
>
> I don't know how this works with ZFS sharenfs, though.
>
I think it should be fine either way. (ZFS sharenfs is an alternate way
to set up ZFS exports, but I believe the result is just adding the
entries to /etc/exports.) If it doesn't work for some reason, just put
lines in /etc/exports for the ZFS volumes instead of using ZFS sharenfs.
(A sample entry is near the end of this message.)

I recently had a report that "-S" would get stuck for a long time before
performing an update of the exports when the server is under heavy load.
I don't think this affects many people, but the attached 2-line patch
(not yet in head) fixes the problem for the guy who reported it.
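The promised rc.conf sketch for trying "-S": it is just another mountd
flag, so something like the following on the server should enable it.
(This is a sketch; the "-r" flag shown is only a common default, so keep
whatever mountd flags you already use and just append -S.)

    # /etc/rc.conf on the NFS server
    nfs_server_enable="YES"
    mountd_enable="YES"
    # -S suspends the nfsd threads while mountd reloads the exports
    mountd_flags="-r -S"

Then restart mountd ("service mountd restart") or reboot.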
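And if sharenfs turns out not to play well with "-S", the replacement
lines in /etc/exports are ordinary exports(5) entries. A hypothetical one
for the dataset in this thread (the network and mask are made up, so
substitute your own, plus any -maproot or other options you need):

    # /etc/exports on the server
    /u/lorax1 -network 192.168.1.0 -mask 255.255.255.0

After setting sharenfs=off on the dataset, "service mountd reload" makes
mountd re-read the file.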
rick

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

[-- Attachment #2 --]

--- fs/nfsserver/nfs_nfsdkrpc.c.sav2	2016-01-15 18:42:15.479783000 -0500
+++ fs/nfsserver/nfs_nfsdkrpc.c	2016-01-15 18:45:59.418245000 -0500
@@ -231,10 +231,16 @@ nfssvc_program(struct svc_req *rqst, SVC
 	 * Get a refcnt (shared lock) on nfsd_suspend_lock.
 	 * NFSSVC_SUSPENDNFSD will take an exclusive lock on
 	 * nfsd_suspend_lock to suspend these threads.
+	 * The call to nfsv4_lock() that precedes nfsv4_getref()
+	 * ensures that the acquisition of the exclusive lock
+	 * takes priority over acquisition of the shared lock by
+	 * waiting for any exclusive lock request to complete.
 	 * This must be done here, before the check of
 	 * nfsv4root exports by nfsvno_v4rootexport().
 	 */
 	NFSLOCKV4ROOTMUTEX();
+	nfsv4_lock(&nfsd_suspend_lock, 0, NULL, NFSV4ROOTLOCKMUTEXPTR,
+	    NULL);
 	nfsv4_getref(&nfsd_suspend_lock, NULL, NFSV4ROOTLOCKMUTEXPTR,
 	    NULL);
 	NFSUNLOCKV4ROOTMUTEX();
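In case the reasoning in the patch's new comment isn't obvious: without
the nfsv4_lock() call, nfsd threads that keep arriving can hold shared
references continuously, so the exclusive lock that mountd needs in order
to suspend them may never be granted; that is the long stall reported
under heavy load. Below is a simplified userland illustration of the same
writer-priority idea using POSIX threads. The names are hypothetical and
this is not the actual kernel code, just a sketch of the technique:

#include <pthread.h>

/*
 * Writer-priority reader/writer lock sketch. Readers stand in for the
 * nfsd threads taking a shared reference on nfsd_suspend_lock; the
 * writer stands in for the export-update path suspending them.
 */
struct wp_lock {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int	readers;	 /* active shared holders */
	int	writers_waiting; /* pending exclusive requests */
	int	writer_active;	 /* exclusive lock currently held */
};

#define	WP_LOCK_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

void
wp_read_lock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	/*
	 * The step the patch adds: a new reader waits for any pending
	 * exclusive request instead of slipping in ahead of it. Without
	 * this check, overlapping readers can keep "readers" nonzero
	 * forever and the writer starves.
	 */
	while (l->writers_waiting > 0 || l->writer_active)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->readers++;
	pthread_mutex_unlock(&l->mtx);
}

void
wp_read_unlock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	if (--l->readers == 0)
		pthread_cond_broadcast(&l->cv);	/* wake a waiting writer */
	pthread_mutex_unlock(&l->mtx);
}

void
wp_write_lock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	l->writers_waiting++;	/* blocks any new readers immediately */
	while (l->readers > 0 || l->writer_active)
		pthread_cond_wait(&l->cv, &l->mtx);
	l->writers_waiting--;
	l->writer_active = 1;
	pthread_mutex_unlock(&l->mtx);
}

void
wp_write_unlock(struct wp_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	l->writer_active = 0;
	pthread_cond_broadcast(&l->cv);	/* wake readers and writers */
	pthread_mutex_unlock(&l->mtx);
}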
