Date: Tue, 2 Feb 2016 17:41:31 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Don Lewis <truckman@FreeBSD.org>
Cc: spork@bway.net, freebsd-fs@freebsd.org, vivek@khera.org, freebsd-questions@freebsd.org
Subject: Re: NFS unstable with high load on server
Message-ID: <1270648257.999240.1454452891099.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <201602021848.u12ImDES067799@gw.catspoiler.org>
References: <201602021848.u12ImDES067799@gw.catspoiler.org>
Don Lewis wrote:
> On 2 Feb, Charles Sprickman wrote:
> > On Feb 2, 2016, at 1:10 AM, Ben Woods <woodsb02@gmail.com> wrote:
> >>
> >> On Monday, 1 February 2016, Vick Khera <vivek@khera.org> wrote:
> >>
> >>> I have a handful of servers at my data center all running FreeBSD 10.2.
> >>> On one of them I have a copy of the FreeBSD sources shared via NFS. When
> >>> this server is running a large poudriere run re-building all the ports I
> >>> need, the clients' NFS mounts become unstable. That is, the clients keep
> >>> getting read failures. The interactive performance of the NFS server is
> >>> just fine, however. The local file system is a ZFS mirror.
> >>>
> >>> What could be causing NFS to be unstable in this situation?
> >>>
> >>> Specifics:
> >>>
> >>> Server "lorax": FreeBSD 10.2-RELEASE-p7, kernel locally compiled, with
> >>> NFS server and ZFS as dynamic kernel modules. 16GB RAM, Xeon 3.1GHz quad
> >>> processor.
> >>>
> >>> The directory /u/lorax1 is a ZFS dataset on a mirrored pool, and is NFS
> >>> exported via the ZFS exports file. I put the FreeBSD sources on this
> >>> dataset and symlink to /usr/src.
> >>>
> >>> Client "bluefish": FreeBSD 10.2-RELEASE-p5, kernel locally compiled, NFS
> >>> client built in to the kernel. 32GB RAM, Xeon 3.1GHz quad processor
> >>> (basically the same hardware but more RAM).
> >>>
> >>> The directory /n/lorax1 is NFS mounted from lorax via autofs. The NFS
> >>> options are "intr,nolockd". /usr/src is symlinked to the sources in that
> >>> NFS mount.
> >>>
> >>> What I observe:
> >>>
> >>> [lorax]~% cd /usr/src
> >>> [lorax]src% svn status
> >>> [lorax]src% w
> >>>  9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
> >>> USER   TTY   FROM                  LOGIN@  IDLE WHAT
> >>> vivek  pts/0 vick.int.kcilink.com  8:44AM     - tmux: client (/tmp/
> >>> vivek  pts/1 tmux(19747).%0        8:44AM    19 sed y%*+%pp%;s%[^_a
> >>> vivek  pts/2 tmux(19747).%1        8:56AM     - w
> >>> vivek  pts/3 tmux(19747).%2        8:56AM     - slogin bluefish-prv
> >>> [lorax]src% pwd
> >>> /u/lorax1/usr10/src
> >>>
> >>> So right now the load average is more than 1 per processor on lorax. I
> >>> can quite easily run "svn status" on the source directory, and the
> >>> interactive performance is pretty snappy for editing local files and
> >>> navigating around the file system.
> >>>
> >>> On the client:
> >>>
> >>> [bluefish]~% cd /usr/src
> >>> [bluefish]src% pwd
> >>> /n/lorax1/usr10/src
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
> >>> Partial results are valid but processing is incomplete
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
> >>> Partial results are valid but processing is incomplete
> >>> [bluefish]src% svn status
> >>> svn: E070008: Can't read directory
> >>> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results are
> >>> valid but processing is incomplete
> >>> [bluefish]src% w
> >>>  9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
> >>> USER   TTY   FROM                   LOGIN@  IDLE WHAT
> >>> vivek  pts/0 lorax-prv.kcilink.com  8:56AM     - w
> >>> [bluefish]src% df .
> >>> Filesystem          1K-blocks    Used     Avail Capacity  Mounted on
> >>> lorax-prv:/u/lorax1 932845181 6090910 926754271     1%    /n/lorax1
> >>>
> >>> What I see is more or less random failures to read the NFS volume. When
> >>> the server is not so busy running poudriere builds, the client never has
> >>> any failures.
> >>>
> >>> I also observe this kind of failure doing buildworld or installworld on
> >>> the client when the server is busy -- I get strange random failures
> >>> reading the files, causing the build or install to fail.
> >>>
> >>> My workaround is to not do build/installs on client machines when the
> >>> NFS server is busy doing large jobs like building all packages, but
> >>> there is definitely something wrong here I'd like to fix. I observe this
> >>> on all the local NFS clients. I rebooted the server before to try to
> >>> clear this up, but it did not fix it.
> >>>
> >>> Any help would be appreciated.
> >>>
> >>
> >> I just wanted to point out that I am experiencing this exact same issue
> >> in my home setup.
> >>
> >> Performing an installworld from an NFS mount works perfectly, until I
> >> start running poudriere on the NFS server. Then I start getting NFS
> >> timeouts and the installworld fails.
> >>
> >> The NFS server is also using ZFS, but the NFS export in my case is being
> >> done via the ZFS property "sharenfs" (I am not using the /etc/exports
> >> file).
> >
> > Me three. I'm actually updating a small group of servers now and started
> > blowing up my installworlds by trying to do some poudriere builds at the
> > same time. Very repeatable. Of note, I'm on 9.3, and saw this on 8.4 as
> > well. If I track down the client-side failures, it's always "permission
> > denied".
>
> That sort of sounds like the problem that was fixed in HEAD with r241561
> and r241568. It was merged to 9-STABLE before 9.3-RELEASE. Try adding
> the -S option to mountd_flags. I have no idea why that isn't the
> default.
>
It isn't the default because...
- The first time I proposed it, the consensus was that it wasn't the
  correct fix and it shouldn't go into FreeBSD.
- About 2 years later, folks agreed that it was ok as an interim solution,
  so I made it a non-default option.
  --> This avoids it being considered a POLA violation.
Maybe in a couple more years it can become the default?

> When poudriere is running, it frequently mounts and unmounts filesystems.
> When this happens, mount(8) and umount(8) notify mountd to update the
> exports list. This is not done atomically, so NFS transactions can fail
> while mountd updates the export list. The fix mentioned above pauses the
> nfsd threads while the export list update is in progress to prevent the
> problem.
>
> I don't know how this works with ZFS sharenfs, though.
>
I think it should be fine either way. (ZFS sharenfs is an alternate way to
set up ZFS exports, but I believe the result is just adding the entries to
/etc/exports.) If it doesn't work for some reason, just put lines in
/etc/exports for the ZFS volumes instead of using ZFS sharenfs.

I recently had a report that "-S" would get stuck for a long time before
performing an update of the exports when the server is under heavy load.
I don't think this affects many people, but the attached 2-line patch
(not yet in head) fixes the problem for the guy that reported it.
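To make Don's suggestion concrete: on a typical setup the flag goes into
/etc/rc.conf on the NFS server and mountd gets restarted afterwards.
Something like the following should do it (keep whatever flags you already
pass to mountd; the variable names below are just the stock rc.conf knobs,
shown here as an example rather than your exact configuration):

    # /etc/rc.conf on the NFS server
    nfs_server_enable="YES"
    mountd_enable="YES"
    mountd_flags="-r -S"

Then "service mountd restart" (or a reboot) picks it up.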
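And since the ZFS sharenfs vs. /etc/exports question keeps coming up, the
two ways of exporting look roughly like this. The pool/dataset name and the
network/options are made up for the example, not taken from Vick's setup:

    # via the ZFS property on the dataset mounted at /u/lorax1:
    zfs set sharenfs="-network 192.168.1.0 -mask 255.255.255.0" tank/lorax1

    # or an equivalent hand-written line in /etc/exports:
    /u/lorax1 -network 192.168.1.0 -mask 255.255.255.0

Either way it is mountd that hands the export list to the kernel, which is
why the -S behaviour should matter for both.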
rick

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

[Attachment: nfssuspend.patch]

--- fs/nfsserver/nfs_nfsdkrpc.c.sav2	2016-01-15 18:42:15.479783000 -0500
+++ fs/nfsserver/nfs_nfsdkrpc.c	2016-01-15 18:45:59.418245000 -0500
@@ -231,10 +231,16 @@ nfssvc_program(struct svc_req *rqst, SVC
 		 * Get a refcnt (shared lock) on nfsd_suspend_lock.
 		 * NFSSVC_SUSPENDNFSD will take an exclusive lock on
 		 * nfsd_suspend_lock to suspend these threads.
+		 * The call to nfsv4_lock() that preceeds nfsv4_getref()
+		 * ensures that the acquisition of the exclusive lock
+		 * takes priority over acquisition of the shared lock by
+		 * waiting for any exclusive lock request to complete.
 		 * This must be done here, before the check of
 		 * nfsv4root exports by nfsvno_v4rootexport().
 		 */
 		NFSLOCKV4ROOTMUTEX();
+		nfsv4_lock(&nfsd_suspend_lock, 0, NULL, NFSV4ROOTLOCKMUTEXPTR,
+		    NULL);
 		nfsv4_getref(&nfsd_suspend_lock, NULL, NFSV4ROOTLOCKMUTEXPTR,
 		    NULL);
 		NFSUNLOCKV4ROOTMUTEX();
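For anyone who wants to test the patch: the paths in it are relative to
sys/, so one way to apply it and rebuild, assuming the server's kernel is
built from a matching /usr/src tree and the attachment was saved somewhere
as nfssuspend.patch (the path below is just a placeholder), is roughly:

    cd /usr/src/sys
    patch < /path/to/nfssuspend.patch
    cd /usr/src
    make buildkernel      # add KERNCONF=<yourconfig> if you build a custom kernel
    make installkernel    # likewise, then reboot the server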