Date:      Tue, 2 Feb 2016 01:26:43 -0500
From:      Charles Sprickman <spork@bway.net>
To:        Ben Woods <woodsb02@gmail.com>
Cc:        Vick Khera <vivek@khera.org>, freebsd-fs@freebsd.org, "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Re: NFS unstable with high load on server
Message-ID:  <5EAD4A4A-211F-451E-A3B9-752DAC6D94B4@bway.net>
In-Reply-To: <CAOc73CCHS4r-proJ_jT4T+BfcQB9pND8Ld8QZqYJOCkuq2LqiA@mail.gmail.com>
References:  <CALd+dcfzPU=nMGo41BBZzt3jQnsQJaANVyA222TDM_is2Ueo0A@mail.gmail.com> <CAOc73CCHS4r-proJ_jT4T+BfcQB9pND8Ld8QZqYJOCkuq2LqiA@mail.gmail.com>

On Feb 2, 2016, at 1:10 AM, Ben Woods <woodsb02@gmail.com> wrote:
>
> On Monday, 1 February 2016, Vick Khera <vivek@khera.org> wrote:
>
>> I have a handful of servers at my data center all running FreeBSD 10.2.
>> On one of them I have a copy of the FreeBSD sources shared via NFS. When
>> this server is running a large poudriere run re-building all the ports I
>> need, the clients' NFS mounts become unstable. That is, the clients keep
>> getting read failures. The interactive performance of the NFS server is
>> just fine, however. The local file system is a ZFS mirror.
>>
>> What could be causing NFS to be unstable in this situation?
>>
>> Specifics:
>>
>> Server "lorax": FreeBSD 10.2-RELEASE-p7, kernel locally compiled, with
>> NFS server and ZFS as dynamic kernel modules. 16GB RAM, Xeon 3.1GHz quad
>> processor.
>>
>> The directory /u/lorax1 is a ZFS dataset on a mirrored pool, and is NFS
>> exported via the ZFS exports file. I put the FreeBSD sources on this
>> dataset and symlink to /usr/src.
>>
>>
>> Client "bluefish": FreeBSD 10.2-RELEASE-p5, kernel locally compiled, NFS
>> client built into the kernel. 32GB RAM, Xeon 3.1GHz quad processor
>> (basically the same hardware but more RAM).
>>
>> The directory /n/lorax1 is NFS mounted from lorax via autofs. The NFS
>> options are "intr,nolockd". /usr/src is symlinked to the sources in that
>> NFS mount.
>>
>>
>> What I observe:
>>
>> [lorax]~% cd /usr/src
>> [lorax]src% svn status
>> [lorax]src% w
>> 9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
>> USER       TTY      FROM                      LOGIN@  IDLE WHAT
>> vivek      pts/0    vick.int.kcilink.com      8:44AM     - tmux: client (/tmp/
>> vivek      pts/1    tmux(19747).%0            8:44AM    19 sed y%*+%pp%;s%[^_a
>> vivek      pts/2    tmux(19747).%1            8:56AM     - w
>> vivek      pts/3    tmux(19747).%2            8:56AM     - slogin bluefish-prv
>> [lorax]src% pwd
>> /u/lorax1/usr10/src
>>
>> So right now the load average is more than 1 per processor on lorax. I
>> can quite easily run "svn status" on the source directory, and the
>> interactive performance is pretty snappy for editing local files and
>> navigating around the file system.
>>
>>
>> On the client:
>>
>> [bluefish]~% cd /usr/src
>> [bluefish]src% pwd
>> /n/lorax1/usr10/src
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
>> Partial results are valid but processing is incomplete
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
>> Partial results are valid but processing is incomplete
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory
>> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results are
>> valid but processing is incomplete
>> [bluefish]src% w
>> 9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
>> USER       TTY      FROM                      LOGIN@  IDLE WHAT
>> vivek      pts/0    lorax-prv.kcilink.com     8:56AM     - w
>> [bluefish]src% df .
>> Filesystem          1K-blocks    Used     Avail Capacity  Mounted on
>> lorax-prv:/u/lorax1 932845181 6090910 926754271     1%    /n/lorax1
>>
>>
>> What I see is more or less random failures to read the NFS volume. When
>> the server is not so busy running poudriere builds, the client never has
>> any failures.
>>
>> I also observe this kind of failure doing buildworld or installworld on
>> the client when the server is busy -- I get strange random failures
>> reading the files causing the build or install to fail.
>>
>> My workaround is to not do builds/installs on client machines when the
>> NFS server is busy doing large jobs like building all packages, but
>> there is definitely something wrong here I'd like to fix. I observe this
>> on all the local NFS clients. I rebooted the server before to try to
>> clear this up, but it did not fix it.
>>
>> Any help would be appreciated.
>>
>
> I just wanted to point out that I am experiencing this exact same issue
> in my home setup.
>
> Performing an installworld from an NFS mount works perfectly, until I
> start running poudriere on the NFS server. Then I start getting NFS
> timeouts and the installworld fails.
>
> The NFS server is also using ZFS, but the NFS export in my case is being
> done via the ZFS property "sharenfs" (I am not using the /etc/exports
> file).

Me three.  I'm actually updating a small group of servers now and started
blowing up my installworlds by trying to do some poudriere builds at the
same time.  Very repeatable.  Of note, I'm on 9.3, and saw this on 8.4 as
well.  If I track down the client-side failures, it's always "permission
denied".
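
For what it's worth, the way I've been comparing counters around a failing
run is roughly this.  It's only a sketch: the sysctl OIDs in the snapshot
function are assumptions and vary by release, but the awk diff at the end
works on any "name value" snapshot you feed it:

```shell
# Snapshot client-side NFS and ARC counters as "name value" pairs,
# reproduce the failure, snapshot again, then print only the counters
# that changed between the two snapshots.  Adjust the OIDs for your
# release -- the diff itself doesn't care where the numbers came from.
snapshot() { sysctl vfs.nfs kstat.zfs.misc.arcstats 2>/dev/null | tr -d ':'; }
snapshot > /tmp/counters.before
# ... reproduce the failure here (svn status, installworld, etc.) ...
snapshot > /tmp/counters.after
awk 'NR==FNR { before[$1] = $2; next }
     ($1 in before) && before[$1] != $2 {
         printf "%s: %s -> %s\n", $1, before[$1], $2
     }' /tmp/counters.before /tmp/counters.after
```

Counters that jump only while poudriere is running (retransmits, timeouts,
ARC evictions) would at least narrow down where to look.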

Thanks,

Charles

>
> I suspect this will boil down to a ZFS tuning issue, where poudriere and
> installworld are both stress testing the server. Both of these would
> obviously cause significant memory and CPU usage, and the "recently used"
> portion of the ARC to be constantly flushed as they access a large number
> of different files.
>
> It might be interesting if you could report the output of the heading
> lines (including memory and ARC details) from the "top" command
> before/after running poudriere and attempting the installworld.
>
> Regards,
> Ben
>
>
> --
>
> --
> From: Benjamin Woods
> woodsb02@gmail.com
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"



