Subject: Re: NFS unstable with high load on server
From: Charles Sprickman <spork@bway.net>
Date: Tue, 2 Feb 2016 01:26:43 -0500
To: Ben Woods
Cc: Vick Khera, freebsd-fs@freebsd.org, freebsd-questions@freebsd.org
Message-Id: <5EAD4A4A-211F-451E-A3B9-752DAC6D94B4@bway.net>

On Feb 2, 2016, at 1:10 AM, Ben Woods wrote:

> On Monday, 1 February 2016, Vick Khera wrote:
>
>> I have a handful of servers at my data center, all running FreeBSD 10.2.
>> On one of them I have a copy of the FreeBSD sources shared via NFS. When
>> this server is running a large poudriere run rebuilding all the ports I
>> need, the clients' NFS mounts become unstable. That is, the clients keep
>> getting read failures. The interactive performance of the NFS server
>> itself is just fine, however. The local file system is a ZFS mirror.
>>
>> What could be causing NFS to be unstable in this situation?
>>
>> Specifics:
>>
>> Server "lorax": FreeBSD 10.2-RELEASE-p7, locally compiled kernel, with
>> the NFS server and ZFS as dynamic kernel modules. 16GB RAM, quad-core
>> 3.1GHz Xeon.
>>
>> The directory /u/lorax1 is a ZFS dataset on a mirrored pool and is NFS
>> exported via the ZFS exports file. I put the FreeBSD sources on this
>> dataset and symlink them to /usr/src.
>>
>> Client "bluefish": FreeBSD 10.2-RELEASE-p5, locally compiled kernel, NFS
>> client built into the kernel. 32GB RAM, quad-core 3.1GHz Xeon (basically
>> the same hardware but more RAM).
>>
>> The directory /n/lorax1 is NFS mounted from lorax via autofs. The NFS
>> options are "intr,nolockd". /usr/src is symlinked to the sources in that
>> NFS mount.
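>>
>> For reference, the export/mount plumbing looks roughly like this (the
>> dataset name "zroot/u/lorax1" is a stand-in since I didn't note the pool
>> layout; the hostname, paths, and options are as described above):
>>
>>   # on lorax: share the dataset via ZFS; this writes the entry into
>>   # the ZFS exports file (/etc/zfs/exports), which mountd also reads
>>   zfs set sharenfs=on zroot/u/lorax1
>>
>>   # on bluefish, /etc/auto_master: hang an indirect map off /n
>>   /n    /etc/auto_lorax
>>
>>   # on bluefish, /etc/auto_lorax: mount from lorax with intr,nolockd
>>   lorax1    -intr,nolockd    lorax-prv:/u/lorax1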
>>
>> What I observe:
>>
>> [lorax]~% cd /usr/src
>> [lorax]src% svn status
>> [lorax]src% w
>>  9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
>> USER   TTY    FROM                  LOGIN@  IDLE WHAT
>> vivek  pts/0  vick.int.kcilink.com  8:44AM     - tmux: client (/tmp/
>> vivek  pts/1  tmux(19747).%0        8:44AM    19 sed y%*+%pp%;s%[^_a
>> vivek  pts/2  tmux(19747).%1        8:56AM     - w
>> vivek  pts/3  tmux(19747).%2        8:56AM     - slogin bluefish-prv
>> [lorax]src% pwd
>> /u/lorax1/usr10/src
>>
>> So right now the load average is more than 1 per processor on lorax. I
>> can quite easily run "svn status" on the source directory, and the
>> interactive performance is pretty snappy for editing local files and
>> navigating around the file system.
>>
>> On the client:
>>
>> [bluefish]~% cd /usr/src
>> [bluefish]src% pwd
>> /n/lorax1/usr10/src
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
>> Partial results are valid but processing is incomplete
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
>> Partial results are valid but processing is incomplete
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory
>> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results are
>> valid but processing is incomplete
>> [bluefish]src% w
>>  9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
>> USER   TTY    FROM                   LOGIN@  IDLE WHAT
>> vivek  pts/0  lorax-prv.kcilink.com  8:56AM     - w
>> [bluefish]src% df .
>> Filesystem          1K-blocks    Used     Avail Capacity  Mounted on
>> lorax-prv:/u/lorax1 932845181 6090910 926754271     1%    /n/lorax1
>>
>> What I see is more or less random failures to read the NFS volume. When
>> the server is not so busy running poudriere builds, the client never has
>> any failures.
>>
>> I also observe this kind of failure doing buildworld or installworld on
>> the client when the server is busy -- I get strange random failures
>> reading the files, causing the build or install to fail.
>>
>> My workaround is to not do builds/installs on client machines when the
>> NFS server is busy doing large jobs like building all packages, but
>> there is definitely something wrong here that I'd like to fix. I observe
>> this on all the local NFS clients. I rebooted the server earlier to try
>> to clear this up, but it did not fix it.
>>
>> Any help would be appreciated.
>
> I just wanted to point out that I am experiencing this exact same issue
> in my home setup.
>
> Performing an installworld from an NFS mount works perfectly, until I
> start running poudriere on the NFS server. Then I start getting NFS
> timeouts and the installworld fails.
>
> The NFS server is also using ZFS, but the NFS export in my case is being
> done via the ZFS property "sharenfs" (I am not using the /etc/exports
> file).

Me three. I’m actually updating a small group of servers now and started
blowing up my installworlds by trying to do some poudriere builds at the
same time. Very repeatable. Of note, I’m on 9.3, and saw this on 8.4 as
well. If I track down the client-side failures, it’s always “permission
denied”.

Thanks,

Charles

> I suspect this will boil down to a ZFS tuning issue, where poudriere and
> installworld are both stress testing the server.
> Both of these would obviously cause significant memory and CPU usage, and
> the "recently used" portion of the ARC to be constantly flushed as they
> access a large number of different files.
>
> It might be interesting if you could report the output of the heading
> lines (including memory and ARC details) from the "top" command
> before/after running poudriere and attempting the installworld.
>
> Regards,
> Ben
>
> --
> From: Benjamin Woods
> woodsb02@gmail.com
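P.S. For the top/ARC numbers Ben mentioned, something along these lines,
run on the server before and during a poudriere build, should capture
enough to compare (standard FreeBSD sysctl names; trim to taste):

  # one-shot snapshot of top's header, including the memory/ARC lines
  top -b | head -10

  # raw ARC counters: current size, target, max, and hit/miss totals
  sysctl kstat.zfs.misc.arcstats.size \
         kstat.zfs.misc.arcstats.c \
         kstat.zfs.misc.arcstats.c_max \
         kstat.zfs.misc.arcstats.hits \
         kstat.zfs.misc.arcstats.misses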