From: Andriy Gapon <avg@FreeBSD.org>
Date: Tue, 02 Aug 2011 18:06:06 +0300
To: Doug Barton, freebsd-ports@FreeBSD.org
Cc: freebsd-fs@FreeBSD.org
Subject: portmaster, zfs metadata caching [Was: UPDATING 20110730]
Message-ID: <4E38125E.1040109@FreeBSD.org>
In-Reply-To: <4E37F81F.7040902@FreeBSD.org>

on 02/08/2011 16:14 Andriy Gapon said the following:
> And now to my side of the problem.
> While "profiling" pkg_info with ktrace I see that getdirentries(2) calls
> sometimes take quite a while, and since I have > 1000 ports, all those
> calls do add up.
> DTrace shows that the calls are quite fast (~0.3 ms) when there is no
> actual disk access, but when disk access does occur it introduces a delay
> on the order of 1 - 100 milliseconds.
> I am really in doubt about what is happening here. It seems that the
> directory data either isn't kept in the ZFS ARC for long enough or is
> squeezed out of it by some other data (without additional pressure it
> should easily fit into the ARC). It also seems that the disk accesses
> somehow have quite a large latency, although svc_t according to iostat is
> smaller than that (5 - 10 ms). DTrace shows that the thread spends the
> time in cv_wait, so it's possible that the scheduler is also involved
> here, as its decisions may add a delay to when the thread becomes
> runnable again.

Reporting further, just in case anyone is following this. (You may want to
scroll down to my conclusions at the end of the message.)

I have tracked my ZFS problem down to my experiments with ZFS tuning. I had
limited my ARC size to a value that I considered large enough to cache my
working set of data and _metadata_. Little did I know that by default ZFS
sets aside only 1/4 of the ARC size for metadata, so that portion was
already significantly smaller than I expected. On top of that, it seems
that a large piece of the metadata portion is permanently occupied by some
non-evictable data (I am not sure what it actually is; I haven't tracked
that down yet). In the end, only a small portion of my ARC was available
for holding metadata, which includes the directory contents.
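As an aside, for anyone who wants to reproduce the getdirentries(2) timing
from the quoted text, a DTrace one-liner along the following lines should
work. This is a sketch rather than the exact script I ran, so treat the
probe names and the execname predicate as things to verify on your system:

dtrace -n '
    syscall::getdirentries:entry /execname == "pkg_info"/
    {
            self->ts = timestamp;
    }
    syscall::getdirentries:return /self->ts/
    {
            /* aggregate per-call latency in microseconds */
            @["getdirentries latency (us)"] =
                quantize((timestamp - self->ts) / 1000);
            self->ts = 0;
    }'

With an aggregation like this, the ~0.3 ms cached calls and the 1 - 100 ms
disk-bound calls should show up as two clearly separate clusters.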
So this is what I had with the old settings:

vfs.zfs.arc_meta_limit: 314572800
vfs.zfs.arc_meta_used: 401064272

and

$ for i in $(jot 5) ; do /usr/bin/time -p pkg_info -O print/printme ; done
The following installed package(s) has print/printme origin:
real 12.55
user 0.02
sys 2.51
The following installed package(s) has print/printme origin:
real 12.65
user 0.03
sys 1.99
The following installed package(s) has print/printme origin:
real 10.57
user 0.02
sys 1.57
The following installed package(s) has print/printme origin:
real 8.85
user 0.03
sys 0.17
The following installed package(s) has print/printme origin:
real 9.28
user 0.02
sys 0.20

I think you get the picture. Now I have bumped the limit, and this is what
I got right after doing it:

vfs.zfs.arc_meta_limit: 717225984
vfs.zfs.arc_meta_used: 414439800

and

$ for i in $(jot 5) ; do /usr/bin/time -p pkg_info -O print/printme ; done
The following installed package(s) has print/printme origin:
real 9.08
user 0.01
sys 0.18
The following installed package(s) has print/printme origin:
real 7.48
user 0.04
sys 0.14
The following installed package(s) has print/printme origin:
real 0.08
user 0.00
sys 0.07
The following installed package(s) has print/printme origin:
real 0.95
user 0.03
sys 0.04
The following installed package(s) has print/printme origin:
real 0.08
user 0.00
sys 0.07

Two runs to "warm up" the ARC, and then everything works just perfectly.

I think that this is an important discovery for two reasons:
1. I learned a new thing about the ZFS ARC.
2. This problem demonstrates that portmaster currently depends on the
filesystem cache being able to hold a significant amount of ports/packages
(meta)data.
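For reference, this is roughly how the metadata limit can be inspected and
raised. Treat it as a sketch rather than exact commands: depending on the
version, vfs.zfs.arc_meta_limit may be writable at runtime or only settable
as a boot-time loader tunable, and the value below is simply the one from
my new settings, not a recommendation.

# show the current limit and usage, in bytes
sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used
# raise the limit on the running system (as root), if the sysctl is writable
sysctl vfs.zfs.arc_meta_limit=717225984
# or make the setting persistent as a boot-time tunable
echo 'vfs.zfs.arc_meta_limit="717225984"' >> /boot/loader.conf

--
Andriy Gapon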