From: Andriy Gapon <avg@FreeBSD.org>
Date: Tue, 02 Aug 2011 18:06:06 +0300
To: Doug Barton, freebsd-ports@FreeBSD.org
Cc: freebsd-fs@FreeBSD.org
Subject: portmaster, zfs metadata caching [Was: UPDATING 20110730]
Message-ID: <4E38125E.1040109@FreeBSD.org>
In-Reply-To: <4E37F81F.7040902@FreeBSD.org>

on 02/08/2011 16:14 Andriy Gapon said the following:
> And now to my side of the problem.
> While "profiling" pkg_info with ktrace I see that getdirentries(2) calls
> sometimes take quite a while, and since I have > 1000 ports, all those
> calls do add up.
> DTrace shows that the calls are quite fast (~0.3 ms) when there is no
> actual disk access, but when disk access does occur it introduces a delay
> on the order of 1 - 100 milliseconds.
> I am really in doubt about what is happening here. It seems that the
> directory data either isn't kept in the ZFS ARC for long enough or is
> squeezed out of it by some other data (without additional pressure it
> should easily fit into the ARC). It also seems that the disk accesses
> somehow have quite a large latency, although svc_t according to iostat is
> smaller than that (5 - 10 ms). DTrace shows that the thread spends the
> time in cv_wait, so it's possible that the scheduler is also involved
> here, as its decisions may add a delay to when the thread becomes
> runnable again.

Reporting further, just in case anyone is following this. (You may want to
scroll down to my conclusions at the end of the message.)

I have tracked my ZFS problem down to my experiments with ZFS tuning. I had
limited my ARC size to a value that I considered large enough to cache my
working set of data and _metadata_. Little did I know that by default ZFS
sets aside only 1/4 of the ARC size for metadata, so that portion was
already significantly smaller than I expected. On top of that, it seems
that a large piece of the metadata portion is permanently occupied by some
non-evictable data (I am not sure what it actually is; I haven't tracked
that down yet). In the end, only a small portion of my ARC was available
for holding metadata, which includes the directory contents.
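As an aside, for anyone who wants to reproduce the getdirentries(2) timing
from the quoted text, a DTrace one-liner along the following lines should
work. This is a sketch rather than the exact script I ran, so treat the
probe names and the execname predicate as things to verify on your system:

dtrace -n '
    syscall::getdirentries:entry /execname == "pkg_info"/
    {
            self->ts = timestamp;
    }
    syscall::getdirentries:return /self->ts/
    {
            /* aggregate per-call latency in microseconds */
            @["getdirentries latency (us)"] =
                quantize((timestamp - self->ts) / 1000);
            self->ts = 0;
    }'

With an aggregation like this, the ~0.3 ms cached calls and the 1 - 100 ms
disk-bound calls should show up as two clearly separate clusters.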
So this is what I had with the old settings:

vfs.zfs.arc_meta_limit: 314572800
vfs.zfs.arc_meta_used: 401064272

and

$ for i in $(jot 5) ; do /usr/bin/time -p pkg_info -O print/printme ; done
The following installed package(s) has print/printme origin:
real 12.55
user 0.02
sys 2.51
The following installed package(s) has print/printme origin:
real 12.65
user 0.03
sys 1.99
The following installed package(s) has print/printme origin:
real 10.57
user 0.02
sys 1.57
The following installed package(s) has print/printme origin:
real 8.85
user 0.03
sys 0.17
The following installed package(s) has print/printme origin:
real 9.28
user 0.02
sys 0.20

I think you get the picture. Now I have bumped the limit, and this is what
I got right after doing it:

vfs.zfs.arc_meta_limit: 717225984
vfs.zfs.arc_meta_used: 414439800

and

$ for i in $(jot 5) ; do /usr/bin/time -p pkg_info -O print/printme ; done
The following installed package(s) has print/printme origin:
real 9.08
user 0.01
sys 0.18
The following installed package(s) has print/printme origin:
real 7.48
user 0.04
sys 0.14
The following installed package(s) has print/printme origin:
real 0.08
user 0.00
sys 0.07
The following installed package(s) has print/printme origin:
real 0.95
user 0.03
sys 0.04
The following installed package(s) has print/printme origin:
real 0.08
user 0.00
sys 0.07

Two runs to "warm up" the ARC, and then everything works just perfectly.

I think that this is an important discovery for two reasons:
1. I learned a new thing about the ZFS ARC.
2. This problem demonstrates that portmaster currently depends on the
filesystem cache being able to hold a significant amount of ports/packages
(meta)data.
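For reference, this is roughly how the metadata limit can be inspected and
raised. Treat it as a sketch rather than exact commands: depending on the
version, vfs.zfs.arc_meta_limit may be writable at runtime or only settable
as a boot-time loader tunable, and the value below is simply the one from
my new settings, not a recommendation.

# show the current limit and usage, in bytes
sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used
# raise the limit on the running system (as root), if the sysctl is writable
sysctl vfs.zfs.arc_meta_limit=717225984
# or make the setting persistent as a boot-time tunable
echo 'vfs.zfs.arc_meta_limit="717225984"' >> /boot/loader.conf

--
Andriy Gapon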