From owner-freebsd-stable@FreeBSD.ORG Sun Jan 24 13:22:40 2010
Message-ID: <4B5C499A.1000103@quip.cz>
Date: Sun, 24 Jan 2010 14:22:34 +0100
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Garrett Moore
Cc: Alexander Leidinger, freebsd-stable@freebsd.org
In-Reply-To: <4B5C2FC1.9070001@quip.cz>
Subject: Re: ZFS performance degradation over time

Miroslav Lachman wrote:
[...]
> Last night I tried ZFS with a pool on an iSCSI-connected Dell MD3000i and I
> was surprised by the very low speed of a simple cp -a command (copying from
> a UFS partition to ZFS). The write speed was only about 2 MB/s.
>
> After looking into the ARC stats, I noticed some odd values:
>
> ARC Size:
>          Current Size:             1 MB (arcsize)
>          Target Size (Adaptive):   205 MB (c)
>          Min Size (Hard Limit):    205 MB (zfs_arc_min)
>          Max Size (Hard Limit):    1647 MB (zfs_arc_max)
>
> (stats from the script http://cuddletech.com/arc_summary/ ; FreeBSD version:
> http://bitbucket.org/koie/arc_summary/changeset/dbe14d2cf52b/ )
>
> I don't know why it shows a Current Size of 1 MB.
[...]
> Today I tried serving the data with Lighttpd.
> The iSCSI read performance is impressive; thanks to ZFS prefetch it can
> read about 880 Mbit/s from iSCSI, but Lighttpd serves only about 66 Mbit/s.
>
> bce0 - internet
> bce1 - iSCSI to storage MD3000i
>
>        bce0                 bce1
>  Kbps in  Kbps out    Kbps in  Kbps out
>  2423.22  65481.56   855970.7   4348.73
>  2355.26  63911.74   820561.3   4846.08
>  2424.87  65998.62   848937.1   4312.37
>  2442.78  66544.95   858019.0   4356.64
[...]
> ARC Size:
>          Current Size:             22 MB (arcsize)
>          Target Size (Adaptive):   205 MB (c)
>          Min Size (Hard Limit):    205 MB (zfs_arc_min)
>          Max Size (Hard Limit):    1647 MB (zfs_arc_max)
>
> ARC Size Breakdown:
>          Most Recently Used Cache Size:    5%    11 MB (p)
>          Most Frequently Used Cache Size:  94%   194 MB (c-p)
[...]
> Can somebody tell me why the ARC Current Size is dropping so low
> (1-20 MB, when arc_min is 205 MB)?
>
> The system has 8 GB of memory and 8 CPU cores:
>
> last pid: 83605;  load averages: 0.17, 0.15, 0.10  up 36+10:34:34  12:29:05
> 58 processes:  1 running, 56 sleeping, 1 zombie
> CPU:  0.1% user,  0.0% nice,  2.3% system,  1.7% interrupt, 95.8% idle
> Mem: 237M Active, 6259M Inact, 1154M Wired, 138M Cache, 827M Buf, 117M Free
> Swap: 8192M Total, 96K Used, 8192M Free
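(A side note on the limits above: whether they are defaults or set explicitly,
the Min/Max hard limits map to the vfs.zfs.arc_min / vfs.zfs.arc_max loader
tunables. Pinning them in /boot/loader.conf would look roughly like this; the
byte values are only an illustration, not copied from my real config:

vfs.zfs.arc_min="214958080"     # ~205 MB, the ARC should not shrink below this
vfs.zfs.arc_max="1727221760"    # ~1647 MB, upper bound for the ARC

So an arcsize of 1-22 MB is far below what the hard minimum is supposed to
guarantee.)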
Hmmm, it seems related to the ZFS + sendfile bug that was pointed out in an
older thread, "Performance issues with 8.0 ZFS and sendfile/lighttpd":

http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052595.html
http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052629.html

I ran the Lighttpd test again, but with "writev" instead of sendfile in
lighttpd.conf (server.network-backend = "writev"; the exact config lines are
further below), and this setting tripled the performance! Lighttpd now serves
about 180 Mbit/s instead of 66 Mbit/s, and the ARC Current Size stays at its
maximum the whole time:

# ~/bin/arc_summary.pl
System Memory:
         Physical RAM:  8169 MB
         Free Memory :  0 MB

ARC Size:
         Current Size:             1647 MB (arcsize)
         Target Size (Adaptive):   1647 MB (c)
         Min Size (Hard Limit):    205 MB (zfs_arc_min)
         Max Size (Hard Limit):    1647 MB (zfs_arc_max)

ARC Size Breakdown:
         Most Recently Used Cache Size:    99%   1643 MB (p)
         Most Frequently Used Cache Size:   0%      3 MB (c-p)

ARC Efficency:
         Cache Access Total:        126994437
         Cache Hit Ratio:      94%  119500977  [Defined State for buffer]
         Cache Miss Ratio:      5%    7493460  [Undefined State for Buffer]
         REAL Hit Ratio:       93%  118808103  [MRU/MFU Hits Only]

         Data Demand   Efficiency:    97%
         Data Prefetch Efficiency:    14%

        CACHE HITS BY CACHE LIST:
          Anon:                       --%  Counter Rolled.
          Most Recently Used:          2%  3552568 (mru)        [ Return Customer ]
          Most Frequently Used:       96%  115255535 (mfu)      [ Frequent Customer ]
          Most Recently Used Ghost:    1%  1277990 (mru_ghost)  [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:  0%  464787 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]

        CACHE HITS BY DATA TYPE:
          Demand Data:                96%  114958883
          Prefetch Data:               0%  713418
          Demand Metadata:             3%  3828650
          Prefetch Metadata:           0%  26

        CACHE MISSES BY DATA TYPE:
          Demand Data:                40%  3017229
          Prefetch Data:              57%  4324961
          Demand Metadata:             2%  151246
          Prefetch Metadata:           0%  24

# ~/bin/arcstat.pl -f Time,read,hits,Hit%,miss,miss%,dmis,dm%,mmis,mm%,arcsz,c 30
    Time  read  hits  Hit%  miss  miss%  dmis  dm%  mmis  mm%       arcsz           c
14:04:45    5K    4K    87   672     12    53    1     2    0  1727635056  1727221760
14:05:16    5K    4K    86   679     13    48    1     1    0  1727283200  1727221760
14:05:46    5K    5K    88   674     11    55    1     1    0  1727423184  1727221760
14:06:17    5K    4K    87   668     12    51    1     0    0  1727590560  1727221760
14:06:47    5K    5K    88   665     11    56    1     1    0  1727278896  1727221760
14:07:18    5K    5K    88   664     11    53    1     1    0  1727347632  1727221760

# ifstat -i bce0,bce1 -b 10
        bce0                 bce1
 Kbps in  Kbps out    Kbps in  Kbps out
 6673.90  184872.8   679110.0   3768.23
 6688.00  185420.0   655232.8   3834.10
 7737.68  214640.4   673375.7   3735.96
 6993.61  193602.6   671239.3   3737.48
 7198.54  198665.0   688677.0   4037.28
 8062.61  222400.4   683966.9   3790.40

There is also a big change in memory usage:

last pid: 92536;  load averages: 0.19, 0.16, 0.16  up 36+12:22:26  14:16:57
60 processes:  1 running, 58 sleeping, 1 zombie
CPU:  0.0% user,  0.0% nice,  2.5% system,  3.3% interrupt, 94.1% idle
Mem: 1081M Active, 172M Inact, 2800M Wired, 3844M Cache, 827M Buf, 8776K Free
Swap: 8192M Total, 104K Used, 8192M Free

More Active, less Inact (172M instead of 6259M!), more Cache (3844M instead of
138M); Buf and Free are close in both cases.
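For anybody who wants to try the same workaround: the whole change is a single
directive in lighttpd.conf. A minimal sketch (the name of the sendfile backend
may differ between lighttpd versions, so take the commented line as an
illustration only):

# default backend on FreeBSD uses sendfile(2); this is the case with the
# tiny ARC and ~66 Mbit/s:
#server.network-backend = "freebsd-sendfile"

# workaround: plain writev(2); ARC stays at its maximum and throughput triples
server.network-backend = "writev"

Restart lighttpd afterwards and watch arcstat.pl and ifstat again.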
Does somebody know whether any fixes for the ZFS + sendfile problem have been
committed to 8.x or HEAD?

How can we test whether this is a general problem with sendfile, or a problem
local to Lighttpd?

Miroslav Lachman
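PS: A rough way to separate "sendfile on ZFS in general" from "Lighttpd in
particular" might be the following (only a sketch; the host name, pool path and
file names are made up, and each step should use a different large file that is
not yet cached in the ARC):

# 1) lighttpd with its default (sendfile) backend
fetch -o /dev/null http://testhost.example.com/bigfile1.bin

# 2) the same setup after switching to server.network-backend = "writev"
fetch -o /dev/null http://testhost.example.com/bigfile2.bin

# 3) a plain local read from the pool, with no HTTP server in the way
dd if=/tank/www/bigfile3.bin of=/dev/null bs=1m

If only case 1 is slow and only there the arcsize collapses, that points at the
sendfile path on ZFS in general; repeating case 1 with another sendfile-capable
server (Apache with EnableSendfile on, or nginx with sendfile on) should then
show the same degradation and rule Lighttpd itself out.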