Date: Tue, 04 Jan 2011 19:52:04 +0100
From: Attila Nagy <bra@fsn.hu>
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, Artem Belevich
Subject: Re: New ZFSv28 patchset for 8-STABLE
Message-ID: <4D236C54.6050309@fsn.hu>
References: <4D0A09AF.3040005@FreeBSD.org> <4D1F7008.3050506@fsn.hu> <4D222B7B.1090902@fsn.hu>

On 01/03/2011 10:35 PM, Bob Friesenhahn wrote:
>> After four days, the L2 hit rate is still hovering around 10-20
>> percent (it was between 60-90), so I think it's clearly a regression
>> in the ZFSv28 patch...
>> The massive growth in CPU usage can also be seen very nicely...
>>
>> I've updated the graphs (the switch time can be checked on the
>> zfs-mem graph):
>> http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
>>
>> There is a new phenomenon: the large IOPS peaks. I use this munin
>> script on a lot of machines and have never seen anything like
>> this... I'm not sure whether it's related or not.
>
> It is not so clear that there is a problem. I am not sure what you
> are using this server for, but it is wise

The IO pattern has changed radically, so for me it's a problem.

> to consider that this is the funny time when a new year starts, SPAM
> delivery goes through the roof, and employees and customers behave
> differently. You chose the worst time of the year to implement the
> change and observe behavior.

It's a free software mirror, ftp.fsn.hu, and I'm sure that the very
low hit rate and the increased CPU usage are not related to the time
when I made the switch.

> CPU use is indeed increased somewhat. A lower loading of the l2arc
> is not necessarily a problem. The l2arc is usually bandwidth limited
> compared with main store, so if bulk data cannot be cached in RAM,
> then it is best left in main store. A smarter l2arc algorithm could
> put only the data producing the expensive IOPS (the ones requiring a
> seek) in the l2arc, lessening the amount of data cached on the
> device.

That would make sense if I didn't have 100-120 IOPS on the disks
(for 7200 RPM drives that's about their maximum, and gstat tells me
the same) and an L2 hit rate as low as 10 percent. What's smarter:
having a 60-90% hit rate from the SSDs and moving the slow disk heads
less, or having a 10-20 percent hit rate and killing the disks with
random IO? If you are right, ZFS tries to be too smart and falls on
its face with this kind of workload.
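For reference, the L2 hit rate comes from ZFS's arcstats counters
(l2_hits and l2_misses). A minimal sketch of reading them on FreeBSD;
the sysctl names are the standard kstat ones, but this is an
illustration, not the actual munin plugin:

/*
 * Minimal sketch: derive the L2ARC hit rate from the cumulative
 * arcstats counters that ZFS exports on FreeBSD.  Illustration only,
 * not the munin plugin itself, which graphs the same counters.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdint.h>
#include <stdio.h>

static uint64_t
arcstat(const char *name)
{
	uint64_t val = 0;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) != 0)
		perror(name);
	return (val);
}

int
main(void)
{
	uint64_t hits = arcstat("kstat.zfs.misc.arcstats.l2_hits");
	uint64_t misses = arcstat("kstat.zfs.misc.arcstats.l2_misses");

	if (hits + misses > 0)
		printf("L2ARC hit rate: %.1f%%\n",
		    100.0 * (double)hits / (double)(hits + misses));
	return (0);
}

The counters are cumulative since boot, so a grapher like munin plots
the delta between samples rather than the since-boot average.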
BTW, I've checked the v15-v28 patch for arc.c and I can't see any
L2ARC-related change there. I'm not sure whether the hypothetical
logic would live there or in a different file; I haven't read the
patch end to end.
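If such logic existed, I'd expect it to sit in the feed/eligibility
path and look something like the sketch below. Purely hypothetical, of
course; as far as I can tell, the stock check only refuses buffers
that are already on the cache device or came in via prefetch (which is
what vfs.zfs.l2arc_noprefetch controls):

/*
 * Hypothetical sketch of the "smarter" feed policy discussed above:
 * only admit buffers to the L2ARC whose reads would cost a disk seek,
 * i.e. randomly read blocks.  Every name below is made up for
 * illustration; nothing like buf_read_was_sequential() exists in
 * arc.c.
 */
typedef int boolean_t;		/* stand-in for the kernel type */
#define	B_FALSE	0
#define	B_TRUE	1

typedef struct arc_buf_hdr arc_buf_hdr_t;

/* Made-up predicates standing in for real access-pattern tracking. */
boolean_t buf_was_prefetched(const arc_buf_hdr_t *hdr);
boolean_t buf_read_was_sequential(const arc_buf_hdr_t *hdr);

boolean_t
l2arc_seek_eligible(const arc_buf_hdr_t *hdr)
{
	/*
	 * Sequential readers stream off the spinning disks at full
	 * bandwidth anyway, so caching those blocks on the SSD buys
	 * little; random reads pay a seek each time, and those are
	 * the blocks worth the L2 space.
	 */
	if (buf_was_prefetched(hdr) || buf_read_was_sequential(hdr))
		return (B_FALSE);
	return (B_TRUE);
}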