From: Gary Palmer
Date: Thu, 20 Aug 2015 01:29:46 +0100
To: Wim Lewis
Cc: FreeBSD Filesystems
Subject: Re: ZFS L2ARC statistics interpretation
Message-ID: <20150820002946.GD13503@in-addr.com>
In-Reply-To: <0CEC2752-7787-4C6D-99E2-E7D7BF238449@omnigroup.com>

On Wed, Aug 19, 2015 at 04:08:47PM -0700, Wim Lewis wrote:
> I'm trying to understand some problems we've been having with our ZFS
> systems, in particular their L2ARC performance. Before I make too many
> guesses about what's going on, I'm hoping someone can clarify what some
> of the ZFS statistics actually mean, or point me to documentation if
> any exists.
>
> In particular, I'm hoping someone can tell me the interpretation of:
>
> Errors:
>    kstat.zfs.misc.arcstats.l2_cksum_bad
>    kstat.zfs.misc.arcstats.l2_io_error
>
> Other than problems with the underlying disk (or controller or cable
> or...), are there reasons for these counters to be nonzero? On some of
> our systems, they increase fairly rapidly (20000/day). Is this
> considered normal, or does it indicate a problem? If a problem, what
> should I be looking at?
>
> Size:
>    kstat.zfs.misc.arcstats.l2_size
>    kstat.zfs.misc.arcstats.l2_asize
>
> What does l2_size/l2_asize measure? Compressed or uncompressed size?
> It sometimes tops out at roughly the size of my L2ARC device, and
> sometimes just continually grows (e.g., one of my systems has an
> l2_size of about 1.3T but a 190G L2ARC; I doubt I'm getting nearly 7:1
> compression on my dataset! But maybe I am? How can I tell?)
>
> There are reports over the last few years [1,2,3,4] that suggest that
> there's a ZFS bug that attempts to use space past the end of the L2ARC,
> resulting both in l2_size being larger than is possible and also in
> io_errors and bad cksums (when the nonexistent sectors are read back).
> But given that this behavior has been reported off and on for several
> years now, and many of the threads devolve into supposition and
> folklore, I'm hoping to get an informed answer about what these
> statistics mean, whether the numbers I'm seeing indicate a problem or
> not, and be able to make a judgment about whether a given fix in
> FreeBSD might solve the problem.
>
> FWIW, I'm seeing these problems on FreeBSD 10.0 and 10.1; I'm not
> seeing them on 9.2.
>
>
> [1] https://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html
> [2] https://forums.freebsd.org/threads/l2arc-degraded.47540/
> [3] https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020256.html
> [4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198242

I think the checksum/IO problems and the huge reported size of your
L2ARC are both results of the problem described at the following URL:

https://reviews.freebsd.org/D2764

I'm not sure whether a fix is in 10.2 yet.

Regards,

Gary
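
On the "How can I tell?" question about compression, a rough check is to compare the two size counters with each other and with the cache device's capacity. The sketch below is only illustrative. It assumes that l2_size counts logical (uncompressed) bytes and l2_asize counts bytes actually allocated on the cache device, and that the input is text in the "name: value" form printed by `sysctl kstat.zfs.misc.arcstats`. The helper names `parse_arcstats` and `l2arc_report`, and every number in `SAMPLE`, are hypothetical and are not taken from the systems discussed in this thread.

```python
# Hypothetical sample in the "name: value" form printed by sysctl.
# These numbers are made up for illustration only.
SAMPLE = """\
kstat.zfs.misc.arcstats.l2_size: 1300000000000
kstat.zfs.misc.arcstats.l2_asize: 650000000000
kstat.zfs.misc.arcstats.l2_cksum_bad: 20000
kstat.zfs.misc.arcstats.l2_io_error: 20000
"""


def parse_arcstats(text):
    """Map the last component of each sysctl name to its integer value."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        value = value.strip()
        if value.isdigit():
            stats[name.strip().rsplit(".", 1)[-1]] = int(value)
    return stats


def l2arc_report(stats, device_bytes):
    """Summarise the L2ARC size counters against the cache device size.

    Assumption: l2_size is logical bytes, l2_asize is allocated bytes.
    """
    size = stats["l2_size"]
    asize = stats["l2_asize"]
    return {
        # Apparent compression ratio; None if nothing is allocated.
        "ratio": size / asize if asize else None,
        # Allocated bytes larger than the device cannot be real, so this
        # would point at an accounting problem rather than compression.
        "asize_exceeds_device": asize > device_bytes,
    }


if __name__ == "__main__":
    # 190 * 10**9 stands in for a 190G cache device (decimal units assumed).
    print(l2arc_report(parse_arcstats(SAMPLE), 190 * 10**9))
```

Under those assumptions, an l2_asize well above the physical size of the cache device cannot be explained by compression, which would be consistent with the accounting problem described in D2764. A plausible ratio with l2_asize below the device size would suggest the counters are behaving.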