From: Wim Lewis
To: FreeBSD Filesystems
Subject: ZFS L2ARC statistics interpretation
Date: Wed, 19 Aug 2015 16:08:47 -0700
Message-Id: <0CEC2752-7787-4C6D-99E2-E7D7BF238449@omnigroup.com>

I'm trying to understand some problems we've been having with our ZFS systems, in particular their L2ARC performance. Before I make too many guesses about what's going on, I'm hoping someone can clarify what some of the ZFS statistics actually mean, or point me to documentation if any exists.

In particular, I'm hoping someone can tell me the interpretation of:

Errors:
    kstat.zfs.misc.arcstats.l2_cksum_bad
    kstat.zfs.misc.arcstats.l2_io_error

Other than problems with the underlying disk (or controller or cable or...), are there reasons for these counters to be nonzero? On some of our systems, they increase fairly rapidly (20000/day). Is this considered normal, or does it indicate a problem? If a problem, what should I be looking at?

Size:
    kstat.zfs.misc.arcstats.l2_size
    kstat.zfs.misc.arcstats.l2_asize

What does l2_size/l2_asize measure? Compressed or uncompressed size? It sometimes tops out at roughly the size of my L2ARC device, and sometimes just continually grows (e.g., one of my systems has an l2_size of about 1.3T but a 190G L2ARC; I doubt I'm getting nearly 7:1 compression on my dataset! But maybe I am? How can I tell?)

There are reports over the last few years [1,2,3,4] that suggest that there's a ZFS bug that attempts to use space past the end of the L2ARC, resulting both in l2_size being larger than is possible and also in io_errors and bad cksums (when the nonexistent sectors are read back). But given that this behavior has been reported off and on for several years now, and many of the threads devolve into supposition and folklore, I'm hoping to get an informed answer about what these statistics mean, whether the numbers I'm seeing indicate a problem or not, and be able to make a judgment about whether a given fix in FreeBSD might solve the problem.

FWIW, I'm seeing these problems on FreeBSD 10.0 and 10.1; I'm not seeing them on 9.2.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html
[2] https://forums.freebsd.org/threads/l2arc-degraded.47540/
[3] https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020256.html
[4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198242

Thanks

Wim Lewis / wiml@omnigroup.com
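
For reference, the size comparison described above (l2_size and l2_asize against the capacity of the cache device) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an answer to what the counters mean. It assumes the text comes from `sysctl kstat.zfs.misc.arcstats` in the usual `name: value` form; the helper names `parse_arcstats` and `l2arc_ratios` are made up for this sketch, the l2_size and device figures are the approximate ones quoted in the message, and the l2_asize figure is invented purely so the example runs.

```python
def parse_arcstats(text):
    """Parse 'name: value' lines (as printed by sysctl) into {short_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            # Keep only the last component, e.g. 'l2_size'.
            stats[name.strip().rsplit(".", 1)[-1]] = int(value)
    return stats


def l2arc_ratios(stats, device_bytes):
    """Compare the two L2ARC size counters with the cache device capacity."""
    l2_size = stats["l2_size"]
    l2_asize = stats["l2_asize"]
    return {
        "l2_size/device": l2_size / device_bytes,
        "l2_asize/device": l2_asize / device_bytes,
        # Guard against a zero counter on a freshly added cache device.
        "l2_size/l2_asize": (l2_size / l2_asize) if l2_asize else None,
    }


# Approximate figures from the message: l2_size about 1.3T, device 190G.
# The l2_asize value below is hypothetical, not a measurement.
SAMPLE = "\n".join([
    "kstat.zfs.misc.arcstats.l2_size: %d" % int(1.3 * 2**40),
    "kstat.zfs.misc.arcstats.l2_asize: %d" % (180 * 2**30),
])

if __name__ == "__main__":
    for label, ratio in l2arc_ratios(parse_arcstats(SAMPLE), 190 * 2**30).items():
        print(label, ratio)
```

A ratio of l2_size to device capacity well above 1 only shows that the counter exceeds the raw device size; whether that reflects compression or a bookkeeping bug is exactly the open question.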