Date:      Thu, 20 Aug 2015 01:29:46 +0100
From:      Gary Palmer <gpalmer@freebsd.org>
To:        Wim Lewis <wiml@omnigroup.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: ZFS L2ARC statistics interpretation
Message-ID:  <20150820002946.GD13503@in-addr.com>
In-Reply-To: <0CEC2752-7787-4C6D-99E2-E7D7BF238449@omnigroup.com>
References:  <0CEC2752-7787-4C6D-99E2-E7D7BF238449@omnigroup.com>

On Wed, Aug 19, 2015 at 04:08:47PM -0700, Wim Lewis wrote:
> I'm trying to understand some problems we've been having with our ZFS systems, in particular their L2ARC performance. Before I make too many guesses about what's going on, I'm hoping someone can clarify what some of the ZFS statistics actually mean, or point me to documentation if any exists.
> 
> In particular, I'm hoping someone can tell me the interpretation of:
> 
> Errors:
>    kstat.zfs.misc.arcstats.l2_cksum_bad
>    kstat.zfs.misc.arcstats.l2_io_error
> 
> Other than problems with the underlying disk (or controller or cable or...), are there reasons for these counters to be nonzero? On some of our systems, they increase fairly rapidly (20000/day). Is this considered normal, or does it indicate a problem? If a problem, what should I be looking at?
> 
> Size:
>    kstat.zfs.misc.arcstats.l2_size
>    kstat.zfs.misc.arcstats.l2_asize
> 
> What does l2_size/l2_asize measure? Compressed or uncompressed size? It sometimes tops out at roughly the size of my L2ARC device, and sometimes just continually grows (e.g., one of my systems has an l2_size of about 1.3T but a 190G L2ARC; I doubt I'm getting nearly 7:1 compression on my dataset! But maybe I am? How can I tell?)
> 
> There are reports over the last few years [1,2,3,4] that suggest that there's a ZFS bug that attempts to use space past the end of the L2ARC, resulting both in l2_size being larger than is possible and also in io_errors and bad cksums (when the nonexistent sectors are read back). But given that this behavior has been reported off and on for several years now, and many of the threads devolve into supposition and folklore, I'm hoping to get an informed answer about what these statistics mean, whether the numbers I'm seeing indicate a problem or not, and be able to make a judgment about whether a given fix in FreeBSD might solve the problem.
> 
> FWIW, I'm seeing these problems on FreeBSD 10.0 and 10.1; I'm not seeing them on 9.2. 
> 
> 
> [1] https://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html
> [2] https://forums.freebsd.org/threads/l2arc-degraded.47540/
> [3] https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020256.html
> [4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198242


I think the checksum/IO errors and the huge reported size of your
L2ARC are both symptoms of the same problem, which is described at
the following URL:

https://reviews.freebsd.org/D2764

I'm not sure whether a fix has made it into 10.2 yet.
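
To check whether a given machine shows the same symptoms, the counters
from the original question can be compared against the real size of the
cache device. The following is only a rough Python sketch, not a
diagnostic tool. It assumes that `sysctl kstat.zfs.misc.arcstats`
prints one "name: value" pair per line, and that l2_asize is the space
actually allocated on the cache device (so it should never exceed the
device size). The function names are made up for illustration, and the
device size has to be supplied by hand.

```python
import subprocess

PREFIX = "kstat.zfs.misc.arcstats."


def parse_arcstats(text):
    """Turn "name: value" lines from sysctl into a dict of integers.

    Assumption: every counter of interest is printed as
    "kstat.zfs.misc.arcstats.<name>: <integer>".
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.startswith(PREFIX):
            continue
        try:
            stats[name[len(PREFIX):]] = int(value.strip())
        except ValueError:
            pass  # skip anything that is not a plain integer
    return stats


def l2arc_report(stats, device_bytes):
    """Compare the L2ARC size counters with the cache device size.

    Assumption: l2_asize is the on-device allocation, so a value above
    device_bytes is treated as suspicious. l2_size alone may exceed the
    device size if the cached data is compressed.
    """
    size = stats.get("l2_size", 0)
    asize = stats.get("l2_asize", 0)
    return {
        "size_over_device": size / device_bytes,
        "asize_over_device": asize / device_bytes,
        "errors": stats.get("l2_cksum_bad", 0) + stats.get("l2_io_error", 0),
        "suspicious": asize > device_bytes,
    }


def read_arcstats():
    """Run sysctl on the local machine and parse its output."""
    out = subprocess.run(
        ["sysctl", "kstat.zfs.misc.arcstats"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_arcstats(out)
```

On an affected box this would be called as
`l2arc_report(read_arcstats(), device_bytes)`, with device_bytes set to
the size of the cache device in bytes. A ratio of l2_size to device size
near 7, as in the 1.3T versus 190G case above, is only plausible if the
data really compresses that well.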

Regards,

Gary
