Date: Sat, 29 Oct 2016 16:32:44 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: lev@FreeBSD.org, freebsd-fs <freebsd-fs@FreeBSD.org>
Subject: Re: ZFS L2ARC checksum errors after compression
Message-ID: <3dae7691-fcd1-b3b9-445c-b81d6f0cdc52@FreeBSD.org>
In-Reply-To: <921575537.20161029143626@serebryakov.spb.ru>
References: <921575537.20161029143626@serebryakov.spb.ru>
On 29/10/2016 14:36, Lev Serebryakov wrote:
> Hello freebsd-fs,
>
> System is FreeBSD 10.3-STABLE #0 r307523: Mon Oct 17 22:36:27 MSK 2016.
>
> I have a small L2ARC (185G) on SSD for my RAIDZ1 pool.
>
> When "ALLOC" on this L2ARC becomes greater than "SIZE" (that is
> compression at work, am I right?), zfs-stats shows that the number of
> checksum errors starts to rise. For example, here is my current
> "zfs-stats -L" output:
>
> L2 ARC Summary: (DEGRADED)
> Passed Headroom: 153.46k
> Tried Lock Failures: 9.65k
> IO In Progress: 4.33k
> Low Memory Aborts: 9
> Free on Write: 1.77k
> Writes While Full: 15.20k
> R/W Clashes: 0
> Bad Checksums: 104.95k
> IO Errors: 0
> SPA Mismatch: 4.10m
>
>
> And "Bad Checksums" goes up rather fast: it reached 105.31k while I was
> composing this message!
>
> Looks like there is some problem with L2ARC compression.
>
I think that a recent upstream change, compressed ARC support, reintroduced an
old problem that was fixed a while ago.
It would be great if you could test this patch:
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c (revision 308050)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c (working copy)
@@ -7028,7 +7028,22 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev,
 				continue;
 			}
 
-			if ((write_asize + HDR_GET_LSIZE(hdr)) > target_sz) {
+			/*
+			 * We rely on the L1 portion of the header below, so
+			 * it's invalid for this header to have been evicted out
+			 * of the ghost cache, prior to being written out. The
+			 * ARC_FLAG_L2_WRITING bit ensures this won't happen.
+			 */
+			ASSERT(HDR_HAS_L1HDR(hdr));
+
+			ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
+			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
+			ASSERT3U(arc_hdr_size(hdr), >, 0);
+			uint64_t size = arc_hdr_size(hdr);
+			uint64_t asize = vdev_psize_to_asize(dev->l2ad_vdev,
+			    size);
+
+			if ((write_asize + asize) > target_sz) {
 				full = B_TRUE;
 				mutex_exit(hash_lock);
 				ARCSTAT_BUMP(arcstat_l2_write_full);
@@ -7063,21 +7078,6 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev,
 			list_insert_head(&dev->l2ad_buflist, hdr);
 			mutex_exit(&dev->l2ad_mtx);
 
-			/*
-			 * We rely on the L1 portion of the header below, so
-			 * it's invalid for this header to have been evicted out
-			 * of the ghost cache, prior to being written out. The
-			 * ARC_FLAG_L2_WRITING bit ensures this won't happen.
-			 */
-			ASSERT(HDR_HAS_L1HDR(hdr));
-
-			ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
-			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
-			ASSERT3U(arc_hdr_size(hdr), >, 0);
-			uint64_t size = arc_hdr_size(hdr);
-			uint64_t asize = vdev_psize_to_asize(dev->l2ad_vdev,
-			    size);
-
 			(void) refcount_add_many(&dev->l2ad_alloc, size, hdr);
 
 			/*
--
Andriy Gapon
