Date:      Sat, 29 Oct 2016 16:32:44 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        lev@FreeBSD.org, freebsd-fs <freebsd-fs@FreeBSD.org>
Subject:   Re: ZFS L2ARC checksum errors after compression
Message-ID:  <3dae7691-fcd1-b3b9-445c-b81d6f0cdc52@FreeBSD.org>
In-Reply-To: <921575537.20161029143626@serebryakov.spb.ru>
References:  <921575537.20161029143626@serebryakov.spb.ru>

On 29/10/2016 14:36, Lev Serebryakov wrote:
> Hello freebsd-fs,
> 
>  System is FreeBSD 10.3-STABLE #0 r307523: Mon Oct 17 22:36:27 MSK 2016.
> 
>  I have a small L2ARC (185G) on SSD for my RAIDZ1 pool.
> 
>  When "ALLOC" on this L2ARC becomes greater than "SIZE" (that is compression
>  at work, am I right?), zfs-stats shows that the number of checksum errors
>  starts to rise. For example, I have this "zfs-stats -L" output now:
> 
>  L2 ARC Summary: (DEGRADED)
>         Passed Headroom:                        153.46k
>         Tried Lock Failures:                    9.65k
>         IO In Progress:                         4.33k
>         Low Memory Aborts:                      9
>         Free on Write:                          1.77k
>         Writes While Full:                      15.20k
>         R/W Clashes:                            0
>         Bad Checksums:                          104.95k
>         IO Errors:                              0
>         SPA Mismatch:                           4.10m
> 
> 
>  And "Bad Checksums" goes up rather fast: it reached 105.31k while I was
>  composing this message!
> 
>   It looks like there is some problem with L2ARC compression.
> 

I think that a recent upstream change, compressed ARC support, reintroduced an
old problem that was fixed a while ago.

It would be great if you could test this patch:
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(revision 308050)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(working copy)
@@ -7028,7 +7028,22 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev,
 				continue;
 			}

-			if ((write_asize + HDR_GET_LSIZE(hdr)) > target_sz) {
+			/*
+			 * We rely on the L1 portion of the header below, so
+			 * it's invalid for this header to have been evicted out
+			 * of the ghost cache, prior to being written out. The
+			 * ARC_FLAG_L2_WRITING bit ensures this won't happen.
+			 */
+			ASSERT(HDR_HAS_L1HDR(hdr));
+
+			ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
+			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
+			ASSERT3U(arc_hdr_size(hdr), >, 0);
+			uint64_t size = arc_hdr_size(hdr);
+			uint64_t asize = vdev_psize_to_asize(dev->l2ad_vdev,
+			    size);
+
+			if ((write_asize + asize) > target_sz) {
 				full = B_TRUE;
 				mutex_exit(hash_lock);
 				ARCSTAT_BUMP(arcstat_l2_write_full);
@@ -7063,21 +7078,6 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev,
 			list_insert_head(&dev->l2ad_buflist, hdr);
 			mutex_exit(&dev->l2ad_mtx);

-			/*
-			 * We rely on the L1 portion of the header below, so
-			 * it's invalid for this header to have been evicted out
-			 * of the ghost cache, prior to being written out. The
-			 * ARC_FLAG_L2_WRITING bit ensures this won't happen.
-			 */
-			ASSERT(HDR_HAS_L1HDR(hdr));
-
-			ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
-			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
-			ASSERT3U(arc_hdr_size(hdr), >, 0);
-			uint64_t size = arc_hdr_size(hdr);
-			uint64_t asize = vdev_psize_to_asize(dev->l2ad_vdev,
-			    size);
-
 			(void) refcount_add_many(&dev->l2ad_alloc, size, hdr);

 			/*
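
To illustrate what the changed check is about, here is a minimal standalone
sketch, not the kernel code. It assumes that the allocated size is simply the
physical size rounded up to the device sector size (1 << ashift), and that
write_asize accumulates allocated sizes, as the name suggests. The real
vdev_psize_to_asize() can do more than that for some vdev types. The names
sketch_psize_to_asize() and sketch_fill() are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for vdev_psize_to_asize(): round the physical
 * size up to a multiple of the sector size (1 << ashift). This is an
 * assumption made for illustration only.
 */
static uint64_t
sketch_psize_to_asize(uint64_t psize, unsigned ashift)
{
	assert(ashift < 63);
	uint64_t sector = UINT64_C(1) << ashift;

	return ((psize + sector - 1) & ~(sector - 1));
}

/*
 * Accept buffers in order until the next one would exceed target_sz,
 * and return the device space taken by the accepted buffers.
 * use_asize == 0: the limit check uses the logical size (the removed line).
 * use_asize != 0: the limit check uses the allocated size (the added line).
 */
static uint64_t
sketch_fill(const uint64_t *lsize, const uint64_t *psize, size_t n,
    unsigned ashift, uint64_t target_sz, int use_asize)
{
	uint64_t write_asize = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t asize = sketch_psize_to_asize(psize[i], ashift);
		uint64_t check = use_asize ? asize : lsize[i];

		if (write_asize + check > target_sz)
			break;		/* corresponds to "full = B_TRUE" */
		write_asize += asize;
	}
	return (write_asize);
}
```

With 512-byte buffers, ashift = 12 and target_sz = 6000, the check on logical
size accepts two buffers and so accounts for 8192 bytes, which is over the
target, while the check on allocated size stops after one buffer at 4096
bytes. Whether this kind of overrun is what produces the bad checksums
reported above is exactly what testing the patch should show.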

-- 
Andriy Gapon


