Date: Sat, 29 Oct 2016 16:32:44 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: lev@FreeBSD.org, freebsd-fs <freebsd-fs@FreeBSD.org>
Subject: Re: ZFS L2ARC checksum errors after compression
Message-ID: <3dae7691-fcd1-b3b9-445c-b81d6f0cdc52@FreeBSD.org>
In-Reply-To: <921575537.20161029143626@serebryakov.spb.ru>
References: <921575537.20161029143626@serebryakov.spb.ru>

On 29/10/2016 14:36, Lev Serebryakov wrote:
> Hello freebsd-fs,
>
>  System is FreeBSD 10.3-STABLE #0 r307523: Mon Oct 17 22:36:27 MSK 2016.
>
>  I have a small L2ARC (185G) on SSD for my RAIDZ1 pool.
>
>  When "ALLOC" on this L2ARC becomes greater than "SIZE" (this is compression
> at work, am I right?), zfs-stats shows that the number of checksum errors
> starts to rise. For example, I have this "zfs-stats -L" output now:
>
> L2 ARC Summary: (DEGRADED)
>         Passed Headroom:                        153.46k
>         Tried Lock Failures:                    9.65k
>         IO In Progress:                         4.33k
>         Low Memory Aborts:                      9
>         Free on Write:                          1.77k
>         Writes While Full:                      15.20k
>         R/W Clashes:                            0
>         Bad Checksums:                          104.95k
>         IO Errors:                              0
>         SPA Mismatch:                           4.10m
>
>
>  And "Bad Checksums" goes up rather fast; it reached 105.31k while I was
> composing this message!
>
>  Looks like there are some problems with L2ARC compression.
>

I think that a recent upstream change, compressed ARC support, reintroduced an
old problem that was fixed a while ago.
It would be great if you could test this patch:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(revision 308050)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(working copy)
@@ -7028,7 +7028,22 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev,
 				continue;
 			}
 
-			if ((write_asize + HDR_GET_LSIZE(hdr)) > target_sz) {
+			/*
+			 * We rely on the L1 portion of the header below, so
+			 * it's invalid for this header to have been evicted out
+			 * of the ghost cache, prior to being written out. The
+			 * ARC_FLAG_L2_WRITING bit ensures this won't happen.
+			 */
+			ASSERT(HDR_HAS_L1HDR(hdr));
+
+			ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
+			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
+			ASSERT3U(arc_hdr_size(hdr), >, 0);
+			uint64_t size = arc_hdr_size(hdr);
+			uint64_t asize = vdev_psize_to_asize(dev->l2ad_vdev,
+			    size);
+
+			if ((write_asize + asize) > target_sz) {
 				full = B_TRUE;
 				mutex_exit(hash_lock);
 				ARCSTAT_BUMP(arcstat_l2_write_full);
@@ -7063,21 +7078,6 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev,
 			list_insert_head(&dev->l2ad_buflist, hdr);
 			mutex_exit(&dev->l2ad_mtx);
 
-			/*
-			 * We rely on the L1 portion of the header below, so
-			 * it's invalid for this header to have been evicted out
-			 * of the ghost cache, prior to being written out. The
-			 * ARC_FLAG_L2_WRITING bit ensures this won't happen.
-			 */
-			ASSERT(HDR_HAS_L1HDR(hdr));
-
-			ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
-			ASSERT3P(hdr->b_l1hdr.b_pdata, !=, NULL);
-			ASSERT3U(arc_hdr_size(hdr), >, 0);
-			uint64_t size = arc_hdr_size(hdr);
-			uint64_t asize = vdev_psize_to_asize(dev->l2ad_vdev,
-			    size);
-
 			(void) refcount_add_many(&dev->l2ad_alloc, size, hdr);
 
 			/*

-- 
Andriy Gapon
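
The patch above changes the "device is full" check so that it adds the allocated size of the buffer (asize) to write_asize instead of HDR_GET_LSIZE(hdr). A minimal standalone sketch of why the two sizes differ is below. It is not the kernel code: sketch_psize_to_asize() and sketch_would_overflow() are hypothetical helpers, and the sketch assumes the allocated size is simply the physical size rounded up to a multiple of 1 << ashift, which may not match what vdev_psize_to_asize() does for every vdev type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for vdev_psize_to_asize(): round the physical
 * (possibly compressed) size up to the device allocation unit, assumed
 * here to be 1 << ashift bytes.
 */
static uint64_t
sketch_psize_to_asize(uint64_t psize, unsigned ashift)
{
	assert(ashift < 64);
	uint64_t align = (uint64_t)1 << ashift;

	return ((psize + align - 1) & ~(align - 1));
}

/*
 * Hypothetical helper with the same shape as the check in the patch:
 * would adding `size` bytes to the running total exceed the target?
 * The caller decides which size to pass (logical or allocated).
 */
static bool
sketch_would_overflow(uint64_t write_asize, uint64_t size, uint64_t target_sz)
{
	return ((write_asize + size) > target_sz);
}
```

Under these assumptions, with ashift = 12 a buffer whose physical size is 33 KiB (33792 bytes) is accounted as 36864 bytes, whatever its logical size is. Passing the logical size to the check therefore tests a different quantity than the one that is accumulated into the running total as buffers are written.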