Date:      Sat, 8 Jun 2013 09:11:20 +0000 (UTC)
From:      Xin LI <delphij@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r251520 - in head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs: . sys
Message-ID:  <201306080911.r589BKJe084890@svn.freebsd.org>

Author: delphij
Date: Sat Jun  8 09:11:20 2013
New Revision: 251520
URL: http://svnweb.freebsd.org/changeset/base/251520

Log:
  MFV r251519:
  
   * Illumos ZFS issue #3805 arc shouldn't cache freed blocks
  
  Quote from the Illumos issue:
  
      ZFS should proactively evict freed blocks from the cache.
  
      Even though these freed blocks will never be used again, and thus
      will eventually be evicted, this causes us to use memory
      inefficiently for 2 reasons:
  
      1. A block that is freed has no chance of being accessed again, but
         will be kept in memory preferentially to a block that was accessed
         before it (and is thus older) but has not been freed and thus has
         at least some chance of being accessed again.
  
      2. We partition the ARC into several buckets:
         user data that has been accessed only once (MRU)
         metadata that has been accessed only once (MRU)
         user data that has been accessed more than once (MFU)
         metadata that has been accessed more than once (MFU)
  
      The user data vs metadata split is somewhat arbitrary, and the
      primary control on how much memory is used to cache data vs metadata
      is to simply try to keep the proportion the same as it has been in the
      past (each bucket "evicts against" itself).  The secondary control is
      to evict data before evicting metadata.
  
      Because of this bucketing, we may end up with one bucket mostly
      containing freed blocks that are very old, while another bucket has
      more recently accessed, still-allocated blocks.  Data in the useful
      bucket (with still-allocated blocks) may be evicted in preference to
      data in the useless bucket (with old, freed blocks).
  
      On dcenter, we saw that the MFU metadata bucket was 230MB, while the
      MFU data bucket was 27GB and the MRU metadata bucket was 256GB.
      However, the vast majority of data in the MRU metadata bucket (256GB)
      was freed blocks, and thus useless.  Meanwhile, the MFU metadata bucket
      (230MB) was constantly evicting useful blocks that will soon be needed.
  
      The problem of cache segmentation is a larger problem that needs more
      investigation.  However, if we stop caching freed blocks, it should
      reduce the impact of this more fundamental issue.
  
  MFC after:	2 weeks
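
Reason 1 in the quoted issue can be illustrated with a toy LRU cache. The
following is a minimal sketch in plain C, not the ARC code: every toy_* name,
the fixed four-slot table, and the tick counter are hypothetical and exist
only for this example. toy_cache_freed() plays the role that arc_freed()
plays in this commit only in spirit, by dropping a block as soon as the
caller reports that it was freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_CACHE_SLOTS	4	/* hypothetical size, tiny on purpose */

typedef struct toy_block {
	uint64_t id;		/* block identity; 0 is reserved for "none" */
	uint64_t last_access;	/* tick of the most recent access */
	bool in_use;		/* slot currently holds a cached block */
} toy_block_t;

typedef struct toy_cache {
	toy_block_t slot[TOY_CACHE_SLOTS];
	uint64_t tick;		/* monotonically increasing access counter */
} toy_cache_t;

void
toy_cache_init(toy_cache_t *c)
{
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
		c->slot[i].in_use = false;
	c->tick = 0;
}

/* Return the slot caching the given id, or NULL if it is not cached. */
toy_block_t *
toy_cache_find(toy_cache_t *c, uint64_t id)
{
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		if (c->slot[i].in_use && c->slot[i].id == id)
			return (&c->slot[i]);
	}
	return (NULL);
}

bool
toy_cache_contains(toy_cache_t *c, uint64_t id)
{
	return (toy_cache_find(c, id) != NULL);
}

/*
 * Access a block, caching it if needed.  Returns the id of the block
 * that had to be evicted to make room, or 0 if nothing was evicted.
 */
uint64_t
toy_cache_access(toy_cache_t *c, uint64_t id)
{
	toy_block_t *b, *victim = NULL;
	uint64_t evicted = 0;

	assert(id != 0);
	if ((b = toy_cache_find(c, id)) != NULL) {
		b->last_access = ++c->tick;
		return (0);
	}
	/* Prefer an empty slot; otherwise pick the least recently used. */
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		b = &c->slot[i];
		if (!b->in_use) {
			victim = b;
			break;
		}
		if (victim == NULL || b->last_access < victim->last_access)
			victim = b;
	}
	if (victim->in_use)
		evicted = victim->id;
	victim->id = id;
	victim->last_access = ++c->tick;
	victim->in_use = true;
	return (evicted);
}

/* The caller reports that a block was freed: drop it from the cache now. */
void
toy_cache_freed(toy_cache_t *c, uint64_t id)
{
	toy_block_t *b = toy_cache_find(c, id);

	if (b != NULL)
		b->in_use = false;
}
```

With blocks 1 through 4 accessed in order and block 4 then freed on disk,
a cache that is never told about the free evicts block 1 (older, but still
allocated) when block 5 arrives, and keeps the useless block 4.  If
toy_cache_freed() is called for block 4 first, block 5 takes its slot and
block 1 stays cached.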

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
Directory Properties:
  head/sys/cddl/contrib/opensolaris/   (props changed)

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Sat Jun  8 08:51:22 2013	(r251519)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Sat Jun  8 09:11:20 2013	(r251520)
@@ -3372,6 +3372,34 @@ arc_set_callback(arc_buf_t *buf, arc_evi
 }
 
 /*
+ * Notify the arc that a block was freed, and thus will never be used again.
+ */
+void
+arc_freed(spa_t *spa, const blkptr_t *bp)
+{
+	arc_buf_hdr_t *hdr;
+	kmutex_t *hash_lock;
+	uint64_t guid = spa_load_guid(spa);
+
+	hdr = buf_hash_find(guid, BP_IDENTITY(bp), BP_PHYSICAL_BIRTH(bp),
+	    &hash_lock);
+	if (hdr == NULL)
+		return;
+	if (HDR_BUF_AVAILABLE(hdr)) {
+		arc_buf_t *buf = hdr->b_buf;
+		add_reference(hdr, hash_lock, FTAG);
+		hdr->b_flags &= ~ARC_BUF_AVAILABLE;
+		mutex_exit(hash_lock);
+
+		arc_release(buf, FTAG);
+		(void) arc_buf_remove_ref(buf, FTAG);
+	} else {
+		mutex_exit(hash_lock);
+	}
+
+}
+
+/*
  * This is used by the DMU to let the ARC know that a buffer is
  * being evicted, so the ARC should clean up.  If this arc buf
  * is not yet in the evicted state, it will be put there.

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h	Sat Jun  8 08:51:22 2013	(r251519)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h	Sat Jun  8 09:11:20 2013	(r251520)
@@ -20,7 +20,7 @@
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2012 by Delphix. All rights reserved.
+ * Copyright (c) 2013 by Delphix. All rights reserved.
  * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
  */
 
@@ -110,6 +110,7 @@ zio_t *arc_write(zio_t *pio, spa_t *spa,
     blkptr_t *bp, arc_buf_t *buf, boolean_t l2arc, boolean_t l2arc_compress,
     const zio_prop_t *zp, arc_done_func_t *ready, arc_done_func_t *done,
     void *priv, int priority, int zio_flags, const zbookmark_t *zb);
+void arc_freed(spa_t *spa, const blkptr_t *bp);
 
 void arc_set_callback(arc_buf_t *buf, arc_evict_func_t *func, void *priv);
 int arc_buf_evict(arc_buf_t *buf);

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	Sat Jun  8 08:51:22 2013	(r251519)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	Sat Jun  8 09:11:20 2013	(r251520)
@@ -780,6 +780,7 @@ zio_free_sync(zio_t *pio, spa_t *spa, ui
 	ASSERT(spa_sync_pass(spa) < zfs_sync_pass_deferred_free);
 
 	metaslab_check_free(spa, bp);
+	arc_freed(spa, bp);
 
 	zio = zio_create(pio, spa, txg, bp, NULL, size,
 	    NULL, NULL, ZIO_TYPE_FREE, ZIO_PRIORITY_FREE, flags,
