Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 28 Nov 2016 15:14:31 +0000 (UTC)
From:      Andriy Gapon <avg@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r309250 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Message-ID:  <201611281514.uASFEVkH028825@repo.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: avg
Date: Mon Nov 28 15:14:31 2016
New Revision: 309250
URL: https://svnweb.freebsd.org/changeset/base/309250

Log:
  MFV r309249: 3821 Race in rollback, zil close, and zil flush
  
  Note: there was a merge conflict resolved by me.
  
  illumos/illumos-gate@43297f973a3543e7403ac27076490ab958a94b15
  https://github.com/illumos/illumos-gate/commit/43297f973a3543e7403ac27076490ab958a94b15
  
  https://www.illumos.org/issues/3821
    We recently had nodes with some of the latest zfs bits panic on us in a
    rollback-heavy environment. The following is from my preliminary analysis:
    Let's look at where we died:
    > $C
    ffffff01ea6b9a10 taskq_dispatch+0x3a(0, fffffffff7d20450, ffffff5551dea920, 1)
    ffffff01ea6b9a60 zil_clean+0xce(ffffff4b7106c080, 7e0f1)
    ffffff01ea6b9aa0 dsl_pool_sync_done+0x47(ffffff4313065680, 7e0f1)
    ffffff01ea6b9b70 spa_sync+0x55f(ffffff4310c1d040, 7e0f1)
    ffffff01ea6b9c20 txg_sync_thread+0x20f(ffffff4313065680)
    ffffff01ea6b9c30 thread_start+8()
    If we dig in we can find that this dataset corresponds to a zone:
    > ffffff4b7106c080::print zilog_t zl_os->os_dsl_dataset->ds_dir->dd_myname
    zl_os->os_dsl_dataset->ds_dir->dd_myname = [ "8ffce16a-13c2-4efa-a233-
    9e378e89877b" ]
    Okay so we have a null taskq pointer. That only happens during the calls to
    zil_open and zil_close. If we poke around we can see that we're actually in
    midst of a rollback:
    > ::pgrep zfs | ::printf "0x%x %s\\n" proc_t . p_user.u_psargs
    0xffffff43262800a0 zfs rollback zones/15714eb6-f5ea-469f-ac6d-
    4b8ab06213c2@marlin_init
    0xffffff54e22a1028 zfs rollback zones/8ffce16a-13c2-4efa-a233-
    9e378e89877b@marlin_init
    0xffffff4362f3a058 zfs rollback zones/0ddb8e49-ca7e-42e1-8fdc-
    4ac4ba8fe9f8@marlin_init
    0xffffff5748e8d020 zfs rollback zones/426357b5-832d-4430-953e-
    10cd45ff8e9f@marlin_init
    0xffffff436b867008 zfs rollback zones/8f36bf37-8a9c-4a44-995c-
    6d1b2751e6f5@marlin_init
    0xffffff4381ad4090 zfs rollback zones/6c8eca18-fbd6-46dd-ac24-
    2ed45cd0da70@marlin_init
  
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
  Reviewed by: Andriy Gapon <avg@FreeBSD.org>
  Approved by: Richard Lowe <richlowe@richlowe.net>
  Author: George Wilson <george.wilson@delphix.com>
  
  MFC after:	3 weeks

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
Directory Properties:
  head/sys/cddl/contrib/opensolaris/   (props changed)

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Mon Nov 28 15:09:58 2016	(r309249)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Mon Nov 28 15:14:31 2016	(r309250)
@@ -20,7 +20,7 @@
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
+ * Copyright (c) 2011, 2016 by Delphix. All rights reserved.
  * Copyright (c) 2013 Steven Hartland. All rights reserved.
  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
@@ -681,9 +681,16 @@ dsl_pool_sync_done(dsl_pool_t *dp, uint6
 {
 	zilog_t *zilog;
 
-	while (zilog = txg_list_remove(&dp->dp_dirty_zilogs, txg)) {
+	while (zilog = txg_list_head(&dp->dp_dirty_zilogs, txg)) {
 		dsl_dataset_t *ds = dmu_objset_ds(zilog->zl_os);
+		/*
+		 * We don't remove the zilog from the dp_dirty_zilogs
+		 * list until after we've cleaned it. This ensures that
+		 * callers of zilog_is_dirty() receive an accurate
+		 * answer when they are racing with the spa sync thread.
+		 */
 		zil_clean(zilog, txg);
+		(void) txg_list_remove_this(&dp->dp_dirty_zilogs, zilog, txg);
 		ASSERT(!dmu_objset_is_dirty(zilog->zl_os, txg));
 		dmu_buf_rele(ds->ds_dbuf, zilog);
 	}

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	Mon Nov 28 15:09:58 2016	(r309249)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	Mon Nov 28 15:14:31 2016	(r309250)
@@ -7086,8 +7086,6 @@ spa_sync(spa_t *spa, uint64_t txg)
 		spa->spa_config_syncing = NULL;
 	}
 
-	spa->spa_ubsync = spa->spa_uberblock;
-
 	dsl_pool_sync_done(dp, txg);
 
 	mutex_enter(&spa->spa_alloc_lock);
@@ -7112,6 +7110,13 @@ spa_sync(spa_t *spa, uint64_t txg)
 
 	spa->spa_sync_pass = 0;
 
+	/*
+	 * Update the last synced uberblock here. We want to do this at
+	 * the end of spa_sync() so that consumers of spa_last_synced_txg()
+	 * will be guaranteed that all the processing associated with
+	 * that txg has been completed.
+	 */
+	spa->spa_ubsync = spa->spa_uberblock;
 	spa_config_exit(spa, SCL_CONFIG, FTAG);
 
 	spa_handle_ignored_writes(spa);

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c	Mon Nov 28 15:09:58 2016	(r309249)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c	Mon Nov 28 15:14:31 2016	(r309250)
@@ -20,8 +20,7 @@
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
- * Copyright (c) 2011, 2014 by Delphix. All rights reserved.
+ * Copyright (c) 2011, 2016 by Delphix. All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  */
 
@@ -502,6 +501,27 @@ zilog_dirty(zilog_t *zilog, uint64_t txg
 	}
 }
 
+/*
+ * Determine if the zil is dirty in the specified txg. Callers wanting to
+ * ensure that the dirty state does not change must hold the itxg_lock for
+ * the specified txg. Holding the lock will ensure that the zil cannot be
+ * dirtied (zil_itx_assign) or cleaned (zil_clean) while we check its current
+ * state.
+ */
+boolean_t
+zilog_is_dirty_in_txg(zilog_t *zilog, uint64_t txg)
+{
+	dsl_pool_t *dp = zilog->zl_dmu_pool;
+
+	if (txg_list_member(&dp->dp_dirty_zilogs, zilog, txg & TXG_MASK))
+		return (B_TRUE);
+	return (B_FALSE);
+}
+
+/*
+ * Determine if the zil is dirty. The zil is considered dirty if it has
+ * any pending itx records that have not been cleaned by zil_clean().
+ */
 boolean_t
 zilog_is_dirty(zilog_t *zilog)
 {
@@ -1062,8 +1082,6 @@ zil_lwb_commit(zilog_t *zilog, itx_t *it
 		return (NULL);
 
 	ASSERT(lwb->lwb_buf != NULL);
-	ASSERT(zilog_is_dirty(zilog) ||
-	    spa_freeze_txg(zilog->zl_spa) != UINT64_MAX);
 
 	if (lrc->lrc_txtype == TX_WRITE && itx->itx_wr_state == WR_NEED_COPY)
 		dlen = P2ROUNDUP_TYPED(
@@ -1401,6 +1419,11 @@ zil_get_commit_list(zilog_t *zilog)
 	else
 		otxg = spa_last_synced_txg(zilog->zl_spa) + 1;
 
+	/*
+	 * This is inherently racy, since there is nothing to prevent
+	 * the last synced txg from changing. That's okay since we'll
+	 * only commit things in the future.
+	 */
 	for (txg = otxg; txg < (otxg + TXG_CONCURRENT_STATES); txg++) {
 		itxg_t *itxg = &zilog->zl_itxg[txg & TXG_MASK];
 
@@ -1410,6 +1433,16 @@ zil_get_commit_list(zilog_t *zilog)
 			continue;
 		}
 
+		/*
+		 * If we're adding itx records to the zl_itx_commit_list,
+		 * then the zil better be dirty in this "txg". We can assert
+		 * that here since we're holding the itxg_lock which will
+		 * prevent spa_sync from cleaning it. Once we add the itxs
+		 * to the zl_itx_commit_list we must commit it to disk even
+		 * if it's unnecessary (i.e. the txg was synced).
+		 */
+		ASSERT(zilog_is_dirty_in_txg(zilog, txg) ||
+		    spa_freeze_txg(zilog->zl_spa) != UINT64_MAX);
 		list_move_tail(commit_list, &itxg->itxg_itxs->i_sync_list);
 
 		mutex_exit(&itxg->itxg_lock);
@@ -1432,6 +1465,10 @@ zil_async_to_sync(zilog_t *zilog, uint64
 	else
 		otxg = spa_last_synced_txg(zilog->zl_spa) + 1;
 
+	/*
+	 * This is inherently racy, since there is nothing to prevent
+	 * the last synced txg from changing.
+	 */
 	for (txg = otxg; txg < (otxg + TXG_CONCURRENT_STATES); txg++) {
 		itxg_t *itxg = &zilog->zl_itxg[txg & TXG_MASK];
 
@@ -1503,8 +1540,14 @@ zil_commit_writer(zilog_t *zilog)
 	DTRACE_PROBE1(zil__cw1, zilog_t *, zilog);
 	while (itx = list_head(&zilog->zl_itx_commit_list)) {
 		txg = itx->itx_lr.lrc_txg;
-		ASSERT(txg);
+		ASSERT3U(txg, !=, 0);
 
+		/*
+		 * This is inherently racy and may result in us writing
+		 * out a log block for a txg that was just synced. This is
+		 * ok since we'll end cleaning up that log block the next
+		 * time we call zil_sync().
+		 */
 		if (txg > spa_last_synced_txg(spa) || txg > spa_freeze_txg(spa))
 			lwb = zil_lwb_commit(zilog, itx, lwb);
 		list_remove(&zilog->zl_itx_commit_list, itx);
@@ -1821,7 +1864,10 @@ zil_close(zilog_t *zilog)
 	mutex_exit(&zilog->zl_lock);
 	if (txg)
 		txg_wait_synced(zilog->zl_dmu_pool, txg);
-	ASSERT(!zilog_is_dirty(zilog));
+
+	if (zilog_is_dirty(zilog))
+		zfs_dbgmsg("zil (%p) is dirty, txg %llu", zilog, txg);
+	VERIFY(!zilog_is_dirty(zilog));
 
 	taskq_destroy(zilog->zl_clean_taskq);
 	zilog->zl_clean_taskq = NULL;



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201611281514.uASFEVkH028825>