Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 01 Jan 2015 20:20:49 +0000
From:      Steven Hartland <steven@multiplay.co.uk>
To:        Xin Li <delphij@delphij.net>, d@delphij.net,  src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r276123 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
Message-ID:  <54A5AC21.5070802@multiplay.co.uk>
In-Reply-To: <54A3ACEF.70905@delphij.net>
References:  <201412230931.sBN9VPMK017968@svn.freebsd.org> <54A35B88.9090102@delphij.net> <54A39153.8040905@freebsd.org> <54A3ACEF.70905@delphij.net>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------070607090801080704000405
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit


On 31/12/2014 07:59, Xin Li wrote:
> Hi, Steven,
>
> On 12/30/14 22:01, Steven Hartland wrote:
>> On 31/12/2014 02:12, Xin Li wrote:
>>> On 12/23/14 01:31, Steven Hartland wrote:
>>>> Author: smh
>>>> Date: Tue Dec 23 09:31:24 2014
>>>> New Revision: 276123
>>>> URL: https://svnweb.freebsd.org/changeset/base/276123
>>>>
>>>> Log:
>>>>     Always sync the global ZFS config cache to reflect the new mosconfig
>>>>        This fixes out of date zpool.cache for root pools, which can
>>>> cause issues
>>>>     such as confusion of zdb etc.
>>>>        MFC after:    1 month
>>>>
>>>> Modified:
>>>>     head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c
>>>>
>>>> Modified:
>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c
>>>> ==============================================================================
>>>>
>>>> ---
>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c
>>>> Tue Dec 23 08:51:30 2014    (r276122)
>>>> +++
>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c
>>>> Tue Dec 23 09:31:24 2014    (r276123)
>>>> @@ -536,8 +536,7 @@ spa_config_update(spa_t *spa, int what)
>>>>        /*
>>>>         * Update the global config cache to reflect the new mosconfig.
>>>>         */
>>>> -    if (!spa->spa_is_root)
>>>> -        spa_config_sync(spa, B_FALSE, what != SPA_CONFIG_UPDATE_POOL);
>>>> +    spa_config_sync(spa, B_FALSE, what != SPA_CONFIG_UPDATE_POOL);
>>>>          if (what == SPA_CONFIG_UPDATE_POOL)
>>>>            spa_config_update(spa, SPA_CONFIG_UPDATE_VDEVS);
>>> It seems like that this change breaks systems where not all pools are
>>> available (e.g. some of pools are encrypted) at boot time, by removing
>>> all these pools from the cache file.
>>>
>>> As a result, on the next boot, these pools would not be imported even
>>> when the devices are available (geli attached), and reverting this
>>> change would restore the system to its previous behavior.
>>>
>>> Perhaps it have exposed an existing bug?
>> I've managed to reproduce this here with mdX backed test pools, and
>> looking into it this is due to spa_config_sync:
>>
>>                          /*
>>                           * Skip over our own pool if we're about to remove
>>                           * ourselves from the spa namespace or any pool
>> that
>>                           * is readonly. Since we cannot guarantee that a
>>                           * readonly pool would successfully import upon
>> reboot,
>>                           * we don't allow them to be written to the
>> cache file.
>>                           */
>>                          if ((spa == target && removing) ||
>>                              !spa_writeable(spa)) {
>>                                  continue;
>>                          }
>>
>> This was added by upstream:
>> https://github.com/illumos/illumos-gate/commit/fb02ae025247e3b662600e5a9c1b4c33ecab7d72
>>
>> https://www.illumos.org/issues/3639
>>
>> It seems the desired behavior was to exclude read only pools, to prevent
>> them being mounted writable on reboot, but the check is too wide and
>> also excludes unavailable pools too.
>>
>> The attached patch corrects the test to only check writable on active
>> pools so I believe should fix the issue your seeing.
>>
>> If you can test it and lmk that would be great.
> This is an improvement (the attaching of pool is now working) but the
> cached configuration is somewhat corrupted, it shows many lines of
> vdev_stats[] contents.
>
> I'll take a look at the code tomorrow and see if I can do some
> instruments and possibly figure out the underlying issue.
>
> Cheers,
Looks like the spa_config_sync makes the assumption that the last call 
to spa_config_generate was done with getstats = 0.

This is now an invalid assumption so we need to ensure that stats are 
removed from the config nvl before persisting to disk.

The attached patch, which supersedes the last one I sent, fixes that in 
my tests here.

Let me know if it fixes your test case too?

     Regards
     Steve

--------------070607090801080704000405
Content-Type: text/plain; charset=windows-1252;
 name="zpool-cache.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="zpool-cache.patch"

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c	(revision 276123)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c	(working copy)
@@ -140,6 +140,26 @@ out:
 	kobj_close_file(file);
 }
 
+static void
+spa_config_clean(nvlist_t *nvl)
+{
+	nvlist_t **child;
+	nvlist_t *nvroot = NULL;
+	uint_t c, children;
+
+	if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_CHILDREN, &child,
+	    &children) == 0) {
+		for (c = 0; c < children; c++)
+			spa_config_clean(child[c]);
+	}
+
+	if (nvlist_lookup_nvlist(nvl, ZPOOL_CONFIG_VDEV_TREE, &nvroot) == 0)
+		spa_config_clean(nvroot);
+
+	nvlist_remove(nvl, ZPOOL_CONFIG_VDEV_STATS, DATA_TYPE_UINT64_ARRAY);
+	nvlist_remove(nvl, ZPOOL_CONFIG_SCAN_STATS, DATA_TYPE_UINT64_ARRAY);
+}
+
 static int
 spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
 {
@@ -233,6 +253,7 @@ spa_config_sync(spa_t *target, boolean_t removing,
 		 */
 		nvl = NULL;
 		while ((spa = spa_next(spa)) != NULL) {
+			nvlist_t *nvroot = NULL;
 			/*
 			 * Skip over our own pool if we're about to remove
 			 * ourselves from the spa namespace or any pool that
@@ -241,7 +262,8 @@ spa_config_sync(spa_t *target, boolean_t removing,
 			 * we don't allow them to be written to the cache file.
 			 */
 			if ((spa == target && removing) ||
-			    !spa_writeable(spa))
+			    (spa_state(spa) == POOL_STATE_ACTIVE &&
+			    !spa_writeable(spa)))
 				continue;
 
 			mutex_enter(&spa->spa_props_lock);
@@ -260,6 +282,9 @@ spa_config_sync(spa_t *target, boolean_t removing,
 			VERIFY(nvlist_add_nvlist(nvl, spa->spa_name,
 			    spa->spa_config) == 0);
 			mutex_exit(&spa->spa_props_lock);
+
+			if (nvlist_lookup_nvlist(nvl, spa->spa_name, &nvroot) == 0)
+				spa_config_clean(nvroot);
 		}
 
 		error = spa_config_write(dp, nvl);

--------------070607090801080704000405--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54A5AC21.5070802>