Date:      Thu, 4 Aug 2011 12:40:11 GMT
From:      Martin Matuska <mm@FreeBSD.org>
To:        freebsd-fs@FreeBSD.org
Subject:   Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Message-ID:  <201108041240.p74CeBxw008141@freefall.freebsd.org>

The following reply was made to PR kern/157728; it has been noted by GNATS.

From: Martin Matuska <mm@FreeBSD.org>
To: Borja Marcos <borjam@sarenet.es>
Cc: bug-followup@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind
 temporary clones
Date: Thu, 04 Aug 2011 14:33:45 +0200

 This is a multi-part message in MIME format.
 --------------090908080609040403050308
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 That is not a solution; we want hidden datasets :)
 
 A workaround patch is attached that stops prefetching hidden datasets in
 zfs (by the way, why should we prefetch them at all?).
 It treats the symptoms rather than the source of the problem: with the
 patch applied, reproducing the problem requires running zfs list or
 zfs get directly on the invisible temporary clone.
 
 Please test.
 
 On 04.08.2011 12:39, Borja Marcos wrote:
 > I have a clue. I've tried a partial fix and so far it seems to work. I now have a loop doing a zfs send of a dataset every 30 seconds while a make buildworld runs on it, and receiving the streams onto a different pool, on which I have a while ( 1 ) ; zfs list ; end loop running.
 >
 > So far I haven't had issues. The only side effect is that temporary datasets can appear in the zfs list output. 
 >
 > Read below for the explanation.
 >
 > After reading Martin's analysis, it seemed quite clear to me that the scenario was due to the need to get a consistent snapshot of the state of a complex data structure. In this case, I imagined that the "list" service would traverse the data structures holding the dataset descriptions, and that it would place temporary locks on the elements to prevent them from being altered while the structure is being traversed.
 >
 > So, a generic "list" service that renders a consistent response in a fine-grained locking environment would look something like this:
 >
 > - traverse data structure, building a list.
 >   (each time we get an element, a temporary lock is placed on it)
 > - get next element, etc.
 >
 > - With the complete and consistent list ready, prepare the response.
 >
 > - Once the response has been built, traverse the grabbed results and release the locks.
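
The generic pattern in the list above can be sketched in C as follows. This is an illustration only, not ZFS code: struct entry, list_collect() and list_release() are hypothetical names, and a plain counter stands in for whatever lock or hold the real code would take.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element; "holds" models a temporary lock or reference. */
struct entry {
	const char *name;
	int holds;
};

/* Walk the table, taking a hold on every element that is visited. */
static size_t
list_collect(struct entry *table, size_t n, struct entry **out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		table[i].holds++;
		out[i] = &table[i];
	}
	return (n);
}

/* After the response is built, drop every hold taken during the walk. */
static void
list_release(struct entry **held, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		assert(held[i]->holds > 0);
		held[i]->holds--;
	}
}
```

The invariant this sketch relies on is that every element visited by list_collect() is also passed to list_release(); an element that is held but then dropped from the list would keep its hold.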
 >
 >
 > So, where's the problem? In the special treatment of the "hidden" datasets.
 >
 > Looking at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c, at the function zfs_ioc_dataset_list_next(zfs_cmd_t *zc),
 >
 > I see something resembling this idea:
 >
 >         } while (error == 0 && dataset_name_hidden(zc->zc_name) &&
 >             !(zc->zc_iflags & FKIOCTL));
 >         dmu_objset_rele(os, FTAG);
 >
 > So, wondering whether this special treatment of hidden datasets was the problem, I edited the dataset_name_hidden() function so that it no longer treats the "%" datasets as hidden.
 >
 > boolean_t
 > dataset_name_hidden(const char *name)
 > {
 >         /*
 >          * Skip over datasets that are not visible in this zone,
 >          * internal datasets (which have a $ in their name), and
 >          * temporary datasets (which have a % in their name).
 >          */
 >         if (strchr(name, '$') != NULL)
 >                 return (B_TRUE);
 > /*      if (strchr(name, '%') != NULL)
 >                 return (B_TRUE); */
 >         if (!INGLOBALZONE(curthread) && !zone_dataset_visible(name, NULL))
 >                 return (B_TRUE);
 >         return (B_FALSE);
 > }
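
The effect of commenting out the '%' test can be checked with a small userland model. This is a sketch under assumptions, not the kernel function: name_hidden_model() and its hide_temporary switch are hypothetical, and the zone visibility check is left out (the model assumes the global zone).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Userland model of dataset_name_hidden(). hide_temporary == true
 * models the original code; false models the version with the '%'
 * test commented out. The zone check is omitted on purpose.
 */
static bool
name_hidden_model(const char *name, bool hide_temporary)
{
	assert(name != NULL);
	if (strchr(name, '$') != NULL)
		return (true);
	if (hide_temporary && strchr(name, '%') != NULL)
		return (true);
	return (false);
}
```

With hide_temporary set to false, a name such as rpool/newsrc/%20110804_113730 is no longer reported as hidden, while names containing '$' still are.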
 >                 
 >
 > I was expecting just one side effect: "zfs list" would list the "%" datasets.
 >
 > Having done this, I compiled the kernel, started the test again, and voilà, it works.
 >
 > Of course, now I see the "%" datasets while the zfs receive is running:
 >
 > pruebazfs3# zfs list -t all
 > NAME                            USED  AVAIL  REFER  MOUNTPOINT
 > rpool                          1.22G  6.61G  41.3K  /rpool
 > rpool/newsrc                   1.22G  6.61G   565M  /rpool/newsrc
 > rpool/newsrc@anteshidden        149M      -   973M  -
 > rpool/newsrc@parcheteoria1     1.09M      -   973M  -
 > rpool/newsrc@20110804_113700       0      -   565M  -
 > rpool/newsrc/%20110804_113730  1.31M  6.61G   566M  /rpool/newsrc/%20110804_113730
 >
 >
 > but after zfs receive finishes they are correctly cleaned up:
 >
 > NAME                           USED  AVAIL  REFER  MOUNTPOINT
 > rpool                         1.22G  6.61G  41.3K  /rpool
 > rpool/newsrc                  1.22G  6.61G   566M  /rpool/newsrc
 > rpool/newsrc@anteshidden       149M      -   973M  -
 > rpool/newsrc@parcheteoria1    1.09M      -   973M  -
 > rpool/newsrc@20110804_113730      0      -   566M  -
 >
 >
 > So: it seems to me that these datasets are a sort of afterthought. The ioctl "list" service should not discard them when building the dataset list. Instead, it should simply not "print" them, so to speak.
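
A minimal sketch of the "keep them in the list, just do not print them" idea, assuming the complete list of names is already available to the caller. print_visible() is a hypothetical helper, not part of zfs(8) or libzfs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print every name except temporary ones (those containing '%').
 * The caller passes the complete list; filtering happens only here,
 * at output time. Returns the number of names printed.
 */
static int
print_visible(const char *const *names, size_t n, FILE *out)
{
	size_t i;
	int printed = 0;

	assert(names != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (strchr(names[i], '%') != NULL)
			continue;	/* kept in the list, just not shown */
		fprintf(out, "%s\n", names[i]);
		printed++;
	}
	return (printed);
}
```

The design point is that the traversal and the filtering are separate steps, so hiding a dataset from the output does not change which elements the traversal visits.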
 >
 > I'm sure this temporary fix can be refined, and I wonder whether a similar issue is lurking somewhere else...
 -- 
 Martin Matuska
 FreeBSD committer
 http://blog.vx.sk
 
 
 --------------090908080609040403050308
 Content-Type: text/x-patch;
  name="zfs_ioctl.c.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="zfs_ioctl.c.patch"
 
 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
 ===================================================================
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	(revision 224648)
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c	(working copy)
 @@ -1963,8 +1963,13 @@ zfs_ioc_dataset_list_next()
  		uint64_t cookie = 0;
  		int len = sizeof (zc->zc_name) - (p - zc->zc_name);
  
 -		while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0)
 -			(void) dmu_objset_prefetch(zc->zc_name, NULL);
 +		while (dmu_dir_list_next(os, len, p, NULL,
 +		    &cookie) == 0) {
 +			if (dataset_name_hidden(zc->zc_name) == B_FALSE) {
 +				(void) dmu_objset_prefetch(zc->zc_name,
 +				    NULL);
 +			}
 +		}
  	}
  
  	do {
 
 --------------090908080609040403050308--


