From owner-freebsd-fs@FreeBSD.ORG Thu Aug  4 11:00:26 2011
Date: Thu, 4 Aug 2011 11:00:26 GMT
Message-Id: <201108041100.p74B0Qur011677@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: Borja Marcos
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones

The following reply was made to PR kern/157728; it has been noted by GNATS.

From: Borja Marcos
To: bug-followup@FreeBSD.org, mm@FreeBSD.org
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Date: Thu, 4 Aug 2011 12:39:58 +0200

I have a clue. I've tried a partial fix, and so far it seems to work. I now have a loop doing a zfs send of a dataset (with a make buildworld running on it) every 30 seconds and receiving the streams onto a different pool, on which I have a "while ( 1 ) ; zfs list ; end" loop running. So far I haven't had issues.
The only side effect is that temporary datasets can appear in the "zfs list" output. Read below for the explanation.

After reading Martin's analysis, it seemed quite clear to me that the scenario was due to the necessity of getting a consistent snapshot of the state of a complex data structure. In this case, I imagined that the "list" service would traverse the data structures holding the dataset descriptions, and that it would place temporary locks on the elements in order to prevent them from being altered while the structure is being traversed.

So, a generic "list" service in a fine-grained locking environment, rendering a consistent response, would work something like this:

- Traverse the data structure, building a list (each time we get an element, a temporary lock is placed on it).
- Get the next element, and so on.
- With the complete and consistent list ready, prepare the response.
- Once the response has been built, traverse the grabbed results and release the locks.

So, where's the problem? In the special treatment of the "hidden" datasets. Looking at /usr/src/sys/cddl/contrib/opensolaris/common/fs/zfs/zfs_ioctl.c, in the function zfs_ioc_dataset_list_next(zfs_cmd_t *zc), I see something resembling this idea:

	} while (error == 0 && dataset_name_hidden(zc->zc_name) &&
	    !(zc->zc_iflags & FKIOCTL));
	dmu_objset_rele(os, FTAG);

So, wondering if this special treatment of the hidden datasets is the problem, I've edited the dataset_name_hidden() function so that it ignores the "%" datasets:

boolean_t
dataset_name_hidden(const char *name)
{
	/*
	 * Skip over datasets that are not visible in this zone,
	 * internal datasets (which have a $ in their name), and
	 * temporary datasets (which have a % in their name).
	 */
	if (strchr(name, '$') != NULL)
		return (B_TRUE);
/*
	if (strchr(name, '%') != NULL)
		return (B_TRUE);
*/
	if (!INGLOBALZONE(curthread) && !zone_dataset_visible(name, NULL))
		return (B_TRUE);

	return (B_FALSE);
}

I was expecting just one side effect: a "zfs list" would list the "%" datasets.

Having done this, I've compiled the kernel, started the test again, and, voila, it works. Of course, now I see the "%" datasets while the zfs receive is running:

pruebazfs3# zfs list -t all
NAME                             USED  AVAIL  REFER  MOUNTPOINT
rpool                           1.22G  6.61G  41.3K  /rpool
rpool/newsrc                    1.22G  6.61G   565M  /rpool/newsrc
rpool/newsrc@anteshidden         149M      -   973M  -
rpool/newsrc@parcheteoria1      1.09M      -   973M  -
rpool/newsrc@20110804_113700        0      -   565M  -
rpool/newsrc/%20110804_113730   1.31M  6.61G   566M  /rpool/newsrc/%20110804_113730

but after the zfs receive finishes they are correctly cleaned up:

NAME                             USED  AVAIL  REFER  MOUNTPOINT
rpool                           1.22G  6.61G  41.3K  /rpool
rpool/newsrc                    1.22G  6.61G   566M  /rpool/newsrc
rpool/newsrc@anteshidden         149M      -   973M  -
rpool/newsrc@parcheteoria1      1.09M      -   973M  -
rpool/newsrc@20110804_113730        0      -   566M  -

So: it seems to me that these datasets are a sort of afterthought. The ioctl "list" service should not discard them when building the dataset list; instead, it should simply not "print" them, so to speak.

I'm sure this temporary fix can be refined, and I'm wondering if a similar issue is lurking somewhere else...

Borja.
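P.S.: to illustrate the refinement I have in mind (keep the "%" clones in the list the ioctl builds, and hide them only when the reply is printed), here is a minimal user-space sketch. The helper names and the sample dataset names are hypothetical; this is a model of the idea, not the actual kernel code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Same test as dataset_name_hidden(): "$" marks internal datasets,
 * "%" marks temporary clones.
 */
static bool
is_hidden(const char *name)
{
	return (strchr(name, '$') != NULL || strchr(name, '%') != NULL);
}

/*
 * Phase 1: while the structure is "locked", snapshot every name,
 * including temporary clones, so the traversal never has to skip and
 * re-examine entries.  Returns the number of names copied.
 */
static int
snapshot_names(const char *const *all, const char **out, int max)
{
	int n = 0;

	for (int i = 0; all[i] != NULL && n < max; i++)
		out[n++] = all[i];
	return (n);
}

/*
 * Phase 2: after the locks are dropped, filter hidden names only while
 * rendering the user-visible reply.
 */
static int
render_visible(const char **names, int n, const char **out, int max)
{
	int m = 0;

	for (int i = 0; i < n && m < max; i++)
		if (!is_hidden(names[i]))
			out[m++] = names[i];
	return (m);
}
```

With datasets { "rpool", "rpool/newsrc", "rpool/newsrc/%20110804_113730" }, phase 1 keeps all three (a consistent traversal), and phase 2 drops only the "%" clone from what the user sees.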