From: Luke Marsden
To: freebsd-fs@freebsd.org
Cc: tech@hybrid-logic.co.uk
Date: Mon, 11 Jul 2011 12:25:41 +0100
Message-ID: <1310383541.30844.73.camel@behemoth>
Subject: ZFS bug in v28 - temporary clones are not automatically destroyed on error

Hi all,

I'm experiencing this bug on mm's ZFS v28 image from 19.06.2011
(r222557M):

        cannot destroy 'hpool/hcfs/fs@snapshot': dataset already exists

That is on a v4-formatted ZFS filesystem on a v28-formatted pool. If I
zfs upgrade the filesystem to v5, the error changes to "snapshot has
dependent clones" (from memory), which is more informative but
otherwise behaves the same. See:

        http://serverfault.com/questions/66414
        http://opensolaris.org/jive/thread.jspa?messageID=484242&tstart=0

        FreeBSD dev1.demo 8.2-RELEASE-p2 FreeBSD 8.2-RELEASE-p2 #3
        r222557M: Thu Jun 16 23:58:02 CEST 2011
        root@neo.vx.sk:/usr/obj/releng_8_2/sys/GENERIC amd64

I found this email relating to a fix pushed to OpenSolaris in 2009:

        http://mail.opensolaris.org/pipermail/onnv-notify/2009-August/010064.html

Did this fix ever get merged into the FreeBSD source tree, or is this a
more recent regression unrelated to the fix from 2009? Unfortunately
the OpenSolaris bug database seems to have disappeared, so I can't find
the actual patch.

The workaround - running "zdb -d poolname | grep %" and then manually
zfs destroying the offending stray clones - works, except that zdb on a
busy pool (one with active ZFS replication happening on it) returns EIO
about 90% of the time. That is the case in our application, which makes
the workaround time-consuming and only a temporary measure.
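For reference, the manual cleanup looks roughly like the sketch below.
This is a minimal sketch rather than our actual code; it assumes the
pool is called "hpool" and that the stray temporary clones are the
datasets whose names contain '%' in the zdb listing:

        #!/bin/sh
        # Rough sketch of the manual workaround described above.
        # Assumes the stray temporary clones show up in "zdb -d" as
        # datasets whose names contain '%', with the dataset name in
        # the second whitespace-separated field of each "Dataset" line.
        # zdb frequently fails with EIO on a busy pool, so this often
        # has to be retried.
        POOL=${1:-hpool}

        zdb -d "$POOL" | awk '/^Dataset/ { print $2 }' | grep '%' |
        while read clone; do
                echo "destroying stray clone: $clone"
                zfs destroy "$clone" || echo "could not destroy $clone" >&2
        done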
We are resorting to attempting to predict and destroy all the possible
stray clone names whenever a ZFS replication event completes (either a
failure or a success), to stop later destroys of snapshots failing with
"cannot destroy [snapshot]: dataset already exists" (a rough sketch of
that cleanup pass is in the P.S. below). In our application we do
occasionally abort replication events that appear not to be making any
progress, by sending SIGTERM to the corresponding zfs send/recv
processes. FWIW, we didn't see this in v14 or v15 (on 8.1 and 8.2
respectively).

Does anyone know a better way to predict what the clone names will be,
so that I can refine our workaround until we have a patch for 8-STABLE?
Currently we are predicting that all the snapshot names included in an
incremental receive are potential "stray clones", but this does not
seem to catch all of them. Also, the clones corresponding to a snapshot
do not have the same name as the snapshot; it seems that perhaps it is
the "parent" snapshot (i.e. the previous snapshot on which the next
snapshot is based) that gives the stray clone its name. That would be
fine, but we lose this information by "pruning" intermediate snapshots.
Is there any way to interrogate ZFS for the names of any such clones
before attempting to destroy the snapshot, so that we can clean them up
automatically and more efficiently?

Thank you all for your excellent work :-)

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Mobile: +447791750420
www.hybrid-cluster.com - Cloud web hosting platform
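P.S. The predictive cleanup pass mentioned above currently looks
roughly like the sketch below. This is illustrative only, not our real
code: the filesystem name and the SNAPSHOTS_IN_STREAM variable are
hypothetical stand-ins, and as noted it does not catch every stray
clone.

        #!/bin/sh
        # Illustrative sketch only, not our real code.
        # SNAPSHOTS_IN_STREAM is a hypothetical space-separated list of
        # the snapshot names contained in the incremental stream; the
        # mapping from snapshot name to stray clone name is exactly the
        # part we are unsure about, so this guesses a child dataset
        # with the same name and ignores failures.
        FS=hpool/hcfs/fs
        for snap in $SNAPSHOTS_IN_STREAM; do
                zfs destroy "${FS}/${snap}" 2>/dev/null || true
        done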