From owner-freebsd-fs@FreeBSD.ORG Mon Feb 28 00:05:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3C3531065673 for ; Mon, 28 Feb 2011 00:05:11 +0000 (UTC) (envelope-from royce.williams@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 03F658FC1C for ; Mon, 28 Feb 2011 00:05:10 +0000 (UTC) Received: by iyj12 with SMTP id 12so2786275iyj.13 for ; Sun, 27 Feb 2011 16:05:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-type:content-transfer-encoding; bh=IUplPnyBe5ghF5oV0yD/TELy/dq3sLIJOlROukjcrfk=; b=Gn6Cocd2taZUfMux4QXfNdGo1JBuuOSaiI3eWA/JAqdCC1SgCCuW9a6O7kMVAc1D9U AA5HBOtMgMxNYF0pc3Nfo77MtDnGO7QbeWrOKMyL5fxnNN72H8X52oUfjZJWnrdhMx5D r8DucGekxpqOaRjZ/U2poetyx3MYHnrUGbKI8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; b=hWtQzPNxY4HBv9U4D7U9FP2Zc3WvgDHIkYf/Qah2wXBvJVnZgvCN7C4HQOUgKQAXPi 41qfyGRIKNTKl87LsXtjrFhlpwIJjh+9RC29RB2BxrUjrh3SJXKjovM+vUrPQfkekF22 CvDdJ1aixgNQkhbA3Dm39NomOnlNP7NHDW+nI= Received: by 10.42.224.136 with SMTP id io8mr4167289icb.202.1298851510089; Sun, 27 Feb 2011 16:05:10 -0800 (PST) MIME-Version: 1.0 Received: by 10.231.183.139 with HTTP; Sun, 27 Feb 2011 16:04:50 -0800 (PST) In-Reply-To: References: From: Royce Williams Date: Sun, 27 Feb 2011 15:04:50 -0900 Message-ID: To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: slow "zfs destroy snapshot" with predictable time pattern X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Feb 2011 00:05:11 -0000 On Sun, Feb 27, 2011 at 9:30 AM, Royce Williams wrote: > I thought that 'zfs snapshot destroy' should be fast (on the order of > a few seconds), but mine are taking a predictably long time on a > pretty modest filesystem (details below). > > I discovered this when a typo caused many more snapshots than I > intended (every minute!); I had about 12,000 of them before I noticed. > =A0Destroying the first snapshot took about 39 wallclock seconds on an > otherwise idle system. =A0A few more destroys took almost exactly the > same amount of time. > > I know little about ZFS under the hood, but I wanted to investigate a > little bit. I scripted a loop of 'time zfs destroy snapshot' and let > it run overnight. =A0Each destroy was consistently taking 37-40 seconds, > but then after hundreds of deletions in that time range, I saw a > jagged spike, followed by a consistent drop that has stayed in the > 23-25s range: > > [hours of 38-39s destroys snipped] > > real =A0 =A00m38.205s > real =A0 =A00m38.455s > real =A0 =A00m38.580s > real =A0 =A00m37.414s > real =A0 =A00m35.330s <-- small drop here > real =A0 =A00m35.347s > real =A0 =A00m35.380s > real =A0 =A00m35.355s > real =A0 =A00m35.255s > real =A0 =A00m35.514s > real =A0 =A00m35.422s > real =A0 =A00m35.464s > real =A0 =A00m46.121s =A0 <-- small spike here > real =A0 =A00m44.630s > real =A0 =A00m46.021s > real =A0 =A01m19.443s =A0 <-- big spike here > real =A0 =A00m40.896s > real =A0 =A00m22.848s =A0<-- drop into the 20s range > real =A0 =A00m29.039s > real =A0 =A00m29.831s > real =A0 =A00m26.348s > real =A0 =A00m22.623s > real =A0 =A00m29.314s > real =A0 =A00m29.589s > real =A0 =A00m26.573s > real =A0 =A00m22.773s > > [hours of of 23-25s destroys snipped] > > I know very little about ZFS under the hood, but this model might fit the= facts: > > * Normally, 'zfs destroy snapshot' is fast (on the order of a few seconds= ); > > * 'zfs destroy snapshot' has to briefly analyze all snapshots prior to > destruction; > > * A particular 'problem' snapshot can slow that full analysis by a > consistent amount of time; > > * Destroying that 'problem' snapshot drops the analysis time by that amou= nt. > > > If my model is correct, I'm going to see one or more spikes, followed > by corresponding drops, until the destroys return to a reasonable > rate. > > This guy had a problem that might also fit that model -- that > particular snapshots can be very slow, and removing them removes the > time delay. That thread notes that it was due to a low-memory > condition, and OpenSolaris bug 6542681 was filed for it. =A0I do not > think that my problem is because of low memory. > > =A0 =A0http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg07647.h= tml > > > I have stopped the destroys in case the remaining 'problem' snapshot is u= seful. > > The system is 8.1-SECURITY, amd64, 4GB RAM, no sysctl or loader > tweaks, ZFS v3, zpool v14, single 58GB ZFS pool. > > # zfs list > NAME =A0 =A0 =A0 =A0 =A0 =A0 =A0USED =A0AVAIL =A0REFER =A0MOUNTPOINT > atoz-backup =A0 =A0 =A015.4G =A058.0G =A025.5K =A0/atoz-backup > atoz-backup/usr =A015.3G =A058.0G =A014.8G =A0/atoz-backup/usr > > # df -ki | egrep 'atoz|Filesystem' > Filesystem =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 10= 24-blocks =A0 =A0 Used > Avail Capacity iused =A0 =A0 ifree %iused =A0Mounted on > atoz-backup =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 60789979 =A0 =A0 =A0 25 > 60789953 =A0 =A0 0% =A0 =A0 =A0 6 121579907 =A0 =A00% =A0 /atoz-backup > atoz-backup/usr =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 7= 6281655 15491701 > 60789953 =A0 =A020% =A0714124 121579907 =A0 =A01% =A0 /atoz-backup/usr I left this message open-ended. I also left out the hardware info. What I meant to get at was: * Is ~35 seconds roughly normal for a "zfs destroy snapshot" on a filesystem with 723K files, single disk, under no other I/O or CPU load, with an Intel ICH9 SATA300 talking to a Seagate ST3160812AS (SATA 3)? Googling seems to indicate that it's not, even if I had 12,000 snapshots. * If this speed is not normal, does this smell like a bug? If so, I can help recreate or test, maybe on 8.2 or 9-CURRENT. * If this speed is normal, what search terms would have yielded that fact? (slow OR speed OR problem OR performance) plus "zfs destroy snapshot" didn't yield much that applied. * Is my model valid? I reasoned that if a snapshot is a separate set of references to all files in the filesystem, and the filesystem hasn't changed much over time, then destroying each snapshot should take a relatively constant amount of time. Even if my time expectation was wrong (that it would be quite a bit shorter, even for 723,000 files), I would not expect the destroy time to drop by 50% in the middle of the destroy series ... so I think that I'm missing something. * If my model isn't valid, what is a better model? For example, I had gotten down to just over 10,000 snapshots when I stopped the series of destroys; could there be a snapshot count threshold, under which each destroy now takes less time, in stair-step fashion? In other words: I'm trying to understand what I'm seeing. :-) Also, sorry for the bad subject line - "zfs destroy snapshot" is obviously the correct command; I've updated the subject line. Royce