From: Fabian Keil
To: FreeBSD Current
Date: Sat, 9 Feb 2013 15:42:10 +0100
Subject: Destroying ZFS snapshots "too quickly": xpt_scan_lun: can't allocate CCB, can't continue
Message-ID: <20130209154210.02af9e1e@fabiankeil.de>

Before the introduction of async_destroy I wrote a script to destroy ZFS snapshots in parallel to speed up the process. It's available at:
http://www.fabiankeil.de/sourcecode/zfs-snapshot-destroyer/zsd.pl

A couple of years ago the only downside seemed to be that it requires more memory and file descriptors (due to multiple zfs processes running at the same time) and that errors are ignored (implementation detail of the script).
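For illustration, a minimal Python sketch of destroying snapshots in parallel with a cap on the number of concurrent zfs processes. This is not zsd.pl, which is written in Perl and may work quite differently. The function names, the `max_workers` default and the injectable `destroy` parameter are assumptions made for this example. Unlike the behaviour described above, failures are collected rather than ignored.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def destroy_snapshot(name):
    """Run `zfs destroy <name>` and return (name, exit status, stderr)."""
    proc = subprocess.run(["zfs", "destroy", name],
                          capture_output=True, text=True)
    return name, proc.returncode, proc.stderr.strip()


def destroy_in_parallel(snapshots, max_workers=4, destroy=destroy_snapshot):
    """Destroy snapshots with at most max_workers zfs processes at once.

    `destroy` is injectable (hypothetical hook) so the logic can be tried
    without touching a real pool. Returns the (name, status, stderr)
    tuples of all destroy calls that reported a non-zero exit status.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and waits for all results.
        results = list(pool.map(destroy, snapshots))
    return [result for result in results if result[1] != 0]
```

Capping `max_workers` bounds the memory and file descriptors used by the zfs processes themselves. Whether it also avoids the unresponsiveness described in this post is not something the sketch can promise.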
Recently I noticed that destroying several hundred (500) snapshots this way risks rendering the system unresponsive. I rarely do that, so it might not actually be a regression.

When using X the screen freezes and keyboard input is ignored, so it's hard to tell what's going on. When running the script on the console, alt+Fx is often still accepted to switch consoles, but other keyboard input like entering commands or trying to log in has no visible effect. A running top is killed and the system frequently logs:

"xpt_scan_lun: can't allocate CCB, can't continue"

Plugging in USB devices still results in the expected messages, but other than that the system seems to be unresponsive and doesn't recover (I only waited a couple of minutes, though).

A "CCB" seems to be rather small:
http://fxr.watson.org/fxr/source/cam/cam_xpt.c#L4386
I therefore suspect that ZFS got greedy and didn't play nice with the rest of the system. I have no proof that ZFS isn't merely triggering a problem in another subsystem, though.

So far I haven't been able to reproduce the problem with snapshots intentionally created for testing, but I also used a somewhat simplistic approach to populate the snapshots.

Is this considered a bug, or is quickly destroying snapshots just something for the "don't do this" or "not without proper tuning" departments?

I would also be interested to know if there's a way to roughly figure out from userland how many snapshots can be safely destroyed in a row. Example: Look at "some" system state, destroy a safe number of snapshots, look at "some" system state again and interpolate.

Before top gets killed it usually shows that zfskern takes more than 50% WCPU, but this can also happen when the system doesn't become unresponsive and thus probably isn't a good metric (the delay also doesn't help, of course).
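The "look at some state, destroy a batch, look again" idea could be sketched roughly as follows in Python. Everything here is hypothetical: `system_ok` stands in for whatever userland probe turns out to be meaningful (which is exactly the open question), `destroy_batch` stands in for the code that actually removes the snapshots, and the batch size and pause are arbitrary placeholders, not tuned values.

```python
import time


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def destroy_in_batches(snapshots, destroy_batch, system_ok,
                       batch_size=20, pause=5.0, sleep=time.sleep):
    """Destroy snapshots batch by batch, stopping once system_ok() fails.

    system_ok is a hypothetical probe supplied by the caller; it is
    consulted before every batch. Returns the snapshots not attempted.
    """
    remaining = list(snapshots)
    for batch in chunks(list(snapshots), batch_size):
        if not system_ok():
            break  # leave the rest for a later run
        destroy_batch(batch)
        remaining = remaining[len(batch):]
        sleep(pause)  # give the system time to catch up
    return remaining
```

Returning the untouched snapshots lets the caller retry later with a smaller batch size. The sketch only helps if the probe reacts before the system stops responding, which the zfskern WCPU observation above suggests is not a given.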
Fabian