From owner-svn-src-all@freebsd.org Tue Jan 19 20:52:33 2016
Subject: Re: svn commit: r294329 - in head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs: . sys
To: Alan Somers
Cc: "src-committers@freebsd.org", "svn-src-all@freebsd.org", "svn-src-head@freebsd.org"
From: Kurt Lidl <lidl@pix.net>
Message-ID: <569EA207.5010304@pix.net>
References: <201601191700.u0JH0P6k061610@repo.freebsd.org>
Date: Tue, 19 Jan 2016 15:52:23 -0500

On 1/19/16 1:55 PM, Alan Somers wrote:
> On Tue, Jan 19, 2016 at 10:00 AM, Alan Somers wrote:
>> Author: asomers
>> Date: Tue Jan 19 17:00:25 2016
>> New Revision: 294329
>> URL: https://svnweb.freebsd.org/changeset/base/294329
>>
>> Log:
>>   Disallow zvol-backed ZFS pools
>>
>>   Using zvols as backing devices for ZFS pools is fraught with panics and
>>   deadlocks.  For example, attempting to online a missing device in the
>>   presence of a zvol can cause a panic when vdev_geom tastes the zvol.
>>   Better to completely disable vdev_geom from ever opening a zvol.  The
>>   solution relies on setting a thread-local variable during vdev_geom_open,
>>   and returning EOPNOTSUPP during zvol_open if that thread-local variable
>>   is set.
>>
>>   Remove the check for MUTEX_HELD(&zfsdev_state_lock) in zvol_open.  Its
>>   intent was to prevent a recursive mutex acquisition panic.  However, the
>>   new check for the thread-local variable also fixes that problem.
>>
>>   Also, fix a panic in vdev_geom_taste_orphan.  For an unknown reason, this
>>   function was set to panic.  But it can occur that a device disappears
>>   during tasting, and it causes no problems to ignore this departure.
>>
>>   Reviewed by:    delphij
>>   MFC after:      1 week
>>   Relnotes:       yes
>>   Sponsored by:   Spectra Logic Corp
>>   Differential Revision:  https://reviews.freebsd.org/D4986
>>
>> Modified:
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>
>> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
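(To make the effect of the change concrete for anyone reading along: the
layering that vdev_geom now refuses is a second pool built directly on top
of a zvol.  A minimal sketch, with made-up pool and zvol names:

    # create a zvol on an existing pool "tank" ...
    zfs create -V 10G tank/backingvol
    # ... and then try to build a second pool on top of that zvol;
    # with r294329, vdev_geom declines to open the zvol, so this fails
    zpool create nested /dev/zvol/tank/backingvol

Importing a pool that lives inside a zvol from the host side goes through
the same vdev_geom open path, which is what breaks the workflow I describe
below.)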
> Due to popular demand, I will conditionalize this behavior on a
> sysctl, and I won't MFC it.  The sysctl must default to off (ZFS on
> zvols not allowed) because having the ability to put pools on zvols
> can cause panics even for users who aren't using it.

Thank you!

> And let me clear up some confusion:
>
> 1) Having the ability to put a zpool on a zvol can cause panics and
> deadlocks, even if that ability is unused.
> 2) Putting a zpool atop a zvol causes unnecessary performance problems,
> because there are two layers of COW involved, with all their software
> complexities.  This also applies to putting a zpool atop files on a
> ZFS filesystem.
> 3) A VM guest putting a zpool on its virtual disk, where the VM host
> backs that virtual disk with a zvol, will work fine.  That's the ideal
> use case for zvols.
> 3b) Using ZFS on both host and guest isn't ideal for performance, as
> described in item 2.  That's why I prefer to use UFS for VM guests.

The patch as-is very much breaks the way some people do operations on
zvols.  My script, which clones virtual machines via snapshots of zvols
containing zpools, is currently broken because of this change.  (I
upgraded one of my dev hosts right after your commit, to verify the
broken behavior.)

In my script, I boot an auto-install .iso into bhyve:

bhyve -c 2 -m ${vmmem} -H -A -I -g 0 \
    -s 0:0,hostbridge \
    -s 1,lpc -l com1,stdio \
    -s 2:0,virtio-net,${template_tap} \
    -s 3:0,ahci-hd,"${zvol}" \
    -s 4:0,ahci-cd,"${isofile}" \
    ${vmname} || \
    echo "trapped error exit from bhyve: $?"

So, yes, the zpool gets created by the client VM.  Then, on the
hypervisor host, the script imports that zpool and renames it, so that
I can have different pool names for all the client VMs.  This step now
fails:

+ zpool import -R /virt/base -d /dev/zvol/zdata sys base
cannot import 'sys' as 'base': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.

I import the clients' zpools only after the zpools on them have been
renamed, so that the hypervisor host can manipulate the files directly.
Renaming the zpools disturbs only a small number of the disk blocks on
each snapshot of the zvol.

In this way, I can instantiate ~30 virtual machines from a custom
install.iso image in less than 3 minutes.  The bulk of that time is
spent doing the installation from the custom install.iso into the first
virtual machine.  The cloning of the zvols and the manipulation of the
resulting filesystems are very fast.

-Kurt
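P.S.  For anyone trying to picture the host-side step that now fails: it
is just an import-with-rename of the pool that the guest installer
created inside the zvol, followed by an export once the host has finished
editing the guest's files.  A simplified sketch, reusing the names from
the failing command above (the exact sequencing in my script differs a
bit):

    # import the pool the guest installer created ("sys"), renaming it to
    # "base" and mounting it under an alternate root on the host
    zpool import -R /virt/base -d /dev/zvol/zdata sys base

    # ... host-side edits to the guest's files under /virt/base ...

    # hand the pool back when the host-side edits are done
    zpool export base

The per-VM clones of the zvol get the same sort of import/rename/export
pass, each with its own pool name and altroot.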