From owner-freebsd-fs@FreeBSD.ORG  Sat Jul 19 16:41:18 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0AE3662C
 for <freebsd-fs@freebsd.org>; Sat, 19 Jul 2014 16:41:18 +0000 (UTC)
Received: from venus.codepro.be (venus.codepro.be [IPv6:2a01:4f8:162:1127::2])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
 bits))
 (Client CN "*.codepro.be", Issuer "Gandi Standard SSL CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C3A032A96
 for <freebsd-fs@freebsd.org>; Sat, 19 Jul 2014 16:41:17 +0000 (UTC)
Received: from vega.codepro.be (unknown [172.16.1.3])
 by venus.codepro.be (Postfix) with ESMTP id 7E73EAD04
 for <freebsd-fs@freebsd.org>; Sat, 19 Jul 2014 18:41:13 +0200 (CEST)
Received: by vega.codepro.be (Postfix, from userid 1001)
 id 75E1D19658; Sat, 19 Jul 2014 18:41:13 +0200 (CEST)
Date: Sat, 19 Jul 2014 18:41:13 +0200
From: Kristof Provost <kristof@sigsegv.be>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS panic on zvol resize
Message-ID: <20140719164113.GA2406@vega.codepro.be>
References: <20140704194750.GU75721@vega.codepro.be>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20140704194750.GU75721@vega.codepro.be>
X-PGP-Fingerprint: E114 D9EA 909E D469 8F57  17A5 7D15 91C6 9EFA F286
X-Checked-By-NSA: Probably
User-Agent: Mutt/1.5.23 (2014-03-12)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
X-List-Received-Date: Sat, 19 Jul 2014 16:41:18 -0000

I've poked at this a bit more, and I think I understand the problem now.

zvol_set_volsize() takes a hold on the file system with dmu_objset_hold()
and then verifies that it is not marked read-only.
It does this through dsl_prop_get_integer(), which takes the dataset name
and therefore takes a second hold on the file system with
dmu_objset_hold(). That nested hold triggers the assert in dsl_pool_hold().

I don't think it would be safe to move the read-only check so that it
happens before the dmu_objset_hold() call. That would open a race in which
the file system is marked read-only after our check but before the resize
completes.

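A possible fix would be to replace dsl_prop_get_integer() with
dsl_prop_get_ds(), the variant that reads the property from the dataset we
already hold instead of taking its own hold, like this:
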
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c       (revision 268871)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c       (working copy)
@@ -913,8 +913,8 @@
            doi.doi_data_block_size)) != 0)
                goto out;
 
-       VERIFY(dsl_prop_get_integer(name, "readonly", &readonly,
-           NULL) == 0);
+       VERIFY(dsl_prop_get_ds(dmu_objset_ds(os), "readonly",
+           8, 1, &readonly, NULL) == 0);
        if (readonly) {
                error = EROFS;
                goto out;

The ZFS on Linux people ran into the same problem:
https://github.com/zfsonlinux/zfs/issues/2039

With this patch I no longer see the original panic, but I get a shiny
new one in its place:

panic: solaris assert: txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1279
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01212e94e0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe01212e9590
vpanic() at vpanic+0x126/frame 0xfffffe01212e95d0
panic() at panic+0x43/frame 0xfffffe01212e9630
assfail() at assfail+0x1d/frame 0xfffffe01212e9640
dmu_tx_assign() at dmu_tx_assign+0xae/frame 0xfffffe01212e96d0
zvol_set_volsize() at zvol_set_volsize+0x1cf/frame 0xfffffe01212e9760
zfs_prop_set_special() at zfs_prop_set_special+0x2e2/frame 0xfffffe01212e97f0
zfs_set_prop_nvlist() at zfs_set_prop_nvlist+0x23f/frame 0xfffffe01212e9880
zfs_ioc_set_prop() at zfs_ioc_set_prop+0x106/frame 0xfffffe01212e98e0
zfsdev_ioctl() at zfsdev_ioctl+0x6ee/frame 0xfffffe01212e9990
devfs_ioctl_f() at devfs_ioctl_f+0xfb/frame 0xfffffe01212e99f0
kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe01212e9a50
sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe01212e9aa0
amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe01212e9bb0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01212e9bb0

dmu_tx_assign() must not be called with TXG_WAIT while the pool
configuration lock is held, and zvol_set_volsize() still holds it (taken by
dmu_objset_hold()) when it assigns the resize transaction.

Regards,
Kristof