Date: Sat, 19 Jul 2014 18:41:13 +0200
From: Kristof Provost
To: freebsd-fs@freebsd.org
Subject: Re: ZFS panic on zvol resize
Message-ID: <20140719164113.GA2406@vega.codepro.be>
In-Reply-To: <20140704194750.GU75721@vega.codepro.be>

I've poked at this a bit more, and I think I understand the problem now.

zvol_set_volsize() takes a hold on the file system with dmu_objset_hold()
and then verifies that it's not marked as read-only.
It does this through dsl_prop_get_integer() which also tries to take a
hold on the file system with dmu_objset_hold(). That triggers the assert
in dsl_pool_hold().

I don't think it'd be safe to move the read only check so it's done
before taking the dmu_objset_hold(). I think it'd open us to races where
the file system gets marked as read only after our check, but before the
completion of the resize.

A possible fix would be to use the unlocked variant of
dsl_prop_get_integer() like this:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c	(revision 268871)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c	(working copy)
@@ -913,8 +913,8 @@
 	    doi.doi_data_block_size)) != 0)
 		goto out;
 
-	VERIFY(dsl_prop_get_integer(name, "readonly", &readonly,
-	    NULL) == 0);
+	VERIFY(dsl_prop_get_ds(dmu_objset_ds(os), "readonly",
+	    8, 1, &readonly, NULL) == 0);
 	if (readonly) {
 		error = EROFS;
 		goto out;

The ZFS on Linux people ran into the same problem:
https://github.com/zfsonlinux/zfs/issues/2039

With this patch I no longer see the original panic, but I get a shiny new
one in its place:

panic: solaris assert: txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1279
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01212e94e0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe01212e9590
vpanic() at vpanic+0x126/frame 0xfffffe01212e95d0
panic() at panic+0x43/frame 0xfffffe01212e9630
assfail() at assfail+0x1d/frame 0xfffffe01212e9640
dmu_tx_assign() at dmu_tx_assign+0xae/frame 0xfffffe01212e96d0
zvol_set_volsize() at zvol_set_volsize+0x1cf/frame 0xfffffe01212e9760
zfs_prop_set_special() at zfs_prop_set_special+0x2e2/frame 0xfffffe01212e97f0
zfs_set_prop_nvlist() at zfs_set_prop_nvlist+0x23f/frame 0xfffffe01212e9880
zfs_ioc_set_prop() at zfs_ioc_set_prop+0x106/frame 0xfffffe01212e98e0
zfsdev_ioctl() at zfsdev_ioctl+0x6ee/frame 0xfffffe01212e9990
devfs_ioctl_f() at devfs_ioctl_f+0xfb/frame 0xfffffe01212e99f0
kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe01212e9a50
sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe01212e9aa0
amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe01212e9bb0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01212e9bb0

The dmu_tx_assign() function is unhappy about being called with TXG_WAIT
while the dsl_pool_config is locked.

Regards,
Kristof
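
To make the two assertion failures described above concrete, here is a
toy model in C. It is only a sketch under heavy assumptions: every toy_*
name is hypothetical, none of it is ZFS code, and the real locking is
considerably more involved. It assumes a single non-recursive hold that a
by-name property lookup tries to take again (the first panic), and a
waiting transaction assignment that refuses to run while that hold is
still in place (the second panic). Instead of panicking, the model
returns a result code naming the assertion that would have fired.

```c
/* Toy model only: none of these names are ZFS APIs. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a pool whose config hold must not be taken twice. */
struct toy_pool {
	bool	config_held;
	int	readonly;
};

enum toy_result {
	TOY_OK,
	TOY_RECURSIVE_HOLD,	/* models the first panic */
	TOY_WAIT_WHILE_HELD,	/* models the second panic */
	TOY_READONLY		/* models the EROFS path */
};

/* Returns false where the real code would trip an assertion. */
static bool
toy_pool_hold(struct toy_pool *p)
{
	if (p->config_held)
		return (false);
	p->config_held = true;
	return (true);
}

static void
toy_pool_rele(struct toy_pool *p)
{
	assert(p->config_held);
	p->config_held = false;
}

/* "Locked" getter: takes its own hold, like a lookup by name. */
static bool
toy_prop_get_by_name(struct toy_pool *p, int *val)
{
	if (!toy_pool_hold(p))
		return (false);
	*val = p->readonly;
	toy_pool_rele(p);
	return (true);
}

/* "Unlocked" getter: relies on the hold the caller already has. */
static bool
toy_prop_get_held(struct toy_pool *p, int *val)
{
	if (!p->config_held)
		return (false);
	*val = p->readonly;
	return (true);
}

/* A waiting assignment is refused while the hold is in place. */
static bool
toy_tx_assign_wait(const struct toy_pool *p)
{
	return (!p->config_held);
}

/* Resize: hold, check readonly, then assign while still holding. */
static enum toy_result
toy_resize(struct toy_pool *p, bool use_held_getter)
{
	enum toy_result res = TOY_OK;
	int ro = 0;
	bool ok;

	if (!toy_pool_hold(p))
		return (TOY_RECURSIVE_HOLD);
	ok = use_held_getter ? toy_prop_get_held(p, &ro) :
	    toy_prop_get_by_name(p, &ro);
	if (!ok)
		res = TOY_RECURSIVE_HOLD;
	else if (ro)
		res = TOY_READONLY;
	else if (!toy_tx_assign_wait(p))
		res = TOY_WAIT_WHILE_HELD;
	toy_pool_rele(p);
	return (res);
}
```

Calling toy_resize() with use_held_getter set to false models the
unpatched code and stops at the recursive hold. Setting it to true models
the patched code: the read-only check now passes, but the waiting
assignment is still made under the hold. In this model the check stays
inside the hold, which matches the concern about racing with a change to
the read-only property.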