Date: Sat, 19 Jul 2014 18:41:13 +0200
From: Kristof Provost <kristof@sigsegv.be>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS panic on zvol resize
Message-ID: <20140719164113.GA2406@vega.codepro.be>
In-Reply-To: <20140704194750.GU75721@vega.codepro.be>
References: <20140704194750.GU75721@vega.codepro.be>

I've poked at this a bit more, and I think I understand the problem now.

zvol_set_volsize() takes a hold on the file system with dmu_objset_hold()
and then verifies that it's not marked as read-only. It does this through
dsl_prop_get_integer() which also tries to take a hold on the file system
with dmu_objset_hold(). That triggers the assert in dsl_pool_hold().

I don't think it'd be safe to move the read only check so it's done before
taking the dmu_objset_hold(). I think it'd open us to races where the file
system gets marked as read only after our check, but before the completion
of the resize.

A possible fix would be to use the unlocked variant of
dsl_prop_get_integer() like this:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c	(revision 268871)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c	(working copy)
@@ -913,8 +913,8 @@
 	    doi.doi_data_block_size)) != 0)
 		goto out;
 
-	VERIFY(dsl_prop_get_integer(name, "readonly", &readonly,
-	    NULL) == 0);
+	VERIFY(dsl_prop_get_ds(dmu_objset_ds(os), "readonly",
+	    8, 1, &readonly, NULL) == 0);
 	if (readonly) {
 		error = EROFS;
 		goto out;

The ZFS on Linux people ran into the same problem:
https://github.com/zfsonlinux/zfs/issues/2039

With this patch I no longer see the original panic, but I get a shiny new
one in its place:

panic: solaris assert: txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1279
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01212e94e0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe01212e9590
vpanic() at vpanic+0x126/frame 0xfffffe01212e95d0
panic() at panic+0x43/frame 0xfffffe01212e9630
assfail() at assfail+0x1d/frame 0xfffffe01212e9640
dmu_tx_assign() at dmu_tx_assign+0xae/frame 0xfffffe01212e96d0
zvol_set_volsize() at zvol_set_volsize+0x1cf/frame 0xfffffe01212e9760
zfs_prop_set_special() at zfs_prop_set_special+0x2e2/frame 0xfffffe01212e97f0
zfs_set_prop_nvlist() at zfs_set_prop_nvlist+0x23f/frame 0xfffffe01212e9880
zfs_ioc_set_prop() at zfs_ioc_set_prop+0x106/frame 0xfffffe01212e98e0
zfsdev_ioctl() at zfsdev_ioctl+0x6ee/frame 0xfffffe01212e9990
devfs_ioctl_f() at devfs_ioctl_f+0xfb/frame 0xfffffe01212e99f0
kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe01212e9a50
sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe01212e9aa0
amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe01212e9bb0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01212e9bb0

The dmu_tx_assign() function is unhappy about being called with TXG_WAIT
while the dsl_pool_config is locked.

Regards,
Kristof
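
The two assertions described in this message can be illustrated with a small user-space model. This is only a sketch, not ZFS code: every toy_* name is hypothetical, and a single boolean stands in for "this thread holds the pool config lock", which is a deliberate simplification of what dmu_objset_hold() and dsl_pool_hold() do. The assumption is that the first panic comes from taking the hold a second time, and the second from a TXG_WAIT-style assignment while the hold is still in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the pool config lock state. NOT the real ZFS structure. */
struct toy_pool {
    bool config_held;   /* true while the (single) thread holds the lock */
};

enum toy_result { TOY_OK, TOY_RECURSIVE_HOLD, TOY_WAIT_WHILE_HELD };

/* Models the first assert: the hold must not be taken recursively. */
enum toy_result toy_pool_hold(struct toy_pool *p) {
    if (p->config_held)
        return TOY_RECURSIVE_HOLD;
    p->config_held = true;
    return TOY_OK;
}

void toy_pool_rele(struct toy_pool *p) {
    p->config_held = false;
}

/* "Locked" property getter: takes its own hold, like the pre-patch call. */
enum toy_result toy_prop_get_locked(struct toy_pool *p, int *readonly) {
    enum toy_result r = toy_pool_hold(p);
    if (r != TOY_OK)
        return r;
    *readonly = 0;
    toy_pool_rele(p);
    return TOY_OK;
}

/* "Unlocked" getter: relies on the caller already holding the lock. */
enum toy_result toy_prop_get_unlocked(struct toy_pool *p, int *readonly) {
    assert(p->config_held);
    *readonly = 0;
    return TOY_OK;
}

/* Models the second assert: a waiting assign must not run under the hold. */
enum toy_result toy_tx_assign_wait(struct toy_pool *p) {
    return p->config_held ? TOY_WAIT_WHILE_HELD : TOY_OK;
}

/* Pre-patch flow: hold, then a getter that tries to hold again. */
enum toy_result toy_resize_original(void) {
    struct toy_pool pool = { false };
    int readonly = 0;
    enum toy_result r = toy_pool_hold(&pool);
    if (r != TOY_OK)
        return r;
    r = toy_prop_get_locked(&pool, &readonly);
    toy_pool_rele(&pool);
    return r;
}

/* Patched flow: the getter is fine, but the assign still runs under the hold. */
enum toy_result toy_resize_patched(void) {
    struct toy_pool pool = { false };
    int readonly = 0;
    enum toy_result r = toy_pool_hold(&pool);
    if (r != TOY_OK)
        return r;
    r = toy_prop_get_unlocked(&pool, &readonly);
    if (r == TOY_OK && !readonly)
        r = toy_tx_assign_wait(&pool);
    toy_pool_rele(&pool);
    return r;
}
```

In this model toy_resize_original() reports the recursive hold and toy_resize_patched() reports the wait-while-held condition, mirroring the order in which the two panics appeared.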