Date:      Sat, 19 Jul 2014 18:41:13 +0200
From:      Kristof Provost <kristof@sigsegv.be>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS panic on zvol resize
Message-ID:  <20140719164113.GA2406@vega.codepro.be>
In-Reply-To: <20140704194750.GU75721@vega.codepro.be>
References:  <20140704194750.GU75721@vega.codepro.be>

I've poked at this a bit more, and I think I understand the problem now.

zvol_set_volsize() takes a hold on the file system with dmu_objset_hold()
and then verifies that it's not marked as read-only.
It does this through dsl_prop_get_integer() which also tries to take a
hold on the file system with dmu_objset_hold(). That triggers the assert
in dsl_pool_hold().
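
To make the double hold concrete, here is a toy model in plain C. It is
only a sketch: every toy_* name is hypothetical, the pool config lock is
reduced to a single non-reentrant flag, and a -1 return stands in for the
place where the kernel would trip the assert. It is not the real ZFS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the pool config lock; NOT the real ZFS structure. */
struct toy_pool {
	bool config_held;	/* true while the caller holds the config lock */
};

/*
 * Hypothetical analogue of taking a hold. Returns 0 on success and -1
 * where the kernel would hit the assert, because the hold is already
 * taken by the same caller.
 */
int
toy_pool_hold(struct toy_pool *dp)
{
	if (dp->config_held)
		return (-1);
	dp->config_held = true;
	return (0);
}

void
toy_pool_rele(struct toy_pool *dp)
{
	dp->config_held = false;
}

/* By-name property lookup: takes its own hold, like the original call. */
int
toy_prop_get_by_name(struct toy_pool *dp, int *value)
{
	if (toy_pool_hold(dp) != 0)
		return (-1);	/* nested hold: this is the failure */
	*value = 0;
	toy_pool_rele(dp);
	return (0);
}

/* Lookup through the dataset the caller already holds: no new hold. */
int
toy_prop_get_held(struct toy_pool *dp, int *value)
{
	(void)dp;
	*value = 0;
	return (0);
}

/* Resize path: takes the hold first, then checks the property. */
int
toy_set_volsize(struct toy_pool *dp, bool use_held_lookup)
{
	int readonly = 0;
	int error;

	if (toy_pool_hold(dp) != 0)
		return (-1);
	error = use_held_lookup ?
	    toy_prop_get_held(dp, &readonly) :
	    toy_prop_get_by_name(dp, &readonly);
	toy_pool_rele(dp);
	return (error);
}
```

In this model toy_set_volsize(&dp, false) fails because the by-name
lookup tries to take the hold a second time, while
toy_set_volsize(&dp, true) succeeds because the lookup reuses the hold
the caller already has.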

I don't think it'd be safe to move the read-only check so it's done
before taking the dmu_objset_hold(). I think it'd open us up to races
where the file system gets marked as read-only after our check, but
before the completion of the resize.

A possible fix would be to use the unlocked variant of
dsl_prop_get_integer() like this:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c       (revision 268871)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c       (working copy)
@@ -913,8 +913,8 @@
            doi.doi_data_block_size)) != 0)
                goto out;
 
-       VERIFY(dsl_prop_get_integer(name, "readonly", &readonly,
-           NULL) == 0);
+       VERIFY(dsl_prop_get_ds(dmu_objset_ds(os), "readonly",
+           8, 1, &readonly, NULL) == 0);
        if (readonly) {
                error = EROFS;
                goto out;

The ZFS on Linux people ran into the same problem:
https://github.com/zfsonlinux/zfs/issues/2039

With this patch I no longer see the original panic, but I get a shiny
new one in its place:

panic: solaris assert: txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1279
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01212e94e0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe01212e9590
vpanic() at vpanic+0x126/frame 0xfffffe01212e95d0
panic() at panic+0x43/frame 0xfffffe01212e9630
assfail() at assfail+0x1d/frame 0xfffffe01212e9640
dmu_tx_assign() at dmu_tx_assign+0xae/frame 0xfffffe01212e96d0
zvol_set_volsize() at zvol_set_volsize+0x1cf/frame 0xfffffe01212e9760
zfs_prop_set_special() at zfs_prop_set_special+0x2e2/frame 0xfffffe01212e97f0
zfs_set_prop_nvlist() at zfs_set_prop_nvlist+0x23f/frame 0xfffffe01212e9880
zfs_ioc_set_prop() at zfs_ioc_set_prop+0x106/frame 0xfffffe01212e98e0
zfsdev_ioctl() at zfsdev_ioctl+0x6ee/frame 0xfffffe01212e9990
devfs_ioctl_f() at devfs_ioctl_f+0xfb/frame 0xfffffe01212e99f0
kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe01212e9a50
sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe01212e9aa0
amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe01212e9bb0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01212e9bb0

The dmu_tx_assign() function is unhappy about being called with TXG_WAIT
while the dsl_pool_config is locked.
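
The condition in that assert can be written out as a small predicate.
This is a sketch only: the toy_* names are hypothetical, and the two
arguments stand in for txg_how and for the result of
dsl_pool_config_held(tx->tx_pool) as they appear in the panic message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the txg_how values; not the ZFS constants. */
enum toy_txg_how { TOY_TXG_NOWAIT, TOY_TXG_WAIT };

/*
 * Mirrors the asserted expression from the panic message:
 *   txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool)
 * Returns true when the assert would pass.
 */
bool
toy_assign_allowed(enum toy_txg_how how, bool config_held)
{
	return (how != TOY_TXG_WAIT || !config_held);
}
```

Only one of the four combinations is rejected: a waiting assign while
the pool config is held, which is the combination zvol_set_volsize()
reaches in the backtrace above.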

Regards,
Kristof


