Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek <pjd@FreeBSD.org> Cc: freebsd-fs@freebsd.org, avg@freebsd.org Subject: Re: kern/184677 / ZFS snapshot handling deadlocks Message-ID: <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu> In-Reply-To: <20131220134405.GE1658@garage.freebsd.pl> References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu> <20131220134405.GE1658@garage.freebsd.pl>
Dear Pawel, Thanks for your response. I hope someone will review my work. Second, I think I've found another deadlock scenario: when 'zfs send -R' finishes, zfsdev_close() holds spa_namespace_lock. It then unmounts all snapshots, which in turn reaches zfsctl_snapshot_inactive(), where sdp->sd_lock is taken. Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock, does a snapshot mount, which somewhere along the way tries to acquire spa_namespace_lock. So it deadlocks. Currently I don't know how to work around this. Does spa_namespace_lock need to be held in zfsctl_snapshot_inactive()? What if we released it on function entry and re-acquired it on exit? Thanks in advance, Kojedzinszky Richard Euronet Magyarorszag Informatikai Zrt. On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote: > Date: Fri, 20 Dec 2013 14:44:05 +0100 > From: Pawel Jakub Dawidek <pjd@FreeBSD.org> > To: krichy@tvnetwork.hu > Cc: freebsd-fs@freebsd.org > Subject: Re: kern/184677 / ZFS snapshot handling deadlocks > > On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote: >> Dear devs, >> >> I am attaching a clearer patch/fix for my snapshot handling issues >> (0002), but I would be happy if some ZFS expert would comment on it. I have >> been trying to solve it for at least two weeks now, and an ACK or a NACK would >> be nice from someone. Also, a commit is reverted, since it too caused >> deadlocks. I've read its comments, which also mention eliminating deadlocks, but I >> did not find any reference on how to reproduce that deadlock. In my view >> reverting it makes my issues disappear, but I don't know what new cases >> it will raise. > > Richard, > > I won't be able to analyse it myself anytime soon, because of other > obligations, but I forwarded your e-mail to the zfs-devel@ mailing list > (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from > there will be able to help you. 
> >> I've rewritten traverse() to make it more like upstream, and added two extra >> VN_HOLD()s to snapdir_lookup() when traverse() returned the same vnode that >> was passed to it (I don't even know why those two extra holds are needed >> for the vnode when creating a snapshot vnode.) And unfortunately, because >> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also >> cause deadlocks, so zfsctl_snapshot_inactive() and >> zfsctl_snapshot_vptocnp() have been rewritten to work around that. >> >> After this, one may still get a deadlock, when a simple access calls >> into zfsctl_snapshot_lookup(). The documentation says that those vnodes >> should always be covered, but after some stress testing we sometimes hit >> that call, and that can again cause deadlocks. In our environment I've >> just uncommented that callback, which returns ENODIR on some calls, but at >> least not a deadlock. >> >> The attached script can be used to reproduce my cases (would someone confirm >> that?), and after the patches are applied, they disappear (confirm?). >> >> Thanks, >> >> >> Kojedzinszky Richard >> Euronet Magyarorszag Informatikai Zrt. >> >> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote: >> >>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET) >>> From: krichy@tvnetwork.hu >>> To: pjd@freebsd.org >>> Cc: freebsd-fs@freebsd.org >>> Subject: Re: kern/184677 (fwd) >>> >>> Dear devs, >>> >>> I will sum up my experience regarding the issue: >>> >>> The symptom is that a concurrent 'zfs send -R' and some activity on the >>> snapshot dir (or in the snapshot) may cause a deadlock. >>> >>> After investigating the problem, I found that zfs send unmounts the snapshots, >>> and that causes the deadlock, so later I tested only with a concurrent umount >>> and the "activity". Later I found that listing the snapshots in >>> .zfs/snapshot/ and unmounting them can trigger the same deadlock, so I used >>> those for the tests. 
But to my surprise, instead of a deadlock, a recursive >>> lock panic arose. >>> >>> The vnode for the ".zfs/snapshot/" directory contains zfs's zfsctl_snapdir_t >>> structure (sdp). This contains a tree of mounted snapshots, and each entry >>> (sep) contains the vnode on which the snapshot is mounted >>> (se_root). The strange thing is that the se_root member does not hold a reference >>> to the vnode, just a simple pointer to it. >>> >>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is locked, >>> the zfsctl_snapdir_t's tree is locked and searched for the mount, in case it >>> already exists. If it finds no entry, it does the mount. If an >>> entry was found, the se_root member contains the vnode on which the snapshot is >>> mounted. Thus, a reference is taken on it, and the traverse() call will >>> resolve to the real root vnode of the mounted snapshot, returning it >>> locked. (Examining the traverse() code I found that it did not follow >>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.) >>> >>> On the other path, when an umount is issued, the se_root vnode loses its last >>> reference (as only the mountpoint holds one for it), and it goes through the >>> vinactive() path to zfsctl_snapshot_inactive(). In FreeBSD this is called >>> with a locked vnode, so this is a deadlock race condition: while >>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree and traverse() >>> tries to acquire se_root, zfsctl_snapshot_inactive() holds the lock on >>> se_root while it tries to acquire the sdp lock. >>> >>> zfsctl_snapshot_inactive() has an if statement checking the v_usecount, >>> which is incremented in zfsctl_snapdir_lookup(), but in that context it is >>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() path >>> assumes that the vnode remains inactive (as opposed to illumos, at least as >>> I read the code). 
So zfsctl_snapshot_inactive() must free resources while in >>> a locked state. I was a bit confused, and that is probably why the previously >>> posted patch is as it is. >>> >>> Maybe if I had some clues about the direction of this problem, I could have >>> worked toward a nicer, shorter solution. >>> >>> Would someone please comment on my post? >>> >>> Regards, >>> >>> >>> >>> Kojedzinszky Richard >>> Euronet Magyarorszag Informatikai Zrt. >>> >>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>> >>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET) >>>> From: krichy@tvnetwork.hu >>>> To: pjd@freebsd.org >>>> Cc: freebsd-fs@freebsd.org >>>> Subject: Re: kern/184677 (fwd) >>>> >>>> Dear PJD, >>>> >>>> I am a happy FreeBSD user, and I am sure you've read my previous posts >>>> regarding some issues in ZFS. Please give me some advice on where to look >>>> for solutions, or on how I could help to resolve these issues. >>>> >>>> Regards, >>>> Kojedzinszky Richard >>>> Euronet Magyarorszag Informatikai Zrt. >>>> >>>> ---------- Forwarded message ---------- >>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET) >>>> From: krichy@tvnetwork.hu >>>> To: freebsd-fs@freebsd.org >>>> Subject: Re: kern/184677 >>>> >>>> >>>> It seems that pjd made a change which eliminated the zfsdev_state_lock on Fri >>>> Aug 12 07:04:16 2011 +0000, which might have introduced a new deadlock >>>> situation. Any comments on this? >>>> >>>> >>>> Kojedzinszky Richard >>>> Euronet Magyarorszag Informatikai Zrt. >>>> >>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>> >>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET) >>>>> From: krichy@tvnetwork.hu >>>>> To: freebsd-fs@freebsd.org >>>>> Subject: kern/184677 >>>>> >>>>> Dear devs, >>>>> >>>>> I've attached a patch which makes the recursive lockmgr panic disappear, and >>>>> makes the reported bug disappear. I don't know whether I followed the >>>>> guidelines well, but at least it works for me. Would some >>>>> ZFS/FreeBSD fs expert please review it, and fix it where needed. 
>>>>> >>>>> But unfortunately, my original problem is still not solved; maybe it is the same >>>>> as Ryan's: >>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html >>>>> >>>>> Tracing the problem down shows that zfsctl_snapdir_lookup() tries to acquire >>>>> spa_namespace_lock while a finishing 'zfs send -R' does a >>>>> zfsdev_close(), which also holds the same mutex. And this causes a >>>>> deadlock scenario. I looked at illumos's code, and for some reason they >>>>> use another mutex in zfsdev_close(), which therefore cannot deadlock with >>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem. >>>>> >>>>> I would like to help make ZFS more stable on FreeBSD, with its whole >>>>> functionality. I would be very thankful if some expert would give me some >>>>> advice on how to solve these bugs. PJD, Steven, Xin? >>>>> >>>>> Thanks in advance, >>>>> >>>>> >>>>> Kojedzinszky Richard >>>>> Euronet Magyarorszag Informatikai Zrt. >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>> >>> > > >> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001 >> From: Richard Kojedzinszky <krichy@cflinux.hu> >> Date: Mon, 16 Dec 2013 15:39:11 +0100 >> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and >> replace it with the" >> >> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218. 
>> >> Conflicts: >> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> --- >> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 +++++++++------------ >> 4 files changed, 70 insertions(+), 80 deletions(-) >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >> index af2def2..8272c4d 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >> extern minor_t zfsdev_minor_alloc(void); >> >> extern void *zfsdev_state; >> +extern kmutex_t zfsdev_state_lock; >> >> #endif /* _KERNEL */ >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >> index 15685a5..5c3e9f3 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> struct g_provider *pp; >> struct g_consumer *cp; >> size_t bufsize; >> - int error; >> + int error, lock; >> >> /* >> * We must have a pathname, and it must be absolute. 
>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> >> vd->vdev_tsd = NULL; >> >> + if (mutex_owned(&spa_namespace_lock)) { >> + mutex_exit(&spa_namespace_lock); >> + lock = 1; >> + } else { >> + lock = 0; >> + } >> DROP_GIANT(); >> g_topology_lock(); >> error = 0; >> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> !ISP2(cp->provider->sectorsize)) { >> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >> vd->vdev_path); >> + >> + g_topology_lock(); >> vdev_geom_detach(cp, 0); >> + g_topology_unlock(); >> + >> error = EINVAL; >> cp = NULL; >> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> } >> g_topology_unlock(); >> PICKUP_GIANT(); >> + if (lock) >> + mutex_enter(&spa_namespace_lock); >> if (cp == NULL) { >> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >> return (error); >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >> index e9fba26..91becde 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >> static minor_t last_minor; >> minor_t m; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> for (m = last_minor + 1; m != last_minor; m++) { >> if (m > ZFSDEV_MAX_MINOR) >> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >> minor_t minor; >> zfs_soft_state_t *zs; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> minor = zfsdev_minor_alloc(); >> if (minor == 0) >> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >> static void >> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >> { >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + 
ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> zfs_onexit_destroy(zo); >> ddi_soft_state_free(zfsdev_state, minor); >> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, struct thread *td) >> >> /* This is the control device. Allocate a new minor if requested. */ >> if (flag & FEXCL) { >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> error = zfs_ctldev_init(devp); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> } >> >> return (error); >> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >> if (minor == 0) >> return; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >> if (zo == NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return; >> } >> zfs_ctldev_destroy(zo, minor); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> } >> >> static int >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> index 72d4502..aec5219 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >> #define ZVOL_DUMPSIZE "dumpsize" >> >> /* >> - * The spa_namespace_lock protects the zfsdev_state structure from being >> - * modified while it's being used, e.g. an open that comes in before a >> - * create finishes. It also protects temporary opens of the dataset so that, >> + * This lock protects the zfsdev_state structure from being modified >> + * while it's being used, e.g. an open that comes in before a create >> + * finishes. It also protects temporary opens of the dataset so that, >> * e.g., an open doesn't get a spurious EBUSY. 
>> */ >> +kmutex_t zfsdev_state_lock; >> static uint32_t zvol_minors; >> >> typedef struct zvol_extent { >> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >> struct g_geom *gp; >> zvol_state_t *zv = NULL; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> g_topology_lock(); >> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >> { >> zvol_state_t *zv; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> zv = zvol_minor_lookup(name); >> if (minor && zv) >> *minor = zv->zv_minor; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (zv ? 0 : -1); >> } >> #endif /* sun */ >> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >> >> ZFS_LOG(1, "Creating ZVOL %s...", name); >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> >> if (zvol_minor_lookup(name) != NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EEXIST)); >> } >> >> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >> >> if (error) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> >> #ifdef sun >> if ((minor = zfsdev_minor_alloc()) == 0) { >> dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> >> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >> dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EAGAIN)); >> } >> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME, >> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >> ddi_soft_state_free(zfsdev_state, minor); >> 
dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EAGAIN)); >> } >> >> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >> ddi_remove_minor_node(zfs_dip, chrbuf); >> ddi_soft_state_free(zfsdev_state, minor); >> dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EAGAIN)); >> } >> >> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >> >> zvol_minors++; >> >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> >> zvol_geom_run(zv); >> >> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >> minor_t minor = zv->zv_minor; >> #endif >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> if (zv->zv_total_opens != 0) >> return (SET_ERROR(EBUSY)); >> >> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >> zvol_state_t *zv; >> int rc; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> if ((zv = zvol_minor_lookup(name)) == NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> g_topology_lock(); >> rc = zvol_remove_zv(zv); >> g_topology_unlock(); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (rc); >> } >> >> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >> dmu_tx_t *tx; >> int error; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> tx = dmu_tx_create(os); >> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >> namelen = strlen(name); >> >> DROP_GIANT(); >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> g_topology_lock(); >> >> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >> 
} >> >> g_topology_unlock(); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> PICKUP_GIANT(); >> } >> >> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, uint64_t volsize) >> uint64_t old_volsize = 0ULL; >> uint64_t readonly; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> zv = zvol_minor_lookup(name); >> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> >> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, uint64_t volsize) >> out: >> dmu_objset_rele(os, FTAG); >> >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> >> return (error); >> } >> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int count) >> { >> zvol_state_t *zv; >> int err = 0; >> - boolean_t locked = B_FALSE; >> >> - /* >> - * Protect against recursively entering spa_namespace_lock >> - * when spa_open() is used for a pool on a (local) ZVOL(s). >> - * This is needed since we replaced upstream zfsdev_state_lock >> - * with spa_namespace_lock in the ZVOL code. >> - * We are using the same trick as spa_open(). >> - * Note that calls in zvol_first_open which need to resolve >> - * pool name to a spa object will enter spa_open() >> - * recursively, but that function already has all the >> - * necessary protection. 
>> - */ >> - if (!MUTEX_HELD(&spa_namespace_lock)) { >> - mutex_enter(&spa_namespace_lock); >> - locked = B_TRUE; >> - } >> + mutex_enter(&zfsdev_state_lock); >> >> zv = pp->private; >> if (zv == NULL) { >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> >> if (zv->zv_total_opens == 0) >> err = zvol_first_open(zv); >> if (err) { >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (err); >> } >> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int count) >> #endif >> >> zv->zv_total_opens += count; >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> >> return (err); >> out: >> if (zv->zv_total_opens == 0) >> zvol_last_close(zv); >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (err); >> } >> >> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int count) >> { >> zvol_state_t *zv; >> int error = 0; >> - boolean_t locked = B_FALSE; >> >> - /* See comment in zvol_open(). 
*/ >> - if (!MUTEX_HELD(&spa_namespace_lock)) { >> - mutex_enter(&spa_namespace_lock); >> - locked = B_TRUE; >> - } >> + mutex_enter(&zfsdev_state_lock); >> >> zv = pp->private; >> if (zv == NULL) { >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> >> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int count) >> if (zv->zv_total_opens == 0) >> zvol_last_close(zv); >> >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> >> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> int error = 0; >> rl_t *rl; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> >> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >> >> if (zv == NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> ASSERT(zv->zv_total_opens > 0); >> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> dki.dki_ctype = DKC_UNKNOWN; >> dki.dki_unit = getminor(dev); >> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - zv->zv_min_bs); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >> error = SET_ERROR(EFAULT); >> return (error); >> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> dkm.dki_lbsize = 1U << zv->zv_min_bs; >> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >> dkm.dki_media_type = DK_UNKNOWN; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag)) >> error = SET_ERROR(EFAULT); >> return (error); >> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> uint64_t vs = 
zv->zv_volsize; >> uint8_t bs = zv->zv_min_bs; >> >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> error = zvol_getefi((void *)arg, flag, vs, bs); >> return (error); >> } >> >> case DKIOCFLUSHWRITECACHE: >> dkc = (struct dk_callback *)arg; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> zil_commit(zv->zv_zilog, ZVOL_OBJ); >> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> } >> if (wce) { >> zv->zv_flags |= ZVOL_WCE; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> } else { >> zv->zv_flags &= ~ZVOL_WCE; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> zil_commit(zv->zv_zilog, ZVOL_OBJ); >> } >> return (0); >> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> break; >> >> } >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> #endif /* sun */ >> @@ -1844,12 +1819,14 @@ zvol_init(void) >> { >> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >> 1) == 0); >> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL); >> ZFS_LOG(1, "ZVOL Initialized."); >> } >> >> void >> zvol_fini(void) >> { >> + mutex_destroy(&zfsdev_state_lock); >> ddi_soft_state_fini(&zfsdev_state); >> ZFS_LOG(1, "ZVOL Deinitialized."); >> } >> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize) >> uint64_t version = spa_version(spa); >> enum zio_checksum checksum; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> ASSERT(vd->vdev_ops == &vdev_root_ops); >> >> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char *newname) >> struct g_provider *pp; >> 
zvol_state_t *zv; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> g_topology_assert(); >> >> pp = LIST_FIRST(&gp->provider); >> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char *newname) >> newnamelen = strlen(newname); >> >> DROP_GIANT(); >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> g_topology_lock(); >> >> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char *newname) >> } >> >> g_topology_unlock(); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> PICKUP_GIANT(); >> } >> -- >> 1.8.4.2 >> > >> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >> From: Richard Kojedzinszky <krichy@cflinux.hu> >> Date: Wed, 18 Dec 2013 22:11:21 +0100 >> Subject: [PATCH 2/2] ZFS snapshot handling fix >> >> --- >> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 +++++++++++++++------- >> 2 files changed, 42 insertions(+), 24 deletions(-) >> >> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >> index 94383d6..4cac053 100644 >> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype) >> * progress on this vnode. >> */ >> >> + vn_lock(cvp, lktype); >> + >> for (;;) { >> /* >> * Reached the end of the mount chain? >> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >> if (vfsp == NULL) >> break; >> error = vfs_busy(vfsp, 0); >> - /* >> - * tvp is NULL for *cvpp vnode, which we can't unlock. 
>> - */ >> - if (tvp != NULL) >> - vput(cvp); >> - else >> - vrele(cvp); >> + VOP_UNLOCK(cvp, 0); >> if (error) >> return (error); >> >> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >> vfs_unbusy(vfsp); >> if (error != 0) >> return (error); >> + >> + VN_RELE(cvp); >> + >> cvp = tvp; >> } >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >> index 28ab1fa..d3464b4 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >> return (0); >> } >> >> +/* Return the zfsctl_snapdir_t object from current vnode, following >> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. >> + * On return the passed in vp is unlocked */ >> +static zfsctl_snapdir_t * >> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >> +{ >> + gfs_dir_t *dp = vp->v_data; >> + *dvpp = dp->gfsd_file.gfs_parent; >> + zfsctl_snapdir_t *sdp; >> + >> + VN_HOLD(*dvpp); >> + VOP_UNLOCK(vp, 0); >> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >> + sdp = (*dvpp)->v_data; >> + VOP_UNLOCK(*dvpp, 0); >> + >> + return (sdp); >> +} >> + >> #ifdef sun >> vnodeops_t *zfsctl_ops_root; >> vnodeops_t *zfsctl_ops_snapdir; >> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >> * The snapshot was unmounted behind our backs, >> * try to remount it. 
>> */ >> + VOP_UNLOCK(*vpp, 0); >> + VN_HOLD(*vpp); >> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, snapname) == 0); >> goto domount; >> } else { >> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >> (void) strcpy(sep->se_name, nm); >> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, dmu_objset_id(snap)); >> - VN_HOLD(*vpp); >> avl_insert(&sdp->sd_snaps, sep, where); >> >> dmu_objset_rele(snap, FTAG); >> @@ -1075,6 +1095,7 @@ domount: >> (void) snprintf(mountpoint, mountpoint_len, >> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >> + VN_HOLD(*vpp); >> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >> kmem_free(mountpoint, mountpoint_len); >> if (err == 0) { >> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >> int locked; >> vnode_t *dvp; >> >> - if (vp->v_count > 0) >> - goto end; >> - >> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >> - sdp = dvp->v_data; >> - VOP_UNLOCK(dvp, 0); >> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >> >> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >> mutex_enter(&sdp->sd_lock); >> >> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >> + >> + if (vp->v_count > 0) { >> + mutex_exit(&sdp->sd_lock); >> + return (0); >> + } >> + >> ASSERT(!vn_ismntpt(vp)); >> >> sep = avl_first(&sdp->sd_snaps); >> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >> mutex_exit(&sdp->sd_lock); >> VN_RELE(dvp); >> >> -end: >> /* >> * Dispose of the vnode for the snapshot mount point. 
>> * This is safe to do because once this entry has been removed >> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >> static int >> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >> { >> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >> - vnode_t *dvp, *vp; >> + vnode_t *dvp, *vp = ap->a_vp; >> zfsctl_snapdir_t *sdp; >> zfs_snapentry_t *sep; >> - int error; >> + int error = 0; >> >> - ASSERT(zfsvfs->z_ctldir != NULL); >> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >> - if (error != 0) >> - return (error); >> - sdp = dvp->v_data; >> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >> >> mutex_enter(&sdp->sd_lock); >> + >> + vn_lock(vp, LK_SHARED | LK_RETRY); >> + >> sep = avl_first(&sdp->sd_snaps); >> while (sep != NULL) { >> vp = sep->se_root; >> -- >> 1.8.4.2 >> > > -- > Pawel Jakub Dawidek http://www.wheelsystems.com > FreeBSD committer http://www.FreeBSD.org > Am I Evil? Yes, I Am! http://mobter.com >