Date: Wed, 25 Dec 2013 06:22:37 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
Message-ID: <alpine.DEB.2.10.1312250614001.2785@krichy.tvnetwork.hu>
In-Reply-To: <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu>
Dear all,

Once again I've made a new patch, which fixes most of my earlier issues. Mainly, I've enabled shared vnode locks for GFS vnodes - is that acceptable? That way the number of deadlock cases is much reduced, and the patch is also clearer (at least to me). One thing still remains: the spa_namespace_lock race I mentioned before; I don't know how to handle that. Waiting for comments on this.

Regards,

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>
> Dear all,
>
> Now a user-triggered deadlock simply caused buildkernel to block, so I must correct myself: this is not only a DoS against .zfs/snapshot directories, but potentially against the whole system.
>
> Regards,
>
> P.S.: Merry Christmas!
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:
>
>> Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
>> From: krichy@tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>
>> Dear all,
>>
>> Unfortunately, this seems to be serious, as a plain user can deadlock ZFS using the attached script, creating a DoS against the mounted dataset's .zfs/snapshot directory.
>>
>> Regards,
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>>
>> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:
>>
>>> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
>>> From: krichy@tvnetwork.hu
>>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>
>>> Dear devs,
>>>
>>> A third one, again in the snapshot dirs:
>>>
>>> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
>>> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>>>
>>> This will deadlock in a few seconds.
>>>
>>> The first command holds a lock on the .zfs/snapshot vnode and, when looking up '..', tries to lock .zfs, while the other tries to do the same in reverse order.
>>>
>>> In a vop_lookup, why must the passed-in vnode and the returned vnode both be locked? Why is it not enough to have them held?
>>>
>>> Thanks in advance,
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>>
>>> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:
>>>
>>>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>>
>>>> Dear Pawel,
>>>>
>>>> Thanks for your response. I hope someone will review my work.
>>>>
>>>> Secondly, I think I've found another deadlock scenario:
>>>>
>>>> When 'zfs send -R' finishes, zfsdev_close() is called with spa_namespace_lock held. It then unmounts all snapshots, which in turn goes to zfsctl_snapshot_inactive(), where sdp->sd_lock is taken.
>>>>
>>>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock, does a snapshot mount, which somewhere tries to acquire spa_namespace_lock. So it deadlocks. Currently I don't know how to work around this.
>>>>
>>>> Does spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>>>>
>>>> What if we released it on function entry and re-acquired it upon exit?
>>>>
>>>> Thanks in advance,
>>>>
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>>>>
>>>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>>>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>>>> To: krichy@tvnetwork.hu
>>>>> Cc: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>>>
>>>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>>>>> Dear devs,
>>>>>>
>>>>>> I am attaching a clearer patch/fix for my snapshot handling issues (0002), but I would be happy if some ZFS expert would comment on it. I have been trying to solve this for at least two weeks now, and an ACK or a NACK from someone would be nice. A commit is also reverted, since it too caused deadlocks. I've read its comments, which say it also eliminates deadlocks, but I did not find any reference for how to reproduce that deadlock. In my view reverting it makes my issues disappear, but I don't know what new cases it will raise.
>>>>>
>>>>> Richard,
>>>>>
>>>>> I won't be able to analyse it myself anytime soon, because of other
>>>>> obligations, but I forwarded your e-mail to the zfs-devel@ mailing list
>>>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>>>> there will be able to help you.
>>>>>
>>>>>> I've rewritten traverse() to be more like upstream, and added two extra VN_HOLD()s to snapdir_lookup() for when traverse() returns the same vnode that was passed to it (I don't even know why those two extra holds are needed for the vnode when creating a snapshot vnode). And unfortunately, because FreeBSD calls vop_inactive callbacks with vnodes locked, that could also cause deadlocks, so zfsctl_snapshot_inactive() and zfsctl_snapshot_vptocnp() have been rewritten to work around that.
>>>>>>
>>>>>> After this, one may still get a deadlock when a simple access calls into zfsctl_snapshot_lookup(). The documentation says that those vnodes should always be covered, but after some stress testing we sometimes hit that call, and that can again cause deadlocks. In our environment I've just uncommented that callback; it returns ENODIR on some calls, but at least it does not deadlock.
>>>>>>
>>>>>> The attached script can be used to reproduce my cases (would someone confirm that?), and with the patches applied they disappear (confirm?).
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>
>>>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>
>>>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: pjd@freebsd.org
>>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>>
>>>>>>> Dear devs,
>>>>>>>
>>>>>>> I will sum up my experience regarding the issue:
>>>>>>>
>>>>>>> The symptom is that a concurrent 'zfs send -R' and some activity on the snapshot dir (or in the snapshot) may cause a deadlock.
>>>>>>>
>>>>>>> After investigating the problem, I found that zfs send unmounts the snapshots, and that causes the deadlock, so later I tested only with a concurrent umount and the "activity". Later still I found that listing the snapshots in .zfs/snapshot/ and unmounting them can trigger the same deadlock, so I used those for the tests. But to my surprise, instead of a deadlock, a recursive lock panic arose.
>>>>>>>
>>>>>>> The vnode for the ".zfs/snapshot/" directory contains ZFS's zfsctl_snapdir_t structure (sdp). This contains a tree of mounted snapshots, and each entry (sep) contains the vnode on top of which the snapshot is mounted (se_root). The strange thing is that the se_root member does not hold a reference on the vnode, just a plain pointer to it.
>>>>>>>
>>>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is locked, and the zfsctl_snapdir_t's tree is locked and searched for the mount, in case it already exists. If no entry is found, it does the mount. If an entry was found, the se_root member contains the vnode the snapshot is mounted on. A reference is taken on it, and the traverse() call resolves to the real root vnode of the mounted snapshot, returning it locked. (Examining the traverse() code I found that it did not follow FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>>>>
>>>>>>> On the other path, when an umount is issued, the se_root vnode loses its last reference (as only the mountpoint holds one for it) and goes through the vinactive() path to zfsctl_snapshot_inactive(). In FreeBSD this is called with a locked vnode, so this is a deadlock race condition: while zfsctl_snapdir_lookup() holds the mutex for the sdp tree and traverse() tries to acquire the se_root lock, zfsctl_snapshot_inactive() holds the lock on se_root while it tries to acquire the sdp lock.
>>>>>>>
>>>>>>> zfsctl_snapshot_inactive() has an if statement checking v_usecount, which is incremented in zfsctl_snapdir_lookup(), but in that context it is not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() path assumes that the vnode remains inactive (as opposed to illumos, at least as I read the code). So zfsctl_snapshot_inactive() must free its resources while in a locked state. I was a bit confused, and that is probably why the previously posted patch looks the way it does.
>>>>>>>
>>>>>>> Maybe if I had some clues about the direction to take with this problem, I could work toward a nicer, shorter solution.
>>>>>>>
>>>>>>> Please, someone, comment on my post.
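>>>>>>>
>>>>>>> To make the inversion concrete, here is a minimal userland analogy (an untested sketch, not the kernel code itself - two pthread mutexes stand in for sdp->sd_lock and the se_root vnode lock). Built with 'cc -pthread', it should hang the same way the two kernel paths do:
>>>>>>>
>>>>>>> #include <pthread.h>
>>>>>>> #include <stdio.h>
>>>>>>> #include <unistd.h>
>>>>>>>
>>>>>>> static pthread_mutex_t sd_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for sdp->sd_lock */
>>>>>>> static pthread_mutex_t se_root = PTHREAD_MUTEX_INITIALIZER;  /* stands in for the snapshot root vnode lock */
>>>>>>>
>>>>>>> /* zfsctl_snapdir_lookup() path: sd_lock first, then the snapshot root via traverse() */
>>>>>>> static void *
>>>>>>> lookup_path(void *arg)
>>>>>>> {
>>>>>>>     (void)arg;
>>>>>>>     pthread_mutex_lock(&sd_lock);
>>>>>>>     sleep(1);                           /* widen the race window */
>>>>>>>     pthread_mutex_lock(&se_root);
>>>>>>>     pthread_mutex_unlock(&se_root);
>>>>>>>     pthread_mutex_unlock(&sd_lock);
>>>>>>>     return (NULL);
>>>>>>> }
>>>>>>>
>>>>>>> /* zfsctl_snapshot_inactive() path: entered with the vnode locked, then takes sd_lock */
>>>>>>> static void *
>>>>>>> inactive_path(void *arg)
>>>>>>> {
>>>>>>>     (void)arg;
>>>>>>>     pthread_mutex_lock(&se_root);
>>>>>>>     sleep(1);
>>>>>>>     pthread_mutex_lock(&sd_lock);
>>>>>>>     pthread_mutex_unlock(&sd_lock);
>>>>>>>     pthread_mutex_unlock(&se_root);
>>>>>>>     return (NULL);
>>>>>>> }
>>>>>>>
>>>>>>> int
>>>>>>> main(void)
>>>>>>> {
>>>>>>>     pthread_t t1, t2;
>>>>>>>
>>>>>>>     pthread_create(&t1, NULL, lookup_path, NULL);
>>>>>>>     pthread_create(&t2, NULL, inactive_path, NULL);
>>>>>>>     pthread_join(t1, NULL);             /* never returns: classic AB-BA deadlock */
>>>>>>>     pthread_join(t2, NULL);
>>>>>>>     printf("no deadlock\n");
>>>>>>>     return (0);
>>>>>>> }
>>>>>>>
>>>>>>> Either taking the two locks in one consistent order, or unlocking the vnode before taking sd_lock and re-locking it afterwards (which is what my rewritten zfsctl_snapshot_inactive() tries to do), avoids this.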
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>
>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>>
>>>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>> To: pjd@freebsd.org
>>>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>>>
>>>>>>>> Dear PJD,
>>>>>>>>
>>>>>>>> I am a happy FreeBSD user, and I am sure you've read my previous posts regarding some issues in ZFS. Please give me some advice on where to look for solutions, or on how I could help resolve those issues.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>>
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>>> Subject: Re: kern/184677
>>>>>>>>
>>>>>>>>
>>>>>>>> It seems that pjd made a change which eliminated the zfsdev_state_lock on Fri Aug 12 07:04:16 2011 +0000, and which might have introduced a new deadlock situation. Any comments on this?
>>>>>>>>
>>>>>>>>
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>>
>>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>>>
>>>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>>>> Subject: kern/184677
>>>>>>>>>
>>>>>>>>> Dear devs,
>>>>>>>>>
>>>>>>>>> I've attached a patch which makes the recursive lockmgr panic disappear, and makes the reported bug disappear. I don't know whether I followed the guidelines well or not, but at least it works for me. Please, some ZFS/FreeBSD fs expert, review it and fix it where needed.
>>>>>>>>>
>>>>>>>>> But unfortunately, my original problem is still not solved; maybe it is the same as Ryan's:
>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>>>
>>>>>>>>> Tracing the problem down shows that zfsctl_snapdir_lookup() tries to acquire spa_namespace_lock while a finishing 'zfs send -R' does a zfsdev_close(), which also holds the same mutex. And this causes a deadlock scenario. I looked at illumos's code, and for some reason they use another mutex in zfsdev_close(), which therefore cannot deadlock with zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>>>>
>>>>>>>>> I would like to help make ZFS on FreeBSD more stable, with its whole functionality. I would be very thankful if some expert would give some advice on how to solve these bugs. PJD, Steven, Xin?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kojedzinszky Richard
>>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>>>>> >>>>>>> >>>>> >>>>> >>>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001 >>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu> >>>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100 >>>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely >>>>>> and >>>>>> replace it with the" >>>>>> >>>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218. >>>>>> >>>>>> Conflicts: >>>>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> --- >>>>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>>>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>>>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>>>>> +++++++++------------ >>>>>> 4 files changed, 70 insertions(+), 80 deletions(-) >>>>>> >>>>>> diff --git >>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> index af2def2..8272c4d 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>>>>> extern minor_t zfsdev_minor_alloc(void); >>>>>> >>>>>> extern void *zfsdev_state; >>>>>> +extern kmutex_t zfsdev_state_lock; >>>>>> >>>>>> #endif /* _KERNEL */ >>>>>> >>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> index 15685a5..5c3e9f3 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> struct g_provider *pp; >>>>>> struct g_consumer *cp; >>>>>> size_t bufsize; >>>>>> - int error; >>>>>> + int error, lock; >>>>>> >>>>>> /* >>>>>> * We must have a pathname, and it must be absolute. 
>>>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> >>>>>> vd->vdev_tsd = NULL; >>>>>> >>>>>> + if (mutex_owned(&spa_namespace_lock)) { >>>>>> + mutex_exit(&spa_namespace_lock); >>>>>> + lock = 1; >>>>>> + } else { >>>>>> + lock = 0; >>>>>> + } >>>>>> DROP_GIANT(); >>>>>> g_topology_lock(); >>>>>> error = 0; >>>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> !ISP2(cp->provider->sectorsize)) { >>>>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>>>>> vd->vdev_path); >>>>>> + >>>>>> + g_topology_lock(); >>>>>> vdev_geom_detach(cp, 0); >>>>>> + g_topology_unlock(); >>>>>> + >>>>>> error = EINVAL; >>>>>> cp = NULL; >>>>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> } >>>>>> g_topology_unlock(); >>>>>> PICKUP_GIANT(); >>>>>> + if (lock) >>>>>> + mutex_enter(&spa_namespace_lock); >>>>>> if (cp == NULL) { >>>>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>>>>> return (error); >>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> index e9fba26..91becde 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>>>>> static minor_t last_minor; >>>>>> minor_t m; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> for (m = last_minor + 1; m != last_minor; m++) { >>>>>> if (m > ZFSDEV_MAX_MINOR) >>>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>>> minor_t minor; >>>>>> zfs_soft_state_t *zs; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> minor = zfsdev_minor_alloc(); >>>>>> if (minor == 0) >>>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>>> static void >>>>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >>>>>> { >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> zfs_onexit_destroy(zo); >>>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int >>>>>> mode, struct thread *td) >>>>>> >>>>>> /* This is the control device. Allocate a new minor if requested. 
*/ >>>>>> if (flag & FEXCL) { >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> error = zfs_ctldev_init(devp); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> } >>>>>> >>>>>> return (error); >>>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>>>>> if (minor == 0) >>>>>> return; >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>>>>> if (zo == NULL) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return; >>>>>> } >>>>>> zfs_ctldev_destroy(zo, minor); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> } >>>>>> >>>>>> static int >>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> index 72d4502..aec5219 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >>>>>> #define ZVOL_DUMPSIZE "dumpsize" >>>>>> >>>>>> /* >>>>>> - * The spa_namespace_lock protects the zfsdev_state structure from >>>>>> being >>>>>> - * modified while it's being used, e.g. an open that comes in before a >>>>>> - * create finishes. It also protects temporary opens of the dataset >>>>>> so that, >>>>>> + * This lock protects the zfsdev_state structure from being modified >>>>>> + * while it's being used, e.g. an open that comes in before a create >>>>>> + * finishes. It also protects temporary opens of the dataset so that, >>>>>> * e.g., an open doesn't get a spurious EBUSY. >>>>>> */ >>>>>> +kmutex_t zfsdev_state_lock; >>>>>> static uint32_t zvol_minors; >>>>>> >>>>>> typedef struct zvol_extent { >>>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >>>>>> struct g_geom *gp; >>>>>> zvol_state_t *zv = NULL; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> g_topology_lock(); >>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >>>>>> { >>>>>> zvol_state_t *zv; >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> zv = zvol_minor_lookup(name); >>>>>> if (minor && zv) >>>>>> *minor = zv->zv_minor; >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (zv ? 
0 : -1); >>>>>> } >>>>>> #endif /* sun */ >>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >>>>>> >>>>>> ZFS_LOG(1, "Creating ZVOL %s...", name); >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> >>>>>> if (zvol_minor_lookup(name) != NULL) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(EEXIST)); >>>>>> } >>>>>> >>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >>>>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >>>>>> >>>>>> if (error) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (error); >>>>>> } >>>>>> >>>>>> #ifdef sun >>>>>> if ((minor = zfsdev_minor_alloc()) == 0) { >>>>>> dmu_objset_disown(os, FTAG); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(ENXIO)); >>>>>> } >>>>>> >>>>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >>>>>> dmu_objset_disown(os, FTAG); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(EAGAIN)); >>>>>> } >>>>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME, >>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >>>>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >>>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>>> dmu_objset_disown(os, FTAG); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(EAGAIN)); >>>>>> } >>>>>> >>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >>>>>> ddi_remove_minor_node(zfs_dip, chrbuf); >>>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>>> dmu_objset_disown(os, FTAG); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(EAGAIN)); >>>>>> } >>>>>> >>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >>>>>> >>>>>> zvol_minors++; >>>>>> >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> >>>>>> zvol_geom_run(zv); >>>>>> >>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >>>>>> minor_t minor = zv->zv_minor; >>>>>> #endif >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> if (zv->zv_total_opens != 0) >>>>>> return (SET_ERROR(EBUSY)); >>>>>> >>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >>>>>> zvol_state_t *zv; >>>>>> int rc; >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> if ((zv = zvol_minor_lookup(name)) == NULL) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(ENXIO)); >>>>>> } >>>>>> g_topology_lock(); >>>>>> rc = zvol_remove_zv(zv); >>>>>> g_topology_unlock(); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (rc); >>>>>> } >>>>>> >>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >>>>>> dmu_tx_t *tx; >>>>>> int error; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> tx = dmu_tx_create(os); >>>>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >>>>>> namelen = strlen(name); >>>>>> >>>>>> DROP_GIANT(); >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> 
g_topology_lock(); >>>>>> >>>>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >>>>>> } >>>>>> >>>>>> g_topology_unlock(); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> PICKUP_GIANT(); >>>>>> } >>>>>> >>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, >>>>>> uint64_t volsize) >>>>>> uint64_t old_volsize = 0ULL; >>>>>> uint64_t readonly; >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> zv = zvol_minor_lookup(name); >>>>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (error); >>>>>> } >>>>>> >>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, >>>>>> uint64_t volsize) >>>>>> out: >>>>>> dmu_objset_rele(os, FTAG); >>>>>> >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> >>>>>> return (error); >>>>>> } >>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int >>>>>> count) >>>>>> { >>>>>> zvol_state_t *zv; >>>>>> int err = 0; >>>>>> - boolean_t locked = B_FALSE; >>>>>> >>>>>> - /* >>>>>> - * Protect against recursively entering spa_namespace_lock >>>>>> - * when spa_open() is used for a pool on a (local) ZVOL(s). >>>>>> - * This is needed since we replaced upstream zfsdev_state_lock >>>>>> - * with spa_namespace_lock in the ZVOL code. >>>>>> - * We are using the same trick as spa_open(). >>>>>> - * Note that calls in zvol_first_open which need to resolve >>>>>> - * pool name to a spa object will enter spa_open() >>>>>> - * recursively, but that function already has all the >>>>>> - * necessary protection. >>>>>> - */ >>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> - locked = B_TRUE; >>>>>> - } >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> >>>>>> zv = pp->private; >>>>>> if (zv == NULL) { >>>>>> - if (locked) >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(ENXIO)); >>>>>> } >>>>>> >>>>>> if (zv->zv_total_opens == 0) >>>>>> err = zvol_first_open(zv); >>>>>> if (err) { >>>>>> - if (locked) >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (err); >>>>>> } >>>>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int >>>>>> count) >>>>>> #endif >>>>>> >>>>>> zv->zv_total_opens += count; >>>>>> - if (locked) >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> >>>>>> return (err); >>>>>> out: >>>>>> if (zv->zv_total_opens == 0) >>>>>> zvol_last_close(zv); >>>>>> - if (locked) >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (err); >>>>>> } >>>>>> >>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int >>>>>> count) >>>>>> { >>>>>> zvol_state_t *zv; >>>>>> int error = 0; >>>>>> - boolean_t locked = B_FALSE; >>>>>> >>>>>> - /* See comment in zvol_open(). 
*/ >>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> - locked = B_TRUE; >>>>>> - } >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> >>>>>> zv = pp->private; >>>>>> if (zv == NULL) { >>>>>> - if (locked) >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(ENXIO)); >>>>>> } >>>>>> >>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int >>>>>> count) >>>>>> if (zv->zv_total_opens == 0) >>>>>> zvol_last_close(zv); >>>>>> >>>>>> - if (locked) >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (error); >>>>>> } >>>>>> >>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, >>>>>> int flag, cred_t *cr, int *rvalp) >>>>>> int error = 0; >>>>>> rl_t *rl; >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> >>>>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >>>>>> >>>>>> if (zv == NULL) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (SET_ERROR(ENXIO)); >>>>>> } >>>>>> ASSERT(zv->zv_total_opens > 0); >>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>>> flag, cred_t *cr, int *rvalp) >>>>>> dki.dki_ctype = DKC_UNKNOWN; >>>>>> dki.dki_unit = getminor(dev); >>>>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - >>>>>> zv->zv_min_bs); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >>>>>> error = SET_ERROR(EFAULT); >>>>>> return (error); >>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>>> flag, cred_t *cr, int *rvalp) >>>>>> dkm.dki_lbsize = 1U << zv->zv_min_bs; >>>>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >>>>>> dkm.dki_media_type = DK_UNKNOWN; >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag)) >>>>>> error = SET_ERROR(EFAULT); >>>>>> return (error); >>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, >>>>>> int flag, cred_t *cr, int *rvalp) >>>>>> uint64_t vs = zv->zv_volsize; >>>>>> uint8_t bs = zv->zv_min_bs; >>>>>> >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> error = zvol_getefi((void *)arg, flag, vs, bs); >>>>>> return (error); >>>>>> } >>>>>> >>>>>> case DKIOCFLUSHWRITECACHE: >>>>>> dkc = (struct dk_callback *)arg; >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >>>>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, >>>>>> int flag, cred_t *cr, int *rvalp) >>>>>> } >>>>>> if (wce) { >>>>>> zv->zv_flags |= ZVOL_WCE; >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> } else { >>>>>> zv->zv_flags &= ~ZVOL_WCE; >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>>>> } >>>>>> return (0); >>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>>> flag, cred_t *cr, int *rvalp) >>>>>> break; >>>>>> >>>>>> } >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return (error); >>>>>> } 
>>>>>> #endif /* sun */ >>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void) >>>>>> { >>>>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >>>>>> 1) == 0); >>>>>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL); >>>>>> ZFS_LOG(1, "ZVOL Initialized."); >>>>>> } >>>>>> >>>>>> void >>>>>> zvol_fini(void) >>>>>> { >>>>>> + mutex_destroy(&zfsdev_state_lock); >>>>>> ddi_soft_state_fini(&zfsdev_state); >>>>>> ZFS_LOG(1, "ZVOL Deinitialized."); >>>>>> } >>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t >>>>>> resize) >>>>>> uint64_t version = spa_version(spa); >>>>>> enum zio_checksum checksum; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> ASSERT(vd->vdev_ops == &vdev_root_ops); >>>>>> >>>>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char >>>>>> *newname) >>>>>> struct g_provider *pp; >>>>>> zvol_state_t *zv; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> g_topology_assert(); >>>>>> >>>>>> pp = LIST_FIRST(&gp->provider); >>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const >>>>>> char *newname) >>>>>> newnamelen = strlen(newname); >>>>>> >>>>>> DROP_GIANT(); >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> g_topology_lock(); >>>>>> >>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const >>>>>> char *newname) >>>>>> } >>>>>> >>>>>> g_topology_unlock(); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> PICKUP_GIANT(); >>>>>> } >>>>>> -- >>>>>> 1.8.4.2 >>>>>> >>>>> >>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu> >>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100 >>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix >>>>>> >>>>>> --- >>>>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 >>>>>> +++++++++++++++------- >>>>>> 2 files changed, 42 insertions(+), 24 deletions(-) >>>>>> >>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>>> index 94383d6..4cac053 100644 >>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype) >>>>>> * progress on this vnode. >>>>>> */ >>>>>> >>>>>> + vn_lock(cvp, lktype); >>>>>> + >>>>>> for (;;) { >>>>>> /* >>>>>> * Reached the end of the mount chain? >>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >>>>>> if (vfsp == NULL) >>>>>> break; >>>>>> error = vfs_busy(vfsp, 0); >>>>>> - /* >>>>>> - * tvp is NULL for *cvpp vnode, which we can't unlock. 
>>>>>> - */ >>>>>> - if (tvp != NULL) >>>>>> - vput(cvp); >>>>>> - else >>>>>> - vrele(cvp); >>>>>> + VOP_UNLOCK(cvp, 0); >>>>>> if (error) >>>>>> return (error); >>>>>> >>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >>>>>> vfs_unbusy(vfsp); >>>>>> if (error != 0) >>>>>> return (error); >>>>>> + >>>>>> + VN_RELE(cvp); >>>>>> + >>>>>> cvp = tvp; >>>>>> } >>>>>> >>>>>> diff --git >>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>>> index 28ab1fa..d3464b4 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >>>>>> return (0); >>>>>> } >>>>>> >>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following >>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. >>>>>> + * On return the passed in vp is unlocked */ >>>>>> +static zfsctl_snapdir_t * >>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >>>>>> +{ >>>>>> + gfs_dir_t *dp = vp->v_data; >>>>>> + *dvpp = dp->gfsd_file.gfs_parent; >>>>>> + zfsctl_snapdir_t *sdp; >>>>>> + >>>>>> + VN_HOLD(*dvpp); >>>>>> + VOP_UNLOCK(vp, 0); >>>>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >>>>>> + sdp = (*dvpp)->v_data; >>>>>> + VOP_UNLOCK(*dvpp, 0); >>>>>> + >>>>>> + return (sdp); >>>>>> +} >>>>>> + >>>>>> #ifdef sun >>>>>> vnodeops_t *zfsctl_ops_root; >>>>>> vnodeops_t *zfsctl_ops_snapdir; >>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >>>>>> * The snapshot was unmounted behind our backs, >>>>>> * try to remount it. >>>>>> */ >>>>>> + VOP_UNLOCK(*vpp, 0); >>>>>> + VN_HOLD(*vpp); >>>>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, >>>>>> snapname) == 0); >>>>>> goto domount; >>>>>> } else { >>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >>>>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >>>>>> (void) strcpy(sep->se_name, nm); >>>>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, >>>>>> dmu_objset_id(snap)); >>>>>> - VN_HOLD(*vpp); >>>>>> avl_insert(&sdp->sd_snaps, sep, where); >>>>>> >>>>>> dmu_objset_rele(snap, FTAG); >>>>>> @@ -1075,6 +1095,7 @@ domount: >>>>>> (void) snprintf(mountpoint, mountpoint_len, >>>>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >>>>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >>>>>> + VN_HOLD(*vpp); >>>>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >>>>>> kmem_free(mountpoint, mountpoint_len); >>>>>> if (err == 0) { >>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >>>>>> int locked; >>>>>> vnode_t *dvp; >>>>>> >>>>>> - if (vp->v_count > 0) >>>>>> - goto end; >>>>>> - >>>>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >>>>>> - sdp = dvp->v_data; >>>>>> - VOP_UNLOCK(dvp, 0); >>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>>>> >>>>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >>>>>> mutex_enter(&sdp->sd_lock); >>>>>> >>>>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >>>>>> + >>>>>> + if (vp->v_count > 0) { >>>>>> + mutex_exit(&sdp->sd_lock); >>>>>> + return (0); >>>>>> + } >>>>>> + >>>>>> ASSERT(!vn_ismntpt(vp)); >>>>>> >>>>>> sep = avl_first(&sdp->sd_snaps); >>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >>>>>> mutex_exit(&sdp->sd_lock); >>>>>> VN_RELE(dvp); >>>>>> >>>>>> -end: >>>>>> /* >>>>>> * Dispose of the vnode for the snapshot mount point. 
>>>>>> * This is safe to do because once this entry has been removed >>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >>>>>> static int >>>>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >>>>>> { >>>>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >>>>>> - vnode_t *dvp, *vp; >>>>>> + vnode_t *dvp, *vp = ap->a_vp; >>>>>> zfsctl_snapdir_t *sdp; >>>>>> zfs_snapentry_t *sep; >>>>>> - int error; >>>>>> + int error = 0; >>>>>> >>>>>> - ASSERT(zfsvfs->z_ctldir != NULL); >>>>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >>>>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >>>>>> - if (error != 0) >>>>>> - return (error); >>>>>> - sdp = dvp->v_data; >>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>>>> >>>>>> mutex_enter(&sdp->sd_lock); >>>>>> + >>>>>> + vn_lock(vp, LK_SHARED | LK_RETRY); >>>>>> + >>>>>> sep = avl_first(&sdp->sd_snaps); >>>>>> while (sep != NULL) { >>>>>> vp = sep->se_root; >>>>>> -- >>>>>> 1.8.4.2 >>>>>> >>>>> >>>>> -- >>>>> Pawel Jakub Dawidek http://www.wheelsystems.com >>>>> FreeBSD committer http://www.FreeBSD.org >>>>> Am I Evil? Yes, I Am! http://mobter.com >>>>> >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>> >> > --1030603365-641758273-1387948957=:2785 Content-Type: TEXT/x-diff; name=zfs-gfs.patch Content-Transfer-Encoding: BASE64 Content-ID: <alpine.DEB.2.10.1312250622370.2785@krichy.tvnetwork.hu> Content-Description: Content-Disposition: attachment; filename=zfs-gfs.patch Y29tbWl0IDQ3MmVkNGIyM2Y5ZDAxZmEwOGUwNmFhYmNiNjA0MDYyZmZlODY4 MWMNCkF1dGhvcjogUmljaGFyZCBLb2plZHppbnN6a3kgPGtyaWNoeUBjZmxp bnV4Lmh1Pg0KRGF0ZTogICBXZWQgRGVjIDI1IDA2OjA3OjQxIDIwMTMgKzAx MDANCg0KICAgIFpGUy9HRlMgbG9ja2luZyBmaXhlcw0KDQpkaWZmIC0tZ2l0 IGEvc3lzL2NkZGwvY29tcGF0L29wZW5zb2xhcmlzL2tlcm4vb3BlbnNvbGFy aXNfbG9va3VwLmMgYi9zeXMvY2RkbC9jb21wYXQvb3BlbnNvbGFyaXMva2Vy bi9vcGVuc29sYXJpc19sb29rdXAuYw0KaW5kZXggNjU2NTgxMC4uNzRlZDE3 MyAxMDA2NDQNCi0tLSBhL3N5cy9jZGRsL2NvbXBhdC9vcGVuc29sYXJpcy9r ZXJuL29wZW5zb2xhcmlzX2xvb2t1cC5jDQorKysgYi9zeXMvY2RkbC9jb21w YXQvb3BlbnNvbGFyaXMva2Vybi9vcGVuc29sYXJpc19sb29rdXAuYw0KQEAg LTgxLDYgKzgxLDggQEAgdHJhdmVyc2Uodm5vZGVfdCAqKmN2cHAsIGludCBs a3R5cGUpDQogCSAqIHByb2dyZXNzIG9uIHRoaXMgdm5vZGUuDQogCSAqLw0K IA0KKwl2bl9sb2NrKGN2cCwgbGt0eXBlKTsNCisNCiAJZm9yICg7Oykgew0K IAkJLyoNCiAJCSAqIFJlYWNoZWQgdGhlIGVuZCBvZiB0aGUgbW91bnQgY2hh aW4/DQpAQCAtODksMTMgKzkxLDcgQEAgdHJhdmVyc2Uodm5vZGVfdCAqKmN2 cHAsIGludCBsa3R5cGUpDQogCQlpZiAodmZzcCA9PSBOVUxMKQ0KIAkJCWJy ZWFrOw0KIAkJZXJyb3IgPSB2ZnNfYnVzeSh2ZnNwLCAwKTsNCi0JCS8qDQot CQkgKiB0dnAgaXMgTlVMTCBmb3IgKmN2cHAgdm5vZGUsIHdoaWNoIHdlIGNh bid0IHVubG9jay4NCi0JCSAqLw0KLQkJaWYgKHR2cCAhPSBOVUxMKQ0KLQkJ CXZwdXQoY3ZwKTsNCi0JCWVsc2UNCi0JCQl2cmVsZShjdnApOw0KKwkJVk9Q X1VOTE9DSyhjdnAsIDApOw0KIAkJaWYgKGVycm9yKQ0KIAkJCXJldHVybiAo ZXJyb3IpOw0KIA0KQEAgLTEwNyw2ICsxMDMsOSBAQCB0cmF2ZXJzZSh2bm9k ZV90ICoqY3ZwcCwgaW50IGxrdHlwZSkNCiAJCXZmc191bmJ1c3kodmZzcCk7 DQogCQlpZiAoZXJyb3IgIT0gMCkNCiAJCQlyZXR1cm4gKGVycm9yKTsNCisN CisJCVZOX1JFTEUoY3ZwKTsNCisNCiAJCWN2cCA9IHR2cDsNCiAJfQ0KIA0K ZGlmZiAtLWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRz L2NvbW1vbi9mcy9nZnMuYyBiL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFy aXMvdXRzL2NvbW1vbi9mcy9nZnMuYw0KaW5kZXggNTk5NDRhMS4uMGY4ZjE1 NyAxMDA2NDQNCi0tLSBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMv dXRzL2NvbW1vbi9mcy9nZnMuYw0KKysrIGIvc3lzL2NkZGwvY29udHJpYi9v cGVuc29sYXJpcy91dHMvY29tbW9uL2ZzL2dmcy5jDQpAQCAtNDQ4LDcgKzQ0 
OCw2IEBAIGdmc19sb29rdXBfZG90KHZub2RlX3QgKip2cHAsIHZub2RlX3Qg KmR2cCwgdm5vZGVfdCAqcHZwLCBjb25zdCBjaGFyICpubSkNCiAJCQlWTl9I T0xEKHB2cCk7DQogCQkJKnZwcCA9IHB2cDsNCiAJCX0NCi0JCXZuX2xvY2so KnZwcCwgTEtfRVhDTFVTSVZFIHwgTEtfUkVUUlkpOw0KIAkJcmV0dXJuICgw KTsNCiAJfQ0KIA0KQEAgLTQ4NSw2ICs0ODQsNyBAQCBnZnNfZmlsZV9jcmVh dGUoc2l6ZV90IHNpemUsIHZub2RlX3QgKnB2cCwgdmZzX3QgKnZmc3AsIHZu b2Rlb3BzX3QgKm9wcykNCiAJZnAgPSBrbWVtX3phbGxvYyhzaXplLCBLTV9T TEVFUCk7DQogCWVycm9yID0gZ2V0bmV3dm5vZGUoInpmcyIsIHZmc3AsIG9w cywgJnZwKTsNCiAJQVNTRVJUKGVycm9yID09IDApOw0KKwlWTl9MT0NLX0FT SEFSRSh2cCk7DQogCXZuX2xvY2sodnAsIExLX0VYQ0xVU0lWRSB8IExLX1JF VFJZKTsNCiAJdnAtPnZfZGF0YSA9IChjYWRkcl90KWZwOw0KIA0KZGlmZiAt LWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1v bi9mcy96ZnMvemZzX2N0bGRpci5jIGIvc3lzL2NkZGwvY29udHJpYi9vcGVu c29sYXJpcy91dHMvY29tbW9uL2ZzL3pmcy96ZnNfY3RsZGlyLmMNCmluZGV4 IDI4YWIxZmEuLjY4MDY4NTggMTAwNjQ0DQotLS0gYS9zeXMvY2RkbC9jb250 cmliL29wZW5zb2xhcmlzL3V0cy9jb21tb24vZnMvemZzL3pmc19jdGxkaXIu Yw0KKysrIGIvc3lzL2NkZGwvY29udHJpYi9vcGVuc29sYXJpcy91dHMvY29t bW9uL2ZzL3pmcy96ZnNfY3RsZGlyLmMNCkBAIC02MTIsNyArNjEyLDcgQEAg emZzY3RsX2ZyZWVic2Rfcm9vdF9sb29rdXAoYXApDQogDQogCWVyciA9IHpm c2N0bF9yb290X2xvb2t1cChkdnAsIG5tLCB2cHAsIE5VTEwsIDAsIE5VTEws IGNyLCBOVUxMLCBOVUxMLCBOVUxMKTsNCiAJaWYgKGVyciA9PSAwICYmIChu bVswXSAhPSAnLicgfHwgbm1bMV0gIT0gJ1wwJykpDQotCQl2bl9sb2NrKCp2 cHAsIExLX0VYQ0xVU0lWRSB8IExLX1JFVFJZKTsNCisJCXZuX2xvY2soKnZw cCwgYXAtPmFfY25wLT5jbl9sa2ZsYWdzIHwgTEtfUkVUUlkpOw0KIAlyZXR1 cm4gKGVycik7DQogfQ0KIA0KQEAgLTk3NSw2ICs5NzUsOCBAQCB6ZnNjdGxf c25hcGRpcl9sb29rdXAoYXApDQogCVpGU19FTlRFUih6ZnN2ZnMpOw0KIA0K IAlpZiAoZ2ZzX2xvb2t1cF9kb3QodnBwLCBkdnAsIHpmc3Zmcy0+el9jdGxk aXIsIG5tKSA9PSAwKSB7DQorCQlpZiAobm1bMF0gIT0gJy4nIHx8IG5tWzFd ICE9ICdcMCcpDQorCQkJdm5fbG9jaygqdnBwLCBhcC0+YV9jbnAtPmNuX2xr ZmxhZ3MgfCBMS19SRVRSWSk7DQogCQlaRlNfRVhJVCh6ZnN2ZnMpOw0KIAkJ cmV0dXJuICgwKTsNCiAJfQ0KQEAgLTEwMDQsNyArMTAwNiw3IEBAIHpmc2N0 bF9zbmFwZGlyX2xvb2t1cChhcCkNCiAJaWYgKChzZXAgPSBhdmxfZmluZCgm c2RwLT5zZF9zbmFwcywgJnNlYXJjaCwgJndoZXJlKSkgIT0gTlVMTCkgew0K IAkJKnZwcCA9IHNlcC0+c2Vfcm9vdDsNCiAJCVZOX0hPTEQoKnZwcCk7DQot CQllcnIgPSB0cmF2ZXJzZSh2cHAsIExLX0VYQ0xVU0lWRSB8IExLX1JFVFJZ KTsNCisJCWVyciA9IHRyYXZlcnNlKHZwcCwgYXAtPmFfY25wLT5jbl9sa2Zs YWdzIHwgTEtfUkVUUlkpOw0KIAkJaWYgKGVyciAhPSAwKSB7DQogCQkJVk5f UkVMRSgqdnBwKTsNCiAJCQkqdnBwID0gTlVMTDsNCkBAIC0xMDEzLDYgKzEw MTUsOCBAQCB6ZnNjdGxfc25hcGRpcl9sb29rdXAoYXApDQogCQkJICogVGhl IHNuYXBzaG90IHdhcyB1bm1vdW50ZWQgYmVoaW5kIG91ciBiYWNrcywNCiAJ CQkgKiB0cnkgdG8gcmVtb3VudCBpdC4NCiAJCQkgKi8NCisJCQlWT1BfVU5M T0NLKCp2cHAsIDApOw0KKwkJCVZOX0hPTEQoKnZwcCk7DQogCQkJVkVSSUZZ KHpmc2N0bF9zbmFwc2hvdF96bmFtZShkdnAsIG5tLCBNQVhOQU1FTEVOLCBz bmFwbmFtZSkgPT0gMCk7DQogCQkJZ290byBkb21vdW50Ow0KIAkJfSBlbHNl IHsNCkBAIC0xMDY0LDcgKzEwNjgsNiBAQCB6ZnNjdGxfc25hcGRpcl9sb29r dXAoYXApDQogCXNlcC0+c2VfbmFtZSA9IGttZW1fYWxsb2Moc3RybGVuKG5t KSArIDEsIEtNX1NMRUVQKTsNCiAJKHZvaWQpIHN0cmNweShzZXAtPnNlX25h bWUsIG5tKTsNCiAJKnZwcCA9IHNlcC0+c2Vfcm9vdCA9IHpmc2N0bF9zbmFw c2hvdF9ta25vZGUoZHZwLCBkbXVfb2Jqc2V0X2lkKHNuYXApKTsNCi0JVk5f SE9MRCgqdnBwKTsNCiAJYXZsX2luc2VydCgmc2RwLT5zZF9zbmFwcywgc2Vw LCB3aGVyZSk7DQogDQogCWRtdV9vYmpzZXRfcmVsZShzbmFwLCBGVEFHKTsN CkBAIC0xMDc1LDYgKzEwNzgsNyBAQCBkb21vdW50Og0KIAkodm9pZCkgc25w cmludGYobW91bnRwb2ludCwgbW91bnRwb2ludF9sZW4sDQogCSAgICAiJXMv IiBaRlNfQ1RMRElSX05BTUUgIi9zbmFwc2hvdC8lcyIsDQogCSAgICBkdnAt PnZfdmZzcC0+bW50X3N0YXQuZl9tbnRvbm5hbWUsIG5tKTsNCisJVk5fSE9M RCgqdnBwKTsNCiAJZXJyID0gbW91bnRfc25hcHNob3QoY3VydGhyZWFkLCB2 cHAsICJ6ZnMiLCBtb3VudHBvaW50LCBzbmFwbmFtZSwgMCk7DQogCWttZW1f ZnJlZShtb3VudHBvaW50LCBtb3VudHBvaW50X2xlbik7DQogCWlmIChlcnIg 
PT0gMCkgew0KQEAgLTExMzAsNiArMTEzNCw4IEBAIHpmc2N0bF9zaGFyZXNf bG9va3VwKGFwKQ0KIAlzdHJsY3B5KG5tLCBjbnAtPmNuX25hbWVwdHIsIGNu cC0+Y25fbmFtZWxlbiArIDEpOw0KIA0KIAlpZiAoZ2ZzX2xvb2t1cF9kb3Qo dnBwLCBkdnAsIHpmc3Zmcy0+el9jdGxkaXIsIG5tKSA9PSAwKSB7DQorCQlp ZiAobm1bMF0gIT0gJy4nIHx8IG5tWzFdICE9ICdcMCcpDQorCQkJdm5fbG9j aygqdnBwLCBhcC0+YV9jbnAtPmNuX2xrZmxhZ3MgfCBMS19SRVRSWSk7DQog CQlaRlNfRVhJVCh6ZnN2ZnMpOw0KIAkJcmV0dXJuICgwKTsNCiAJfQ0KQEAg LTE0NjIsMTggKzE0NjgsMjUgQEAgemZzY3RsX3NuYXBzaG90X2luYWN0aXZl KGFwKQ0KIAl6ZnNjdGxfc25hcGRpcl90ICpzZHA7DQogCXpmc19zbmFwZW50 cnlfdCAqc2VwLCAqbmV4dDsNCiAJaW50IGxvY2tlZDsNCi0Jdm5vZGVfdCAq ZHZwOw0KLQ0KLQlpZiAodnAtPnZfY291bnQgPiAwKQ0KLQkJZ290byBlbmQ7 DQorCWdmc19kaXJfdCAqZHAgPSB2cC0+dl9kYXRhOw0KKwl2bm9kZV90ICpk dnAgPSBkcC0+Z2ZzZF9maWxlLmdmc19wYXJlbnQ7DQogDQotCVZFUklGWShn ZnNfZGlyX2xvb2t1cCh2cCwgIi4uIiwgJmR2cCwgY3IsIDAsIE5VTEwsIE5V TEwpID09IDApOw0KKwlWTl9IT0xEKGR2cCk7DQorCVZPUF9VTkxPQ0sodnAs IDApOw0KIAlzZHAgPSBkdnAtPnZfZGF0YTsNCi0JVk9QX1VOTE9DSyhkdnAs IDApOw0KIA0KIAlpZiAoIShsb2NrZWQgPSBNVVRFWF9IRUxEKCZzZHAtPnNk X2xvY2spKSkNCiAJCW11dGV4X2VudGVyKCZzZHAtPnNkX2xvY2spOw0KIA0K Kwl2bl9sb2NrKHZwLCBMS19FWENMVVNJVkUgfCBMS19SRVRSWSk7DQorDQor CWlmICh2cC0+dl9jb3VudCA+IDApIHsNCisJCWlmICghbG9ja2VkKQ0KKwkJ CW11dGV4X2V4aXQoJnNkcC0+c2RfbG9jayk7DQorCQlWTl9SRUxFKGR2cCk7 DQorCQlyZXR1cm4oMCk7DQorCX0NCisNCiAJQVNTRVJUKCF2bl9pc21udHB0 KHZwKSk7DQogDQogCXNlcCA9IGF2bF9maXJzdCgmc2RwLT5zZF9zbmFwcyk7 DQpAQCAtMTQ5NCw3ICsxNTA3LDYgQEAgemZzY3RsX3NuYXBzaG90X2luYWN0 aXZlKGFwKQ0KIAkJbXV0ZXhfZXhpdCgmc2RwLT5zZF9sb2NrKTsNCiAJVk5f UkVMRShkdnApOw0KIA0KLWVuZDoNCiAJLyoNCiAJICogRGlzcG9zZSBvZiB0 aGUgdm5vZGUgZm9yIHRoZSBzbmFwc2hvdCBtb3VudCBwb2ludC4NCiAJICog VGhpcyBpcyBzYWZlIHRvIGRvIGJlY2F1c2Ugb25jZSB0aGlzIGVudHJ5IGhh cyBiZWVuIHJlbW92ZWQNCkBAIC0xNTg4LDcgKzE2MDAsNyBAQCB6ZnNjdGxf c25hcHNob3RfbG9va3VwKGFwKQ0KIAllcnJvciA9IHpmc2N0bF9yb290X2xv b2t1cCh6ZnN2ZnMtPnpfY3RsZGlyLCAic25hcHNob3QiLCB2cHAsDQogCSAg ICBOVUxMLCAwLCBOVUxMLCBjciwgTlVMTCwgTlVMTCwgTlVMTCk7DQogCWlm IChlcnJvciA9PSAwKQ0KLQkJdm5fbG9jaygqdnBwLCBMS19FWENMVVNJVkUg fCBMS19SRVRSWSk7DQorCQl2bl9sb2NrKCp2cHAsIGFwLT5hX2NucC0+Y25f bGtmbGFncyB8IExLX1JFVFJZKTsNCiAJcmV0dXJuIChlcnJvcik7DQogfQ0K IA0K --1030603365-641758273-1387948957=:2785--