Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 01 Feb 2010 12:01:13 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, FreeBSD-Current <freebsd-current@freebsd.org>, kib@FreeBSD.org, freebsd-geom@freebsd.org
Subject:   Re: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID:  <4B66A669.2070406@FreeBSD.org>
In-Reply-To: <20100201092334.GB1743@garage.freebsd.pl>
References:  <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl> <20100130114451.GB1660@garage.freebsd.pl> <20100201092334.GB1743@garage.freebsd.pl>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------020500080902030103010007
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 7bit

Pawel Jakub Dawidek wrote:
> On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote:
>> Maybe I'll add how I understand what's going on:
>>
>> GEOM calls destroy_dev() while holding the topology lock.
>>
>> Destroy_dev() wants to destroy device, but can't because there are
>> threads that still have it open.
>>
>> The threads can't close it, because to close it they need the topology
>> lock.
>>
>> The deadlock is quite obvious, IMHO.
> 
> Guys, changing destroy_dev() to destroy_dev_sched() in geom_dev.c fixes
> the problem for me (at least it makes race window so small that I can't
> reproduce it). Is there anyone who isn't happy with such a change?

Have you done some locking there? Because my system crashes with such
straightforward change, when g_dev_close() called after geom node being
destroyed. Attached patch fixes that for me, but I have doubts that it
is complete. There is still seems to be a race with new I/O requests and
ioctl's, that are not protected by topology lock. At least if devfs code
doesn't handle it somehow.

-- 
Alexander Motin

--------------020500080902030103010007
Content-Type: text/plain;
 name="geom_dev.destroy.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="geom_dev.destroy.patch"

--- geom_dev.c.prev	2010-01-30 05:22:36.000000000 +0200
+++ geom_dev.c	2010-02-01 11:55:53.000000000 +0200
@@ -184,10 +184,13 @@ g_dev_close(struct cdev *dev, int flags,
 	struct g_consumer *cp;
 	int error, r, w, e, i;
 
+	g_topology_lock();
 	gp = dev->si_drv1;
 	cp = dev->si_drv2;
-	if (gp == NULL || cp == NULL)
+	if (gp == NULL || cp == NULL) {
+		g_topology_unlock();
 		return(ENXIO);
+	}
 	g_trace(G_T_ACCESS, "g_dev_close(%s, %d, %d, %p)",
 	    gp->name, flags, fmt, td);
 	r = flags & FREAD ? -1 : 0;
@@ -197,7 +200,6 @@ g_dev_close(struct cdev *dev, int flags,
 #else
 	e = 0;
 #endif
-	g_topology_lock();
 	if (dev->si_devsw == NULL)
 		error = ENXIO;		/* We were orphaned */
 	else
@@ -435,7 +437,10 @@ g_dev_orphan(struct g_consumer *cp)
 		set_dumper(NULL);
 
 	/* Destroy the struct cdev *so we get no more requests */
-	destroy_dev(dev);
+	gp->softc = NULL;
+	dev->si_drv1 = NULL;
+	dev->si_drv2 = NULL;
+	destroy_dev_sched(dev);
 
 	/* Wait for the cows to come home */
 	while (cp->nstart != cp->nend)

--------------020500080902030103010007--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4B66A669.2070406>