From owner-freebsd-geom@FreeBSD.ORG Mon Feb 1 10:11:23 2010 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8AB7F106568F; Mon, 1 Feb 2010 10:11:22 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-fx0-f227.google.com (mail-fx0-f227.google.com [209.85.220.227]) by mx1.freebsd.org (Postfix) with ESMTP id 105978FC26; Mon, 1 Feb 2010 10:11:20 +0000 (UTC) Received: by fxm27 with SMTP id 27so6636fxm.3 for ; Mon, 01 Feb 2010 02:11:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:content-type; bh=4C+2dtjr7H34TXYZfwwNTmVG0F2U2MYsuZpisFty3SM=; b=dl+Bqdy8svR7JQS+qjOjJsWyFEijYaG8Ciwvt0LC8BruY2S52ofNQe3gQm2058Cir0 SiZ8Emii+MJiBEktQyPHlViGZHS76oBI57X5GX7vsrh9DDAbBuCDeDusN7HcqgMOoN+V dCOA8xZLMP5Dsg5atplQ1LYGNgxfJHI09Otx0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type; b=gCmtiqrEIlMBDg1i/PP5Z/EThk7GTRI1ZQms33WQSrRhpoAQqZMh4gDRra9gLkOs7E NplRVh+dIbRU9jlF+talHEFnd3thurh14Fn6//5/Btqi4x9kgt/Pu21qmCSgU4ev9nza gm5ypXwZAhg5H8xCLOOgii9IYZl46eZ6hJVN8= Received: by 10.102.182.6 with SMTP id e6mr2415630muf.63.1265018476992; Mon, 01 Feb 2010 02:01:16 -0800 (PST) Received: from mavbook.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226]) by mx.google.com with ESMTPS id s10sm14582853muh.29.2010.02.01.02.01.15 (version=SSLv3 cipher=RC4-MD5); Mon, 01 Feb 2010 02:01:15 -0800 (PST) Sender: Alexander Motin Message-ID: <4B66A669.2070406@FreeBSD.org> Date: Mon, 01 Feb 2010 12:01:13 +0200 From: Alexander Motin User-Agent: Thunderbird 2.0.0.23 (X11/20091212) MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl> <20100130114451.GB1660@garage.freebsd.pl> <20100201092334.GB1743@garage.freebsd.pl> In-Reply-To: <20100201092334.GB1743@garage.freebsd.pl> X-Enigmail-Version: 0.96.0 Content-Type: multipart/mixed; boundary="------------020500080902030103010007" Cc: freebsd-hackers@freebsd.org, FreeBSD-Current , kib@FreeBSD.org, freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy and process exit. X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Feb 2010 10:11:23 -0000 This is a multi-part message in MIME format. --------------020500080902030103010007 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 7bit Pawel Jakub Dawidek wrote: > On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote: >> Maybe I'll add how I understand what's going on: >> >> GEOM calls destroy_dev() while holding the topology lock. >> >> Destroy_dev() wants to destroy device, but can't because there are >> threads that still have it open. >> >> The threads can't close it, because to close it they need the topology >> lock. >> >> The deadlock is quite obvious, IMHO. > > Guys, changing destroy_dev() to destroy_dev_sched() in geom_dev.c fixes > the problem for me (at least it makes race window so small that I can't > reproduce it). Is there anyone who isn't happy with such a change? Have you done some locking there? Because my system crashes with such straightforward change, when g_dev_close() called after geom node being destroyed. Attached patch fixes that for me, but I have doubts that it is complete. There is still seems to be a race with new I/O requests and ioctl's, that are not protected by topology lock. At least if devfs code doesn't handle it somehow. -- Alexander Motin --------------020500080902030103010007 Content-Type: text/plain; name="geom_dev.destroy.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="geom_dev.destroy.patch" --- geom_dev.c.prev 2010-01-30 05:22:36.000000000 +0200 +++ geom_dev.c 2010-02-01 11:55:53.000000000 +0200 @@ -184,10 +184,13 @@ g_dev_close(struct cdev *dev, int flags, struct g_consumer *cp; int error, r, w, e, i; + g_topology_lock(); gp = dev->si_drv1; cp = dev->si_drv2; - if (gp == NULL || cp == NULL) + if (gp == NULL || cp == NULL) { + g_topology_unlock(); return(ENXIO); + } g_trace(G_T_ACCESS, "g_dev_close(%s, %d, %d, %p)", gp->name, flags, fmt, td); r = flags & FREAD ? -1 : 0; @@ -197,7 +200,6 @@ g_dev_close(struct cdev *dev, int flags, #else e = 0; #endif - g_topology_lock(); if (dev->si_devsw == NULL) error = ENXIO; /* We were orphaned */ else @@ -435,7 +437,10 @@ g_dev_orphan(struct g_consumer *cp) set_dumper(NULL); /* Destroy the struct cdev *so we get no more requests */ - destroy_dev(dev); + gp->softc = NULL; + dev->si_drv1 = NULL; + dev->si_drv2 = NULL; + destroy_dev_sched(dev); /* Wait for the cows to come home */ while (cp->nstart != cp->nend) --------------020500080902030103010007--