From owner-freebsd-current@FreeBSD.ORG Wed Jan 23 10:08:04 2013 Return-Path: Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D6D124F9; Wed, 23 Jan 2013 10:08:04 +0000 (UTC) (envelope-from okuno.kohji@jp.panasonic.com) Received: from smtp.mei.co.jp (smtp.mei.co.jp [133.183.100.20]) by mx1.freebsd.org (Postfix) with ESMTP id 408D177F; Wed, 23 Jan 2013 10:08:03 +0000 (UTC) Received: from mail-gw.jp.panasonic.com ([157.8.1.157]) by smtp.mei.co.jp (8.12.11.20060614/3.7W/kc-maile13) with ESMTP id r0NA7v3k004048; Wed, 23 Jan 2013 19:07:57 +0900 (JST) Received: from epochmail.jp.panasonic.com ([157.8.1.130]) by mail.jp.panasonic.com (8.11.6p2/3.7W/kc-maili12) with ESMTP id r0NA7uG12602; Wed, 23 Jan 2013 19:07:56 +0900 Received: by epochmail.jp.panasonic.com (8.12.11.20060308/3.7W/lomi13) id r0NA7tMH010780; Wed, 23 Jan 2013 19:07:55 +0900 Received: from localhost by lomi13.jp.panasonic.com (8.12.11.20060308/3.7W) with ESMTP id r0NA7t0r010757; Wed, 23 Jan 2013 19:07:55 +0900 Date: Wed, 23 Jan 2013 19:07:54 +0900 (JST) Message-Id: <20130123.190754.558674226904055738.okuno.kohji@jp.panasonic.com> To: kostikbel@gmail.com Subject: Re: deadlock between g_event and a thread on removing a device. From: Kohji Okuno In-Reply-To: <20130123095414.GU2522@kib.kiev.ua> References: <20130118.144538.1860198911942517633.okuno.kohji@jp.panasonic.com> <20130123095414.GU2522@kib.kiev.ua> Organization: Panasonic Corporation X-Mailer: Mew version 6.5 on Emacs 23.4 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: ae@freebsd.org, freebsd-current@FreeBSD.org, okuno.kohji@jp.panasonic.com X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Jan 2013 10:08:04 -0000 Hi Konstantin, Thank you for your comment. I don't have any solution for this issue. And when a device is removed suddenly, there are other problems, I think. > On Fri, Jan 18, 2013 at 02:45:38PM +0900, Kohji Okuno wrote: >> Hi, >> >> When I removed a device (ex. /dev/da0), I have encounterd a >> dead-lock between ``g_event'' thread and a thread that is opening >> device file (I call this thread as A). >> >> Would you refer the following? >> >> When the device is removed between dev_refthread() and g_dev_open(), >> thread A incremented dev->si_threadcount, but can't acquire >> topology_lock. >> >> On the other hand, g_event is waiting to set dev->si_threadcount to 0 >> with topology_lock. >> >> Regards, >> Kohji Okuno >> >> >> <<< Thread A >>> >> ... >> devfs_open() >> { >> ... >> dsw = dev_refthread(dev, &ref); <= increment dev->si_threadcount >> ... >> error = dsw->d_open(...); <= call g_dev_open() >> ... >> dev_relthread(dev, ref); <= decrement dev->si_threadcount >> } >> >> g_dev_open() >> { >> ... >> g_topology_lock(); <= Thread A couldn't acquire >> ... topology_lock. >> } >> >> <<< g_event >>> >> g_run_events() >> { >> ... >> g_topology_lock(); <= g_event acuired topology_lock here. >> ... >> one_event() >> ... >> } >> >> one_event() >> g_orphan_register() >> g_dev_orphan() >> destroy_dev() >> destroy_dev() >> destroy_devl() >> { >> ... >> while (dev->si_threadcount != 0) { <= this count was incremented by Thread A >> /* Use unique dummy wait ident */ >> msleep(&csw, &devmtx, PRIBIO, "devdrn", hz / 10); >> } >> ... >> } > > Yes, you are absolutely right. > > I believe there were some patches floating around which changed the > destroy_dev() call in the g_dev_orphan() to destroy_dev_sched(). I do > not remember who was the author. > > My reply was that naive substitution of the destroy_dev() to > destroy_dev_sched() is racy, because some requests might still come > in after the call to destroy_dev_sched(). Despite destroy_dev_sched() > setting the CDP_SCHED_DTR flag on the devfs node, some thread might > already entered the cdevsw method. I do not believe that there was > further progress there.