Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 23 Jan 2013 19:07:54 +0900 (JST)
From:      Kohji Okuno <okuno.kohji@jp.panasonic.com>
To:        kostikbel@gmail.com
Cc:        ae@freebsd.org, freebsd-current@FreeBSD.org, okuno.kohji@jp.panasonic.com
Subject:   Re: deadlock between g_event and a thread on removing a device.
Message-ID:  <20130123.190754.558674226904055738.okuno.kohji@jp.panasonic.com>
In-Reply-To: <20130123095414.GU2522@kib.kiev.ua>
References:  <20130118.144538.1860198911942517633.okuno.kohji@jp.panasonic.com> <20130123095414.GU2522@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi Konstantin,

Thank you for your comment.

I don't have any solution for this issue.
And when a device is removed suddenly, there are other problems, I think.


> On Fri, Jan 18, 2013 at 02:45:38PM +0900, Kohji Okuno wrote:
>> Hi,
>> 
>> When I removed a device (ex. /dev/da0), I have encounterd a
>> dead-lock between ``g_event'' thread and a thread that is opening
>> device file (I call this thread as A).
>> 
>> Would you refer the following?
>> 
>> When the device is removed between dev_refthread() and g_dev_open(),
>> thread A incremented dev->si_threadcount, but can't acquire
>> topology_lock.
>> 
>> On the other hand, g_event is waiting to set dev->si_threadcount to 0
>> with topology_lock.
>> 
>> Regards,
>>  Kohji Okuno
>> 
>> 
>> <<< Thread A >>>
>> ...
>> devfs_open()
>> {
>>   ...
>>   dsw = dev_refthread(dev, &ref); <= increment dev->si_threadcount
>>   ...
>>   error = dsw->d_open(...);       <= call g_dev_open()
>>   ...
>>   dev_relthread(dev, ref);        <= decrement dev->si_threadcount
>> }
>> 
>> g_dev_open()
>> {
>>   ...
>>   g_topology_lock();              <= Thread A couldn't acquire 
>>   ...                                topology_lock.
>> }
>> 
>> <<< g_event >>>
>> g_run_events()
>> {
>>    ...
>>    g_topology_lock();             <= g_event acuired topology_lock here.
>>    ...
>>    one_event()
>>    ...
>> }
>> 
>> one_event()
>> g_orphan_register()
>> g_dev_orphan()
>> destroy_dev()
>> destroy_dev()
>> destroy_devl()
>> {
>>   ...
>>   while (dev->si_threadcount != 0) { <= this count was incremented by Thread A
>>     /* Use unique dummy wait ident */
>>     msleep(&csw, &devmtx, PRIBIO, "devdrn", hz / 10);
>>   }
>>   ...
>> }
> 
> Yes, you are absolutely right.
> 
> I believe there were some patches floating around which changed the
> destroy_dev() call in the g_dev_orphan() to destroy_dev_sched(). I do
> not remember who was the author.
> 
> My reply was that naive substitution of the destroy_dev() to
> destroy_dev_sched() is racy, because some requests might still come
> in after the call to destroy_dev_sched(). Despite destroy_dev_sched()
> setting the CDP_SCHED_DTR flag on the devfs node, some thread might
> already entered the cdevsw method. I do not believe that there was
> further progress there.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130123.190754.558674226904055738.okuno.kohji>