Date:      Tue, 03 Feb 2004 20:12:35 +0100
From:      "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To:        Lukas Ertl <le@freebsd.org>
Cc:        freebsd-geom@freebsd.org
Subject:   Re: vinum and GEOM deadlock situation 
Message-ID:  <13607.1075835555@critter.freebsd.dk>
In-Reply-To: Your message of "Tue, 03 Feb 2004 19:10:23 +0100." <20040203190839.Y616@korben.in.tern> 

In message <20040203190839.Y616@korben.in.tern>, Lukas Ertl writes:
>On Tue, 3 Feb 2004, Pawel Jakub Dawidek wrote:
>
>> On Tue, Feb 03, 2004 at 04:56:23PM +0100, Lukas Ertl wrote:
>> +> I'm running into a deadlock situation with the following scenario:
>> +>
>> +> Have a vinum RAID5 with several disks mounted, pull out one of the disks,
>> +> shortly thereafter all I/O hangs.
>> +>
>> +> I managed to identify the deadlock, but couldn't come up with a fix yet.
>> +>
>> +> Let's see.  Here's the backtrace of the vinum process:
>> [...]
>>
>> Yes, the deadlock is obvious.
>> [...]
>> The problem here is, that dp->d_close() is called with the topology lock
>> and d_close() is calling disk_destroy() and there topology lock should
>> not be holded.
>
>I also think that the only place where we can drop and re-grab the
>topology lock is around the dp->d_close() call, but I'm not sure if there
>are any side effects.

This is the kind of trouble I feared we would see if vinum was put
in on the disk_*() API.   The trouble is not only the g_topology()
lock, but also Giant.  And to make matters worse, the WITNESS order
of those two is the "Giant is going away" order rather than the more
widespread "Giant is everywhere" order.

I have no good suggestions for fixing it.  In most of the places where
I have had to deal with this (notably in the disk_* API) I have used
the geom_event mechanism, but in this case you probably need an event
mechanism which is "on the other side", where it does not hold the
topology lock.  Consider a task-queue.
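
To illustrate the task-queue idea, here is a minimal userland sketch in
C.  It is not kernel code and does not use the real taskqueue(9) or
GEOM interfaces: the lock is modelled as a plain flag, and every name
in it (topology_lock, task_enqueue, task_run_all, destroy_disk,
close_callback, demo) is hypothetical.  The only point it shows is
that a callback entered with the lock held queues the destroy step
instead of calling it, and the queue is drained later from a context
that does not hold the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the topology lock: 1 while "held", 0 otherwise.
 * This is a single-threaded model, not a real mutex. */
static int topology_held = 0;
static int disks_destroyed = 0;

static void topology_lock(void)   { assert(!topology_held); topology_held = 1; }
static void topology_unlock(void) { assert(topology_held);  topology_held = 0; }

/* Hypothetical deferred-work queue: a fixed-size FIFO of callbacks. */
typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

#define MAX_TASKS 16
static struct task queue[MAX_TASKS];
static size_t q_head = 0, q_tail = 0;

/* Queue a callback; safe to call with the lock held. */
static int task_enqueue(task_fn fn, void *arg)
{
    if (q_tail - q_head == MAX_TASKS)
        return -1;                      /* queue full */
    queue[q_tail % MAX_TASKS] = (struct task){ fn, arg };
    q_tail++;
    return 0;
}

/* Drain the queue "on the other side": the caller must not hold the lock. */
static size_t task_run_all(void)
{
    size_t ran = 0;

    assert(!topology_held);
    while (q_head != q_tail) {
        struct task t = queue[q_head % MAX_TASKS];
        q_head++;
        t.fn(t.arg);
        ran++;
    }
    return ran;
}

/* Models a destroy routine that must be entered without the lock.
 * In this model it takes the lock itself, so calling it with the lock
 * already held would trip the assertion in topology_lock(). */
static void destroy_disk(void *arg)
{
    (void)arg;
    topology_lock();
    disks_destroyed++;
    topology_unlock();
}

/* Models a close callback that is invoked with the lock held:
 * it defers the destroy instead of calling it directly. */
static void close_callback(void)
{
    assert(topology_held);
    (void)task_enqueue(destroy_disk, NULL);
}

/* Walk through the sequence once and report how many disks were destroyed. */
static int demo(void)
{
    topology_lock();
    close_callback();       /* lock held: only queues the work */
    topology_unlock();
    task_run_all();         /* lock dropped: destroy runs here */
    return disks_destroyed;
}
```

Calling destroy_disk() directly from close_callback() would fail the
assertion in topology_lock(), which is the model's equivalent of the
deadlock.  A real fix would also have to account for Giant and for the
lifetime of whatever the queued task refers to, neither of which this
sketch addresses.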

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


