From: "Poul-Henning Kamp"
To: Lukas Ertl
cc: Pawel Jakub Dawidek
cc: freebsd-geom@freebsd.org
Date: Tue, 03 Feb 2004 20:12:35 +0100
Subject: Re: vinum and GEOM deadlock situation
Message-ID: <13607.1075835555@critter.freebsd.dk>
In-Reply-To: Your message of "Tue, 03 Feb 2004 19:10:23 +0100." <20040203190839.Y616@korben.in.tern>

In message <20040203190839.Y616@korben.in.tern>, Lukas Ertl writes:
>On Tue, 3 Feb 2004, Pawel Jakub Dawidek wrote:
>
>> On Tue, Feb 03, 2004 at 04:56:23PM +0100, Lukas Ertl wrote:
>> +> I'm running into a deadlock situation with the following scenario:
>> +>
>> +> Have a vinum RAID5 with several disks mounted, pull out one of the disks,
>> +> shortly thereafter all I/O hangs.
>> +>
>> +> I managed to identify the deadlock, but couldn't come up with a fix yet.
>> +>
>> +> Let's see. Here's the backtrace of the vinum process:
>> [...]
>>
>> Yes, the deadlock is obvious.
>> [...]
>> The problem here is, that dp->d_close() is called with the topology lock
>> and d_close() is calling disk_destroy() and there topology lock should
>> not be holded.
>
>I also think that the only place where we can drop and re-grab the
>topology lock is around the dp->d_close() call, but I'm not sure if there
>are any side effects.

This is the kind of trouble I feared we would see if vinum was put on
the disk_*() API.

The trouble is not only the g_topology() lock, but also Giant. And to
make matters worse, the WITNESS order of those two is the "Giant is going
away" order rather than the more widespread "Giant is everywhere" order.

I have no good suggestions for fixing it. In most of the places where I
have had to deal with this (notably in the disk_*() API) I have used the
geom_event mechanism, but in this case you probably need an event
mechanism which is "on the other side", where it does not hold the
topology lock.

Consider a task-queue.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
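A task queue breaks the cycle because the code that runs with the topology lock held only enqueues the work; the work itself (here, the disk_destroy() step) runs later in another thread that starts with no locks held and can take the topology lock normally. What follows is a minimal userspace sketch of that idea using POSIX threads, not kernel code. topology_lock stands in for the GEOM topology lock, and model_d_close(), model_disk_destroy(), the model_task queue and run_demo() are hypothetical names invented for illustration. It assumes disk_destroy() takes the topology lock itself, and it ignores Giant entirely. In the kernel the corresponding facility would presumably be taskqueue(9) (TASK_INIT() and taskqueue_enqueue()); whether that is the right fix for vinum is an assumption, not something settled in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the GEOM topology lock: a plain, non-recursive mutex. */
static pthread_mutex_t topology_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set by the hypothetical destroy routine; written only under topology_lock. */
bool disk_destroyed = false;

/* Hypothetical stand-in for disk_destroy(): assumed to take the topology
 * lock itself, so it must not be entered with that lock already held. */
static void model_disk_destroy(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&topology_lock);
    disk_destroyed = true;
    pthread_mutex_unlock(&topology_lock);
}

/* Caller-owned work item, loosely modeled on the kernel's struct task. */
struct model_task {
    void (*fn)(void *);
    void *arg;
    struct model_task *next;
};

/* A single-worker FIFO task queue. */
static struct {
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    struct model_task *head, *tail;
    bool stopping;
} tq = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL, NULL, false };

/* Safe to call with topology_lock held: it only touches tq.mtx. */
static void tq_enqueue(struct model_task *t)
{
    pthread_mutex_lock(&tq.mtx);
    t->next = NULL;
    if (tq.tail != NULL)
        tq.tail->next = t;
    else
        tq.head = t;
    tq.tail = t;
    pthread_cond_signal(&tq.cv);
    pthread_mutex_unlock(&tq.mtx);
}

/* Worker thread: runs each task with no locks held; drains the queue before exiting. */
static void *tq_worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&tq.mtx);
    for (;;) {
        while (tq.head == NULL && !tq.stopping)
            pthread_cond_wait(&tq.cv, &tq.mtx);
        if (tq.head == NULL)
            break;                        /* stopping and queue drained */
        struct model_task *t = tq.head;
        tq.head = t->next;
        if (tq.head == NULL)
            tq.tail = NULL;
        pthread_mutex_unlock(&tq.mtx);
        t->fn(t->arg);                    /* free to take topology_lock */
        pthread_mutex_lock(&tq.mtx);
    }
    pthread_mutex_unlock(&tq.mtx);
    return NULL;
}

static struct model_task destroy_task = { model_disk_destroy, NULL, NULL };

/* Hypothetical d_close(): entered with topology_lock held, as in the report.
 * Calling model_disk_destroy() directly here would relock topology_lock;
 * instead the destroy is deferred to the worker thread. */
static void model_d_close(void)
{
    tq_enqueue(&destroy_task);
}

/* One-shot driver: returns true if the deferred destroy eventually ran. */
bool run_demo(void)
{
    pthread_t worker;
    if (pthread_create(&worker, NULL, tq_worker, NULL) != 0)
        return false;

    pthread_mutex_lock(&topology_lock);   /* caller holds the topology lock */
    model_d_close();                      /* only enqueues, so no self-deadlock */
    assert(!disk_destroyed);              /* destroy may have started, but cannot set the flag yet */
    pthread_mutex_unlock(&topology_lock); /* now the worker can take the lock */

    pthread_mutex_lock(&tq.mtx);
    tq.stopping = true;
    pthread_cond_signal(&tq.cv);
    pthread_mutex_unlock(&tq.mtx);
    pthread_join(worker, NULL);
    return disk_destroyed;
}
```

Calling run_demo() once should return true: the destroy could not complete while the caller still held topology_lock, and it did complete once the lock was released. Calling model_disk_destroy() directly from model_d_close() instead would relock a default pthread mutex, which is undefined behavior and typically hangs, mirroring the reported deadlock. The driver is one-shot because tq.stopping is never reset.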