From owner-freebsd-bugs@FreeBSD.ORG  Thu Sep 23 18:30:27 2004
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A3FAA16A4CF
	for <freebsd-bugs@hub.freebsd.org>;
	Thu, 23 Sep 2004 18:30:27 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8759943D46
	for <freebsd-bugs@hub.freebsd.org>;
	Thu, 23 Sep 2004 18:30:27 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.12.11/8.12.11) with ESMTP id i8NIURLI039934
	for <freebsd-bugs@freefall.freebsd.org>; Thu, 23 Sep 2004 18:30:27 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i8NIURKU039931;
	Thu, 23 Sep 2004 18:30:27 GMT
	(envelope-from gnats)
Resent-Date: Thu, 23 Sep 2004 18:30:27 GMT
Resent-Message-Id: <200409231830.i8NIURKU039931@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org,
	Brian Eng <brian@midstream.com>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C509816A4CE
	for <freebsd-gnats-submit@FreeBSD.org>;
	Thu, 23 Sep 2004 18:27:05 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B6B6843D31
	for <freebsd-gnats-submit@FreeBSD.org>;
	Thu, 23 Sep 2004 18:27:05 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i8NIR3nm071355
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 23 Sep 2004 18:27:03 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id i8NIR3TK071354;
	Thu, 23 Sep 2004 18:27:03 GMT
	(envelope-from nobody)
Message-Id: <200409231827.i8NIR3TK071354@www.freebsd.org>
Date: Thu, 23 Sep 2004 18:27:03 GMT
From: Brian Eng <brian@midstream.com>
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-2.3
Subject: kern/72041: Deadlock when disk is destroyed while user process
	closes
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Sep 2004 18:30:27 -0000


>Number:         72041
>Category:       kern
>Synopsis:       Deadlock when disk is destroyed while user process closes
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 23 18:30:27 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Brian Eng
>Release:        5.2.1-RELEASE
>Organization:
MidStream
>Environment:
FreeBSD lexington.midstream.com 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #9: Thu Sep  2 14:23:04 PDT 2004     brian@lexington.midstream.com:/usr/src/sys/i386/compile/BRIAN  i386

>Description:
The deadlock is between the geom code and the cam code.  It occurred when a fibre channel cable was unplugged while a user process was still accessing a disk through it.

The system is set up to run 'camcontrol rescan' whenever the HBA driver indicates that the storage devices in the system may have changed.  'camcontrol rescan' triggers a succession of SCSI commands that are driven by the cambio/camisr() software interrupt.  When the cable was unplugged, this led cambio to call disk_destroy() on the disks that were now lost.  disk_destroy() in turn led to an attempt to acquire topology_lock() in the g_event thread.

Meanwhile, the user app (dd) received an I/O error and closed the device.  This led to a call to g_dev_close(), which acquired topology_lock() and then went down to daclose(), which sent a SCSI SYNC_CACHE command and waited for the command to complete.

The SYNC_CACHE command completes, but cambio never notifies the syscall, because cambio is blocked waiting for the lock that the syscall is holding.
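
The circular wait described above can be sketched as a small wait-for graph.  This is only a user-space illustration in Python, not kernel code; the node names "dd syscall" and "cambio" and the helper find_cycle() are made up for the example.

```python
# Wait-for graph: each blocked party maps to the party it is waiting on.
waits_for = {
    # g_dev_close() holds the topology lock; daclose() waits for cambio
    # to report that SYNC_CACHE has completed.
    "dd syscall": "cambio",
    # disk_destroy() needs the topology lock that the syscall is holding.
    "cambio": "dd syscall",
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle reached, or None."""
    path = []
    node = start
    while node in graph:
        if node in path:
            return path[path.index(node):]
        path.append(node)
        node = graph[node]
    return None


cycle = find_cycle(waits_for, "dd syscall")
print(" -> ".join(cycle + [cycle[0]]))
```

A cycle in this graph means that neither party can make progress, which matches the hang seen here.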
>How-To-Repeat:
Run 'camcontrol rescan' either continuously or upon driver notification of changes.  Start several processes (I was using 'dd') reading from a removable disk, then remove the disk while the processes are running.

A similar deadlock scenario may also exist with disk_create.
>Fix:
One perspective on this is that cambio inverted the layers; normally, geom code calls cam code, but in the 'camcontrol rescan' case, cam code calls geom code, resulting in locks being taken in the opposite order.  Perhaps disk_destroy could just queue the request to g_event and return without waiting for completion.
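
The "queue and do not wait" idea can be sketched with user-space threads.  This is a toy model in Python under stated assumptions, not a patch: topology_lock, events, sync_done, lock_held, and the three functions are hypothetical stand-ins for the topology lock, the g_event queue, the SYNC_CACHE completion, and the threads involved.

```python
import queue
import threading

topology_lock = threading.Lock()   # stand-in for the geom topology lock
events = queue.Queue()             # stand-in for the g_event work queue
sync_done = threading.Event()      # stand-in for SYNC_CACHE completion
lock_held = threading.Event()      # set once the closing thread holds the lock
log = []


def g_event_thread():
    # Runs queued work under the topology lock, one item at a time.
    while True:
        work = events.get()
        if work is None:           # sentinel: stop the thread
            break
        with topology_lock:
            work()


def close_device():
    # Closing path: holds the lock while waiting for the command to finish.
    with topology_lock:
        lock_held.set()
        finished = sync_done.wait(timeout=5)
        log.append("close finished" if finished else "close timed out")


def cambio():
    # Interrupt path: posts the destroy request and returns at once,
    # so it is still free to deliver the completion.
    lock_held.wait(timeout=5)
    events.put(lambda: log.append("disk destroyed"))
    sync_done.set()


threads = [threading.Thread(target=f) for f in (g_event_thread, close_device, cambio)]
for t in threads:
    t.start()
threads[1].join()
threads[2].join()
events.put(None)
threads[0].join()
print(log)
```

Because cambio() never blocks on the lock in this model, the close completes first and the queued destroy runs once the lock is released.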

>Release-Note:
>Audit-Trail:
>Unformatted: