From owner-freebsd-bugs@FreeBSD.ORG Thu Sep 23 18:30:27 2004
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A3FAA16A4CF
	for ; Thu, 23 Sep 2004 18:30:27 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8759943D46
	for ; Thu, 23 Sep 2004 18:30:27 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.12.11/8.12.11) with ESMTP id i8NIURLI039934
	for ; Thu, 23 Sep 2004 18:30:27 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i8NIURKU039931;
	Thu, 23 Sep 2004 18:30:27 GMT
	(envelope-from gnats)
Resent-Date: Thu, 23 Sep 2004 18:30:27 GMT
Resent-Message-Id: <200409231830.i8NIURKU039931@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Brian Eng
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C509816A4CE
	for ; Thu, 23 Sep 2004 18:27:05 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B6B6843D31
	for ; Thu, 23 Sep 2004 18:27:05 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i8NIR3nm071355
	for ; Thu, 23 Sep 2004 18:27:03 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id i8NIR3TK071354;
	Thu, 23 Sep 2004 18:27:03 GMT
	(envelope-from nobody)
Message-Id: <200409231827.i8NIR3TK071354@www.freebsd.org>
Date: Thu, 23 Sep 2004 18:27:03 GMT
From: Brian Eng
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-2.3
Subject: kern/72041: Deadlock when disk is destroyed while user process closes
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 23 Sep 2004 18:30:27 -0000

>Number:         72041
>Category:       kern
>Synopsis:       Deadlock when disk is destroyed while user process closes
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 23 18:30:27 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Brian Eng
>Release:        5.2.1-RELEASE
>Organization:
MidStream
>Environment:
FreeBSD lexington.midstream.com 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #9: Thu Sep 2 14:23:04 PDT 2004 brian@lexington.midstream.com:/usr/src/sys/i386/compile/BRIAN i386
>Description:
The deadlock is between the geom code and the cam code. It occurred when a fibre channel cable was removed while a user process was still accessing a disk through it.

The system is set up to do a 'camcontrol rescan' upon indication from the HBA driver that the storage devices in the system may have changed. 'camcontrol rescan' triggers a succession of SCSI commands that are driven by the cambio/camisr() software interrupt. When the cable was unplugged, this led to cambio calling disk_destroy() on the disks that were now lost. disk_destroy() in turn led to an attempt to acquire topology_lock() in the g_event thread.

Meanwhile, the user app (dd) received an I/O error and closed the device. This led to a call to g_dev_close(), which acquired topology_lock() and then went down to daclose(), which sent a SCSI SYNC_CACHE command and waited for the command to complete. The SYNC_CACHE command does complete, but the syscall is never told so by cambio, which is itself frozen waiting for the lock that the syscall is holding.
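
The chain of waits described above can be written down as a small wait-for graph. This is only a sketch of my reading of the report, not kernel code: the actor labels, the array names and the helper functions are all made up for illustration, and the edge "cambio waits on g_event" assumes that disk_destroy() blocks until the g_event thread has finished the teardown.

```c
#include <assert.h>

/* Hypothetical labels for the three contexts named in the report. */
enum { DD_CLOSE, CAMBIO, G_EVENT, NACTORS };
#define NOBODY (-1)

/*
 * waits_on[a] is the actor that must make progress before actor a can
 * continue, or NOBODY if a is not blocked.
 *
 * As reported: dd (in g_dev_close/daclose) holds the topology lock and
 * waits for cambio to deliver the SYNC_CACHE completion; cambio waits
 * for the g_event thread (assumed: disk_destroy() waits for it); the
 * g_event thread waits for the topology lock held by dd.
 */
static const int reported_waits[NACTORS] = {
    [DD_CLOSE] = CAMBIO,
    [CAMBIO]   = G_EVENT,
    [G_EVENT]  = DD_CLOSE,
};

/* Same picture, but cambio only queues the teardown and does not wait. */
static const int no_wait_variant[NACTORS] = {
    [DD_CLOSE] = CAMBIO,
    [CAMBIO]   = NOBODY,
    [G_EVENT]  = DD_CLOSE,
};

/* Return 1 if following the wait-for edges from 'start' leads back to it. */
static int on_wait_cycle(const int waits_on[NACTORS], int start)
{
    int cur = start;

    assert(start >= 0 && start < NACTORS);
    for (int step = 0; step < NACTORS; step++) {
        cur = waits_on[cur];
        if (cur == NOBODY)
            return 0;   /* chain ends at a runnable actor */
        if (cur == start)
            return 1;   /* came back around: circular wait */
    }
    return 0;
}

/* Return 1 if any actor sits on a circular wait. */
static int deadlocked(const int waits_on[NACTORS])
{
    for (int a = 0; a < NACTORS; a++)
        if (on_wait_cycle(waits_on, a))
            return 1;
    return 0;
}
```

Calling deadlocked(reported_waits) reports a cycle, while deadlocked(no_wait_variant) does not, because removing the single edge out of cambio lets the completion reach dd, which can then release the lock.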
>How-To-Repeat:
Do 'camcontrol rescan' either continuously or upon driver notification of changes. Set up a bunch of processes (I was using 'dd') to read a removable disk, then remove it while the processes are running. There may also be a scenario with disk_create.
>Fix:
One perspective on this is that cambio inverted the layers; normally, geom code calls cam code, but in the 'camcontrol rescan' case, cam code calls geom code, resulting in locks being taken in opposite order. Perhaps disk_destroy could just queue to g_event and not wait for completion.
>Release-Note:
>Audit-Trail:
>Unformatted:
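
To illustrate the "queue and do not wait" idea from the Fix section, here is a minimal single-threaded sketch of a deferred event queue. It is not the GEOM event code; post_event_nowait(), run_pending_events(), destroy_disk_event() and the fixed-size ring are hypothetical names and choices picked only to show the shape of the suggestion: the caller enqueues the teardown and returns at once, and the work runs later in whatever context drains the queue.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 8

typedef void (*event_fn)(void *arg);

struct event {
    event_fn fn;
    void    *arg;
};

/* Small ring buffer standing in for the event thread's work list. */
static struct event queue[MAX_EVENTS];
static size_t q_head, q_len;

/*
 * Enqueue fn(arg) and return immediately, without running it and
 * without waiting for it. Returns 0 on success, -1 if the ring is full.
 */
static int post_event_nowait(event_fn fn, void *arg)
{
    assert(fn != NULL);
    if (q_len == MAX_EVENTS)
        return -1;
    queue[(q_head + q_len) % MAX_EVENTS] = (struct event){ fn, arg };
    q_len++;
    return 0;
}

/*
 * Stand-in for the event thread: run everything queued so far, in
 * order. Returns the number of events that were run.
 */
static int run_pending_events(void)
{
    int ran = 0;

    while (q_len > 0) {
        struct event ev = queue[q_head];

        q_head = (q_head + 1) % MAX_EVENTS;
        q_len--;
        ev.fn(ev.arg);
        ran++;
    }
    return ran;
}

/* Hypothetical teardown handler; it just records that it ran. */
static void destroy_disk_event(void *arg)
{
    int *destroyed = arg;

    *destroyed = 1;
}
```

In this model the poster never blocks on the handler, so a caller in the position of cambio would be free to go on delivering command completions while the teardown is still pending. A real change would also have to make sure the queued argument stays valid until the handler has run, which this sketch does not address.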