From owner-freebsd-geom@FreeBSD.ORG  Tue Nov 24 17:53:31 2009
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
Delivered-To: freebsd-geom@freebsd.org
Message-ID: <4B0C1A72.3000301@comcast.net>
Date: Tue, 24 Nov 2009 12:40:02 -0500
From: Steve Polyack <korvus@comcast.net>
User-Agent: Thunderbird 2.0.0.23 (X11/20090902)
MIME-Version: 1.0
To: freebsd-hardware@freebsd.org, freebsd-stable <freebsd-stable@FreeBSD.org>,
	freebsd-geom@FreeBSD.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Panic possibly related to glabel/geom and siis(4)

I have a system running 8.0-PRERELEASE with multiple drives and SATA 
port multipliers (siis controllers and PMPs).  All of the attached 
drives are labeled via glabel(8) and then added to a ZFS pool.  
During some testing to determine how the system would react to a dead 
drive (simulated by physically removing a drive during operation), I 
was able to produce a panic.

Now, I know that the SATA PMP and siis(4) code to handle and recover 
from device errors is incomplete, but I believe the crash may be 
specific to using glabel'd drives.  Basically, after I remove a drive 
while the zpool is in use and issue 'camcontrol reset' and 
'camcontrol rescan' on the appropriate bus, the physical device 
associated with the drive disappears.  In this case:
  (pass5:siisch7:0:15:0): lost device
  (pass5:siisch7:0:15:0): removing device entry
  (ada2:siisch7:0:0:0): lost device

and /dev/ada2 disappears.  However, the associated glabel 
/dev/label/bigdisk07 remains.  Since my ZFS pool was created on the 
glabel devices rather than the raw disks, I believe this is why ZFS 
never notices that the drive has disappeared.

Do glabels normally go away when the underlying physical device is 
lost?  If not, shouldn't they?


After some runtime with the physical device missing, a kernel panic is 
produced:

(ada2:siisch7:0:0:0): Synchronize cache failed
(ada2:siisch7:0:0:0): removing device entry


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 14
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8035f375
stack pointer           = 0x28:0xffffff800006db60
frame pointer           = 0x28:0xffffff800006db70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
[thread pid 2 tid 100014 ]
Stopped at      _mtx_lock_flags+0x15:   lock cmpxchgq   %rsi,0x18(%rdi)
db> bt
Tracing pid 2 tid 100014 td 0xffffff00014d4ab0
_mtx_lock_flags() at _mtx_lock_flags+0x15
vdev_geom_release() at vdev_geom_release+0x33
vdev_geom_orphan() at vdev_geom_orphan+0x15c
g_run_events() at g_run_events+0x104
g_event_procbody() at g_event_procbody+0x55
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800006dd30, rbp = 0 ---
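
One possible reading of the trap frame, offered as a rough sketch rather 
than a diagnosis: the faulting instruction writes to 0x18(%rdi), and the 
fault virtual address is 0x48, so %rdi would have held 0x30.  Assuming 
%rdi carries the mutex pointer (the first argument under the amd64 
calling convention), that would suggest a mutex embedded at offset 0x30 
in a structure whose base pointer was NULL.  The helper name below is 
made up for illustration; only the two constants come from the output 
above.

```python
# Values copied from the trap output above.
FAULT_ADDR = 0x48     # "fault virtual address = 0x48"
DISPLACEMENT = 0x18   # from "lock cmpxchgq %rsi,0x18(%rdi)"


def base_register(fault_addr: int, displacement: int) -> int:
    """Return the base register value implied by a base+displacement operand.

    Hypothetical helper: for an operand written disp(%reg), the effective
    address is reg + disp, so reg = effective address - disp.
    """
    return fault_addr - displacement


# Assumption: the effective address of the faulting operand equals the
# reported fault virtual address.
rdi = base_register(FAULT_ADDR, DISPLACEMENT)
print(hex(rdi))
```

A small non-zero value like this is usually the signature of taking the 
address of a field through a NULL structure pointer, which would fit a 
consumer or provider that has already been torn down by the time 
vdev_geom_release() runs.  That is an inference from the addresses only, 
not something confirmed against the source.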


I'm open to trying patches and other suggestions.  Thanks.