From owner-freebsd-geom@FreeBSD.ORG  Wed Aug  3 14:43:27 2011
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9DFBE106564A
	for <freebsd-geom@freebsd.org>; Wed,  3 Aug 2011 14:43:27 +0000 (UTC)
	(envelope-from mj@feral.com)
Received: from ns1.feral.com (ns1.feral.com [192.67.166.1])
	by mx1.freebsd.org (Postfix) with ESMTP id 6A83D8FC15
	for <freebsd-geom@freebsd.org>; Wed,  3 Aug 2011 14:43:27 +0000 (UTC)
Received: from [192.168.135.105] (c-24-7-47-62.hsd1.ca.comcast.net
	[24.7.47.62]) (authenticated bits=0)
	by ns1.feral.com (8.14.4/8.14.4) with ESMTP id p73EEYY0010431
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-geom@freebsd.org>; Wed, 3 Aug 2011 07:14:35 -0700 (PDT)
	(envelope-from mj@feral.com)
Message-ID: <4E3957C6.1000406@feral.com>
Date: Wed, 03 Aug 2011 07:14:30 -0700
From: Matthew Jacob <mj@feral.com>
Organization: Feral Software
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-geom@freebsd.org
References: <4E394269.3090208@darkbsd.org>
In-Reply-To: <4E394269.3090208@darkbsd.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(ns1.feral.com [192.67.166.1]);
	Wed, 03 Aug 2011 07:14:35 -0700 (PDT)
Subject: Re: Poor interaction between gmultipath(8), ZFS and isp(4)
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-geom>
List-Post: <mailto:freebsd-geom@freebsd.org>
List-Help: <mailto:freebsd-geom-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 03 Aug 2011 14:43:27 -0000

Known problem. Or rather, one of a long set of known problems.

Most of these were addressed at Panasas under RELENG_7, but I have not 
had the time to redo them for RELENG_8 and later. Nor was I really happy 
with a lot of the results. At least from my perspective, due to work 
commitments, I'm unlikely to get to this very soon. Regrets.

On 8/3/2011 5:43 AM, Stephane LAPIE wrote:
> Hello list,
>
> (Not 100% sure the bug is in GEOM_MULTIPATH or in another driver.)
>
> I am running a FreeBSD 8.2-RELEASE server with ZFSv15, with the
> following hardware:
>
> http://www.darkbsd.org/~darksoul/server_dmesg.txt
>
> I have a dual fibre-channel controller (isp(4) driver), and I am
> accessing 16 RAID0 logical drives on a Promise vTrak E630fD (one volume
> per physical disk).
>
> Since both controllers are plugged into the same storage unit with no LUN
> masking, both controllers end up seeing the same devices, which is why
> I combined these devices using geom_multipath.
>
> Here is my zpool structure:
> config:
>
>          NAME                  STATE     READ WRITE CKSUM
>          data                  ONLINE       0     0     0
>            raidz1              ONLINE       0     0     0
>              multipath/disk0   ONLINE       0     0     0
>              multipath/disk1   ONLINE       0     0     0
>              multipath/disk2   ONLINE       0     0     0
>              multipath/disk3   ONLINE       0     0     0
>              multipath/disk4   ONLINE       0     0     0
>              multipath/disk5   ONLINE       0     0     0
>              multipath/disk6   ONLINE       0     0     0
>              multipath/disk7   ONLINE       0     0     0
>            raidz1              ONLINE       0     0     0
>              multipath/disk8   ONLINE       0     0     0
>              multipath/disk9   ONLINE       0     0     0
>              multipath/disk10  ONLINE       0     0     0
>              multipath/disk11  ONLINE       0     0     0
>              multipath/disk12  ONLINE       0     0     0
>              multipath/disk13  ONLINE       0     0     0
>              multipath/disk14  ONLINE       0     0     0
>              multipath/disk15  ONLINE       0     0     0
>
> errors: No known data errors
>
>
> Using gmultipath, I eventually want to have disk{1,3,5,7,9,11,13,15} use
> the second controller, while the rest use the first. The idea is that
> if anyone removes a fiber, everything switches over to the remaining
> fiber.
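
A minimal sketch of the split described above (odd-numbered disks on the second controller, even-numbered disks on the first). The controller names "isp0" and "isp1" and the helper preferred_controller are assumptions for illustration only; this is not a gmultipath command or API, just the intended mapping written out.

```python
def preferred_controller(disk_index: int) -> str:
    """Return the controller a multipath disk should prefer.

    Assumption: "isp0" is the first controller and "isp1" the second.
    """
    return "isp1" if disk_index % 2 == 1 else "isp0"


# Intended layout for the 16 logical drives, disk0 .. disk15.
layout = {f"multipath/disk{i}": preferred_controller(i) for i in range(16)}

for name, controller in layout.items():
    print(f"{name:18s} -> {controller}")
```

With this mapping each controller carries eight of the sixteen disks while both paths are up.
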
>
> For the sake of testing, I put every multipath device on the same
> controller, isp1.
>
> Here is the kernel log fragment I could capture from my test (removing a
> fiber on which transfers are actively running). Since I don't have
> serial console access, I couldn't capture the relevant kernel panic
> trace. The last readable section on screen simply mentions a kernel trap
> during a page fault in g_mp_kt, and it looks as if every CPU raises the
> panic message.
>
> http://www.darkbsd.org/~darksoul/server_lastlog_before_kernelpanic.txt
>
> After that, I get the aforementioned kernel panic. I can consistently
> reproduce it, and will try to get serial console output for a more
> detailed kernel panic trace, but it feels like everything is occurring at
> the same time without proper locking and without confirming that the
> relevant structures are still allocated. This looks like a race condition
> between isp(4) loopdown provoking da(4) destruction, and gmultipath(8)
> failover. (g_mp_kt would then access a da(4) structure that is being
> destroyed, or already destroyed, and so touch unallocated memory.)
>
> Maybe this is similar to this issue:
> http://freebsd.1045724.n5.nabble.com/Kernel-panic-with-gmultipath-td4204700.html
>
>
> Could this be tuned so that:
> 1) initially, on isp(4) loopdown -> da(4) devices depending on it return
> SCSI errors, provoking a clean failover by gmultipath
> 2) afterwards, on isp(4) timeout -> da(4) devices are destroyed
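
The ordering requested in the two steps above can be illustrated with a toy model. This is only a sketch of the desired sequence, not how isp(4), da(4) or gmultipath(8) are implemented; the class names and the device labels "da0 (isp0)" and "da16 (isp1)" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One path to a disk; hypothetical stand-in for a da(4) device."""
    name: str
    loop_up: bool = True
    destroyed: bool = False

    def read(self) -> str:
        if self.destroyed:
            # Models the bad case: touching a device that is already gone.
            raise RuntimeError(f"{self.name}: device already destroyed")
        if not self.loop_up:
            # Step 1: the device still exists but returns an I/O error.
            raise IOError(f"{self.name}: loop down")
        return f"data via {self.name}"


@dataclass
class Multipath:
    """Toy failover layer: retry on the next path after an I/O error."""
    paths: list
    active: int = 0

    def read(self) -> str:
        for _ in range(len(self.paths)):
            try:
                return self.paths[self.active].read()
            except IOError:
                self.active = (self.active + 1) % len(self.paths)
        raise IOError("all paths failed")


mp = Multipath([Path("da0 (isp0)"), Path("da16 (isp1)")])

mp.paths[0].loop_up = False    # step 1: loop goes down, device remains
first = mp.read()              # error is returned, failover happens

mp.paths[0].destroyed = True   # step 2: timeout expires, device destroyed
second = mp.read()             # active path no longer references it

print(first)
print(second)
```

In this model the failover completes while the failed device still exists, so nothing dereferences it after it is destroyed. If step 2 ran before the failover, the first read would hit the destroyed device instead.
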
>
> Is this a case for using the following boot hints?
> - "hint.isp.0.loop_down_limit" and "hint.isp.0.gone_device_time" (though
> I am not quite sure what the difference is between the two. Which one
> does the actual deallocation of underlying devices?)
>
> Thanks in advance for your time,