From owner-freebsd-stable@FreeBSD.ORG Thu May 7 12:35:21 2015
Date: Thu, 07 May 2015 13:35:05 +0100
From: Steven Hartland
To: Slawa Olhovchenkov
CC: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
Message-ID: <554B5BF9.8020709@multiplay.co.uk>
In-Reply-To: <20150507120508.GX62239@zxy.spb.ru>
List-Id: Production branch of FreeBSD source code

On 07/05/2015 13:05, Slawa Olhovchenkov wrote:
> On Thu, May 07, 2015 at 01:00:40PM +0100, Steven Hartland wrote:
>
>> On 07/05/2015 11:46, Slawa Olhovchenkov wrote:
>>> On Thu, May 07, 2015 at 11:38:46AM +0100, Steven Hartland wrote:
>>>
>>>>>>> How can I cancel these 24 requests?
>>>>>>> Why don't these requests time out (3 hours already)?
>>>>>>> How can I force-detach this disk? (I have already tried `camcontrol reset` and `camcontrol rescan`.)
>>>>>>> Why doesn't ZFS (or geom) time out the request and reroute it to da18?
>>>>>>>
>>>>>> If they are in mirrors, in theory you can just pull the disk; isci will
>>>>>> report to cam and cam will report to ZFS, which should all recover.
>>>>> Yes, a zmirror with da18.
>>>>> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
>>>> A single low-level request can only be handled by one device; if that
>>>> device returns an error then ZFS will use the other device, but not until then.
>>> Why aren't subsequent requests routed to da18?
>>> The current request being stuck on da19 I can (reluctantly) understand, but why
>>> is the whole pool stuck?
>> It's still waiting for the request from the failed device to complete. As
>> far as ZFS currently knows there is nothing wrong with the device, as it
>> has reported no failures.
> Can you explain some more?
> One request waiting I understand.
> But then I issue the next request. Some of the information it needs is on the vdev with
> the failed disk. The failed disk is busier (its queue is long), so why isn't the request
> routed to the mirror disk? Or, for metadata, to a less busy vdev?

As no error has been reported to ZFS, due to the stalled IO, there is no
failed vdev. Yes, in theory new requests should go to the other vdev, but
there could be dependency issues preventing that, such as a syncing TXG.

Regards
Steve
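[For anyone hitting the same symptom, a few stock FreeBSD commands can confirm the picture described above: I/O outstanding against da19 while its mirror partner da18 sits idle and ZFS still sees the vdev as healthy. This is a diagnostic sketch only; the pool name `tank` is hypothetical, the disk names come from this thread.]

```shell
# Per-provider GEOM statistics: a stalled disk shows a non-zero
# queue length (L(q)) with zero throughput, while its mirror
# partner shows no activity at all.
gstat

# Outstanding CAM transaction count for the suspect disk; a count
# that never drains matches the "stuck requests" reported here.
camcontrol tags da19 -v

# Pool-level view: because no error has been returned to ZFS yet,
# the hung disk will still be listed as ONLINE with no error
# counts, exactly as described in the reply above.
zpool status -v tank
```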