From owner-freebsd-stable@FreeBSD.ORG Thu May  7 12:46:55 2015
Message-ID: <554B5EB0.1080208@multiplay.co.uk>
Date: Thu, 07 May 2015 13:46:40 +0100
From: Steven Hartland
To: Slawa Olhovchenkov
CC: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
In-Reply-To: <20150507124416.GD1394@zxy.spb.ru>
List-Id: Production branch of FreeBSD source code

On 07/05/2015 13:44, Slawa Olhovchenkov wrote:
> On Thu, May 07, 2015 at 01:35:05PM +0100, Steven Hartland wrote:
>> On 07/05/2015 13:05, Slawa Olhovchenkov wrote:
>>> On Thu, May 07, 2015 at 01:00:40PM +0100, Steven Hartland wrote:
>>>> On 07/05/2015 11:46, Slawa Olhovchenkov wrote:
>>>>> On Thu, May 07, 2015 at 11:38:46AM +0100, Steven Hartland wrote:
>>>>>>>>> How can I cancel these 24 requests?
>>>>>>>>> Why don't these requests time out (3 hours already)?
>>>>>>>>> How can I force-detach this disk? (I have already tried `camcontrol reset` and `camcontrol rescan`.)
>>>>>>>>> Why doesn't ZFS (or geom) time out the request and reroute it to da18?
>>>>>>>> If they are in mirrors, in theory you can just pull the disk; isci will
>>>>>>>> report to CAM, and CAM will report to ZFS, which should all recover.
>>>>>>> Yes, a zmirror with da18.
>>>>>>> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
>>>>>> A single low-level request can only be handled by one device; if that
>>>>>> device returns an error then ZFS will use the other device, but not until then.
>>>>> Why aren't subsequent requests routed to da18?
>>>>> The current request is stuck on da19 (unfortunate, but understandable), but why
>>>>> is the whole pool stuck?
>>>> It's still waiting for the request from the failed device to complete. As
>>>> far as ZFS currently knows there is nothing wrong with the device, as it
>>>> has had no failures.
>>> Can you explain some more?
>>> One request waiting: understood.
>>> Then I make the next request. Some information is needed from the vdev with the
>>> failed disk. The failed disk is busier (its queue is long), so why isn't the
>>> request routed to the mirror disk? Or, for metadata, to a less busy vdev?
>> As no error has been reported to ZFS, due to the stalled IO, there is no
>> failed vdev.
> I see that the device isn't failed (for both the OS and ZFS).
> I am not talking about a 'failed vdev'; I am talking about a 'busy vdev' or a 'busy device'.
>
>> Yes, in theory new requests should go to the other vdev, but there could
>> be a dependency issue preventing that, such as a syncing TXG.
> Currently this pool should have no write activity (from the application).
> What about going to the other (mirror) device in the same vdev?
> Same dependency?
Yes, if there's an outstanding TXG, then I believe all IO will stall.
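To make the failover point above concrete, here is a toy Python sketch (not ZFS source; `MirrorChild`, `mirror_read`, and the timeout are all made up for illustration). It models the behaviour described in the thread: ZFS retries on the other mirror child only when the first child *reports* an error, so a request that hangs without ever completing never triggers failover. The `timeout` exists only so the toy terminates; real ZFS has no such I/O timeout of its own and relies on CAM/the driver to fail the request.

```python
import queue

class MirrorChild:
    """Toy stand-in for one side of a ZFS mirror (not real ZFS code)."""
    def __init__(self, name, healthy=True, hung=False):
        self.name, self.healthy, self.hung = name, healthy, hung

    def read(self, completions):
        if self.hung:
            return  # request never completes AND never errors: nothing is reported
        completions.put((self.name, "ok" if self.healthy else "error"))

def mirror_read(children, timeout):
    """Issue to the first child; retry the second only on a *reported* error."""
    completions = queue.Queue()
    children[0].read(completions)
    try:
        name, status = completions.get(timeout=timeout)
    except queue.Empty:
        return "stalled"  # no completion, no error: nothing triggers failover
    if status == "error":  # an error IS reported: retry on the other child
        children[1].read(completions)
        name, status = completions.get(timeout=timeout)
    return f"{status} from {name}"

da18 = MirrorChild("da18")
hung_da19 = MirrorChild("da19", hung=True)      # the stuck disk from this thread
dead_da19 = MirrorChild("da19", healthy=False)  # a disk that fails *fast* instead

print(mirror_read([hung_da19, da18], timeout=0.1))  # -> stalled
print(mirror_read([dead_da19, da18], timeout=0.1))  # -> ok from da18
```

The contrast between the two runs is the whole thread in miniature: a disk that errors out quickly is easy (the mirror absorbs it), while a disk that silently hangs leaves ZFS with no event to act on.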
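The closing point about an outstanding TXG can also be sketched. This is a deliberately simplified model (made-up names, not OpenZFS internals): transaction groups move through a short pipeline of states (open, quiescing, syncing), with one TXG per stage, and new I/O joins the open TXG. If the syncing TXG's writes hang on one disk, the pipeline never drains, so even I/O bound for healthy vdevs has nowhere to go.

```python
# Toy model of the TXG pipeline backing up behind one hung disk.
PIPELINE_STAGES = 3  # open, quiescing, syncing -- one TXG per stage

def can_accept_new_io(txgs_in_flight, syncing_txg_complete):
    """New I/O joins the open TXG; that needs a free slot in the pipeline."""
    if txgs_in_flight < PIPELINE_STAGES:
        return True  # room in the pipeline: I/O proceeds normally
    # Pipeline full: it only drains when the syncing TXG's writes complete,
    # which in this thread they never do, because da19 never returns.
    return syncing_txg_complete

assert can_accept_new_io(3, syncing_txg_complete=True)       # healthy pool
assert not can_accept_new_io(3, syncing_txg_complete=False)  # da19 hung: pool-wide stall
```

Under this model, even a pool with "no write activity from the application" can stall: metadata updates and the already-queued sync are enough to fill the pipeline, matching Steven's "all IO will stall" conclusion.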