From owner-freebsd-stable@FreeBSD.ORG Thu May  7 08:41:57 2015
Message-ID: <554B2547.1090307@multiplay.co.uk>
Date: Thu, 07 May 2015 09:41:43 +0100
From: Steven Hartland
To: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
In-Reply-To: <20150507080749.GB1394@zxy.spb.ru>
List-Id: Production branch of FreeBSD source code

On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
> I have a zpool of 12 vdevs (zmirrors).
> One disk in one vdev went out of service and stopped serving requests:
>
> dT: 1.036s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0    0.0| ada0
>     0      0      0      0    0.0      0      0    0.0    0.0| ada1
>     1      0      0      0    0.0      0      0    0.0    0.0| ada2
>     0      0      0      0    0.0      0      0    0.0    0.0| ada3
>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>     0      0      0      0    0.0      0      0    0.0    0.0| da2
>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>     0      0      0      0    0.0      0      0    0.0    0.0| da4
>     0      0      0      0    0.0      0      0    0.0    0.0| da5
>     0      0      0      0    0.0      0      0    0.0    0.0| da6
>     0      0      0      0    0.0      0      0    0.0    0.0| da7
>     0      0      0      0    0.0      0      0    0.0    0.0| da8
>     0      0      0      0    0.0      0      0    0.0    0.0| da9
>     0      0      0      0    0.0      0      0    0.0    0.0| da10
>     0      0      0      0    0.0      0      0    0.0    0.0| da11
>     0      0      0      0    0.0      0      0    0.0    0.0| da12
>     0      0      0      0    0.0      0      0    0.0    0.0| da13
>     0      0      0      0    0.0      0      0    0.0    0.0| da14
>     0      0      0      0    0.0      0      0    0.0    0.0| da15
>     0      0      0      0    0.0      0      0    0.0    0.0| da16
>     0      0      0      0    0.0      0      0    0.0    0.0| da17
>     0      0      0      0    0.0      0      0    0.0    0.0| da18
>    24      0      0      0    0.0      0      0    0.0    0.0| da19
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     0      0      0      0    0.0      0      0    0.0    0.0| da20
>     0      0      0      0    0.0      0      0    0.0    0.0| da21
>     0      0      0      0    0.0      0      0    0.0    0.0| da22
>     0      0      0      0    0.0      0      0    0.0    0.0| da23
>     0      0      0      0    0.0      0      0    0.0    0.0| da24
>     0      0      0      0    0.0      0      0    0.0    0.0| da25
>     0      0      0      0    0.0      0      0    0.0    0.0| da26
>     0      0      0      0    0.0      0      0    0.0    0.0| da27
>
> As a result, ZFS operations on this pool have stopped too.
> `zpool list -v` doesn't work.
> `zpool detach tank da19` doesn't work.
> Applications working with this pool are stuck in the `zfs` wchan and cannot be killed.
>
> # camcontrol tags da19 -v
> (pass19:isci0:0:3:0): dev_openings  7
> (pass19:isci0:0:3:0): dev_active    25
> (pass19:isci0:0:3:0): allocated     25
> (pass19:isci0:0:3:0): queued        0
> (pass19:isci0:0:3:0): held          0
> (pass19:isci0:0:3:0): mintags       2
> (pass19:isci0:0:3:0): maxtags       255
>
> How can I cancel these 24 requests?
> Why don't these requests time out (it has been 3 hours already)?
> How can I force-detach this disk? (I have already tried `camcontrol reset` and `camcontrol rescan`.)
> Why doesn't ZFS (or GEOM) time out the requests and reroute them to da18?
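For reference, the usual escalation path on a wedged mirror member looks roughly like the sketch below. This is a hedged outline, not a guaranteed fix: the pool name (tank) and device (da19) are taken from the post, the bus:target:lun tuple is read off the pass19 output above, and with 24 requests already stuck in the device queue the ZFS-level commands may hang just as `zpool detach` did.

```shell
# Take the stuck member offline at the ZFS layer first (may hang, as
# the poster observed with `zpool detach`):
zpool offline tank da19

# Inspect the outstanding CAM queue for the device (from the post):
camcontrol tags da19 -v

# Reset and re-probe the target; `camcontrol reset` addresses the
# target as bus:target:lun, here taken from (pass19:isci0:0:3:0) --
# confirm with `camcontrol devlist -v` on your own system:
camcontrol reset 0:3:0
camcontrol rescan 0:3:0
```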
If they are in mirrors, in theory you can just pull the disk: isci will report the removal to CAM, and CAM will report it to ZFS, which should then recover.

With regards to the requests not timing out, this could be a driver default issue, but having a very quick look that's not obvious in the code, as isci_io_request_construct etc. do indeed set a timeout when CAM_TIME_INFINITY hasn't been requested.

The sysctl hw.isci.debug_level may be able to provide more information, but be aware this can be spammy.

Regards
Steve
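The debug knob mentioned above can be toggled at runtime. A minimal sketch follows; the value 3 is an assumption (the meaningful levels are defined in the isci(4) driver source, so check sys/dev/isci on your system), and the output lands in the kernel log:

```shell
# Raise the isci(4) driver debug level; higher values log more
# (value 3 is an assumed mid level -- verify against the driver source):
sysctl hw.isci.debug_level=3

# Watch the kernel log while reproducing the stuck I/O:
tail -f /var/log/messages

# Restore the default when done, since the logging can be spammy:
sysctl hw.isci.debug_level=0
```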