Date: Thu, 07 May 2015 11:38:46 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
Message-ID: <554B40B6.6060902@multiplay.co.uk>
In-Reply-To: <20150507095048.GC1394@zxy.spb.ru>
References: <20150507080749.GB1394@zxy.spb.ru> <554B2547.1090307@multiplay.co.uk> <20150507095048.GC1394@zxy.spb.ru>
On 07/05/2015 10:50, Slawa Olhovchenkov wrote:
> On Thu, May 07, 2015 at 09:41:43AM +0100, Steven Hartland wrote:
>
>> On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
>>> I have a zpool of 12 vdevs (zmirrors).
>>> One disk in one vdev went out of service and stopped serving requests:
>>>
>>> dT: 1.036s  w: 1.000s
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada1
>>>     1      0      0      0    0.0      0      0    0.0    0.0| ada2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| ada3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da1
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da2
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da3
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da4
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da5
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da6
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da7
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da8
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da9
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da10
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da11
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da12
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da13
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da14
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da15
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da16
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da17
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da18
>>>    24      0      0      0    0.0      0      0    0.0    0.0| da19
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da20
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da21
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da22
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da23
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da24
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da25
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da26
>>>     0      0      0      0    0.0      0      0    0.0    0.0| da27
>>>
>>> As a result, zfs operations on this pool have stopped too.
>>> `zpool list -v` doesn't work.
>>> `zpool detach tank da19` doesn't work.
>>> Applications working with this pool are stuck in the `zfs` wchan and can't be killed.
>>>
>>> # camcontrol tags da19 -v
>>> (pass19:isci0:0:3:0): dev_openings  7
>>> (pass19:isci0:0:3:0): dev_active  25
>>> (pass19:isci0:0:3:0): allocated  25
>>> (pass19:isci0:0:3:0): queued  0
>>> (pass19:isci0:0:3:0): held  0
>>> (pass19:isci0:0:3:0): mintags  2
>>> (pass19:isci0:0:3:0): maxtags  255
>>>
>>> How can I cancel these 24 requests?
>>> Why don't these requests time out (3 hours already)?
>>> How can I force-detach this disk? (I have already tried `camcontrol reset` and `camcontrol rescan`.)
>>> Why doesn't ZFS (or geom) time out on the requests and reroute them to da18?
>>>
>> If they are in mirrors, in theory you can just pull the disk; isci will
>> report to cam and cam will report to ZFS, which should all recover.
> Yes, a zmirror with da18.
> I am surprised that ZFS doesn't use da18. The whole zpool is stuck.
A single low-level request can only be handled by one device; if that device
returns an error then ZFS will use the other device, but not until then.
>
>> With regards to not timing out, this could be a default issue, but having
> I understand, there is no universally acceptable timeout for all cases: a good
> disk, a good saturated disk, tape, a tape library, a failed disk, etc.
> In my case -- a failed disk. This model has already failed (another specimen)
> with the same symptoms.
>
> Maybe there is some trick for cancelling/aborting all requests in the queue and
> removing the disk from the system?
Unlikely tbh; pulling the disk, however, should.
>
>> a very quick look that's not obvious in the code, as
>> isci_io_request_construct etc do indeed set a timeout when
>> CAM_TIME_INFINITY hasn't been requested.
>>
>> The sysctl hw.isci.debug_level may be able to provide more information,
>> but be aware this can be spammy.
> I already have this situation; which commands are interesting after
> setting hw.isci.debug_level?
I'm afraid I'm not familiar with isci; possibly someone else who is can chime in.
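Once the disk is physically out and CAM has dropped it, something along
these lines should get the pool healthy again. This is an untested sketch
off the top of my head: I'm assuming the pool is still named tank (from
your earlier zpool detach attempt), and daXX stands for whatever device
name a replacement disk shows up as.

    zpool status -v tank          # da19 should now show as REMOVED/UNAVAIL
    zpool detach tank da19        # drop the dead half of the mirror
    camcontrol devlist            # confirm a replacement disk was probed
    zpool attach tank da18 daXX   # attach the new disk to re-mirror with da18

Regards

Steve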