Date: Thu, 7 May 2015 12:50:49 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
Message-ID: <20150507095048.GC1394@zxy.spb.ru>
In-Reply-To: <554B2547.1090307@multiplay.co.uk>
References: <20150507080749.GB1394@zxy.spb.ru> <554B2547.1090307@multiplay.co.uk>
On Thu, May 07, 2015 at 09:41:43AM +0100, Steven Hartland wrote:
> On 07/05/2015 09:07, Slawa Olhovchenkov wrote:
> > I have a zpool of 12 vdevs (zmirrors).
> > One disk in one vdev went out of service and stopped serving requests:
> >
> > dT: 1.036s  w: 1.000s
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0      0      0      0    0.0      0      0    0.0    0.0| ada0
> >     0      0      0      0    0.0      0      0    0.0    0.0| ada1
> >     1      0      0      0    0.0      0      0    0.0    0.0| ada2
> >     0      0      0      0    0.0      0      0    0.0    0.0| ada3
> >     0      0      0      0    0.0      0      0    0.0    0.0| da0
> >     0      0      0      0    0.0      0      0    0.0    0.0| da1
> >     0      0      0      0    0.0      0      0    0.0    0.0| da2
> >     0      0      0      0    0.0      0      0    0.0    0.0| da3
> >     0      0      0      0    0.0      0      0    0.0    0.0| da4
> >     0      0      0      0    0.0      0      0    0.0    0.0| da5
> >     0      0      0      0    0.0      0      0    0.0    0.0| da6
> >     0      0      0      0    0.0      0      0    0.0    0.0| da7
> >     0      0      0      0    0.0      0      0    0.0    0.0| da8
> >     0      0      0      0    0.0      0      0    0.0    0.0| da9
> >     0      0      0      0    0.0      0      0    0.0    0.0| da10
> >     0      0      0      0    0.0      0      0    0.0    0.0| da11
> >     0      0      0      0    0.0      0      0    0.0    0.0| da12
> >     0      0      0      0    0.0      0      0    0.0    0.0| da13
> >     0      0      0      0    0.0      0      0    0.0    0.0| da14
> >     0      0      0      0    0.0      0      0    0.0    0.0| da15
> >     0      0      0      0    0.0      0      0    0.0    0.0| da16
> >     0      0      0      0    0.0      0      0    0.0    0.0| da17
> >     0      0      0      0    0.0      0      0    0.0    0.0| da18
> >    24      0      0      0    0.0      0      0    0.0    0.0| da19
> >  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >     0      0      0      0    0.0      0      0    0.0    0.0| da20
> >     0      0      0      0    0.0      0      0    0.0    0.0| da21
> >     0      0      0      0    0.0      0      0    0.0    0.0| da22
> >     0      0      0      0    0.0      0      0    0.0    0.0| da23
> >     0      0      0      0    0.0      0      0    0.0    0.0| da24
> >     0      0      0      0    0.0      0      0    0.0    0.0| da25
> >     0      0      0      0    0.0      0      0    0.0    0.0| da26
> >     0      0      0      0    0.0      0      0    0.0    0.0| da27
> >
> > As a result, ZFS operations on this pool stopped too.
> > `zpool list -v` did not work.
> > `zpool detach tank da19` did not work.
> > Applications working with this pool are stuck in the `zfs` wchan and cannot be killed.
> >
> > # camcontrol tags da19 -v
> > (pass19:isci0:0:3:0): dev_openings  7
> > (pass19:isci0:0:3:0): dev_active    25
> > (pass19:isci0:0:3:0): allocated     25
> > (pass19:isci0:0:3:0): queued        0
> > (pass19:isci0:0:3:0): held          0
> > (pass19:isci0:0:3:0): mintags       2
> > (pass19:isci0:0:3:0): maxtags       255
> >
> > How can I cancel these 24 requests?
> > Why don't these requests time out? (It has been 3 hours already.)
> > How can I force-detach this disk? (I have already tried `camcontrol reset` and `camcontrol rescan`.)
> > Why doesn't ZFS (or GEOM) time out the requests and reroute them to da18?
> >
> If they are in mirrors, in theory you can just pull the disk, isci will
> report to cam and cam will report to ZFS which should all recover.

Yes, it is a zmirror with da18. I am surprised that ZFS doesn't use da18; the whole zpool is stuck.

> With regards to not timing out this could be a default issue, but having

I understand there is no universally acceptable timeout for all cases: a good disk, a good but saturated disk, a tape, a tape library, a failed disk, etc. In my case it is a failed disk. Another specimen of this disk model has already failed with the same symptoms.

Maybe there is some trick for cancelling/aborting all requests in the queue and removing the disk from the system?

> a very quick look that's not obvious in the code as
> isci_io_request_construct etc do indeed set a timeout when
> CAM_TIME_INFINITY hasn't been requested.
>
> The sysctl hw.isci.debug_level may be able to provide more information,
> but be aware this can be spammy.

I already have this situation in place; which commands would be interesting to run after setting hw.isci.debug_level?
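[For reference, a hedged sketch of the administrative commands discussed in this thread, collected in one place. The device name da19, the pool name tank, and the 0:3:0 path are taken from the output above; the debug_level value is an assumption, and none of these steps is guaranteed to recover a device whose controller never completes its outstanding commands.]

```shell
# Sketch only, not a verified fix: attempts to get ZFS off the stuck disk.
# da19 / tank / 0:3:0 come from the camcontrol output quoted above.

# Ask ZFS to stop issuing new I/O to the failed half of the mirror.
zpool offline tank da19

# Reset the SCSI target behind da19 (bus:target:lun as shown by camcontrol).
# The thread reports this was already tried without effect.
camcontrol reset 0:3:0

# Raise isci(4) debug verbosity (value 3 is an assumption) before
# reproducing, then watch the kernel log for the driver's view of the
# stuck commands.
sysctl hw.isci.debug_level=3
tail -f /var/log/messages
```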