Date: Thu, 7 May 2015 15:44:16 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: zfs, cam sticking on failed disk
Message-ID: <20150507124416.GD1394@zxy.spb.ru>
In-Reply-To: <554B5BF9.8020709@multiplay.co.uk>
References: <20150507080749.GB1394@zxy.spb.ru> <554B2547.1090307@multiplay.co.uk> <20150507095048.GC1394@zxy.spb.ru> <554B40B6.6060902@multiplay.co.uk> <20150507104655.GT62239@zxy.spb.ru> <554B53E8.4000508@multiplay.co.uk> <20150507120508.GX62239@zxy.spb.ru> <554B5BF9.8020709@multiplay.co.uk>

On Thu, May 07, 2015 at 01:35:05PM +0100, Steven Hartland wrote:

>
>
> On 07/05/2015 13:05, Slawa Olhovchenkov wrote:
> > On Thu, May 07, 2015 at 01:00:40PM +0100, Steven Hartland wrote:
> >
> >>
> >> On 07/05/2015 11:46, Slawa Olhovchenkov wrote:
> >>> On Thu, May 07, 2015 at 11:38:46AM +0100, Steven Hartland wrote:
> >>>
> >>>>>>> How I can cancel this 24 request?
> >>>>>>> Why this requests don't timeout (3 hours already)?
> >>>>>>> How I can forced detach this disk? (I am already try `camcontrol reset`, `camcontrol rescan`).
> >>>>>>> Why ZFS (or geom) don't timeout on request and don't rerouted to da18?
> >>>>>>>
> >>>>>> If they are in mirrors, in theory you can just pull the disk, isci will
> >>>>>> report to cam and cam will report to ZFS which should all recover.
> >>>>> Yes, zmirror with da18.
> >>>>> I am surprise that ZFS don't use da18. All zpool fully stuck.
> >>>> A single low level request can only be handled by one device, if that
> >>>> device returns an error then ZFS will use the other device, but not until.
> >>> Why next requests don't routed to da18?
> >>> Current request stuck on da19 (unlikely, but understood), but why
> >>> stuck all pool?
> >> It's still waiting for the request from the failed device to complete. As
> >> far as ZFS currently knows there is nothing wrong with the device as it's
> >> had no failures.
> > Can you explain some more?
> > One request waiting, understand.
> > I am do next request. Some information need from vdev with failed
> > disk. Failed disk more busy (queue long), why don't routed to mirror
> > disk? Or, for metadata, to less busy vdev?
> As no error has been reported to ZFS, due to the stalled IO, there is no
> failed vdev.

I see that the device isn't failed (for both the OS and ZFS).
I am not talking about a 'failed vdev'. I am talking about a 'busy vdev' or 'busy device'.

> Yes in theory new requests should go to the other vdev, but there could
> be some dependency issues preventing that such as a syncing TXG.

Currently this pool should not have any write activity (from applications).
What about going to the other (mirror) device in the same vdev?
Same dependency?
