Date:      Sat, 13 Sep 2008 16:40:01 -0400
From:      Nathanael Hoyle <nhoyle@hoyletech.com>
To:        Karl Pielorz <kpielorz_lst@tdx.co.uk>
Cc:        freebsd-hackers@freebsd.org, Jeremy Chadwick <koitsu@FreeBSD.org>
Subject:   Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Message-ID:  <1221338401.18959.7.camel@localhost>
In-Reply-To: <FEFC1751EDD6B66957A04942@Quadro64.tdx.co.uk>
References:  <C984A6E7B1C6657CD8C4F79E@Slim64.dmpriest.net.uk> <20080912132102.GB56923@icarus.home.lan> <3BE629D093001F6BA2C6791C@Slim64.dmpriest.net.uk> <20080912160422.GB60094@icarus.home.lan> <FEFC1751EDD6B66957A04942@Quadro64.tdx.co.uk>

On Fri, 2008-09-12 at 20:31 +0100, Karl Pielorz wrote:
> 
> --On 12 September 2008 09:04 -0700 Jeremy Chadwick <koitsu@FreeBSD.org> 
> wrote:
> 
> > I know ATA will notice a detached channel, because I myself have done
> > it: administratively, that is -- atacontrol detach ataX.  But the only
> > time that can happen "automatically" is if the actual controller does
> > so itself, or if FreeBSD is told to do it administratively.
> 
> I think the problem at the moment is, ZFS "doesn't care" - it's 
> deliberately remote from things like drivers, and drives - and at the 
> moment, there's no 'middle layer' or way for at least the ATA drivers to 
> communicate to ZFS that a drive 'has failed' (I mean, for starters, you've 
> got the problem of "what's a failed drive" - presumably a drive that's 
> operating outside a set of limits? - The first probably being 'is it still 
> attached?' :)
> 
> That was the subject of a recent thread on the OpenSolaris ZFS forum - and 
> was discussed at length...
> 
> > I am also very curious to know the exact brand/model of 8-port SATA
> > controller from Supermicro you are using, *especially* if it uses ata(4)
> > rather than CAM and da(4).
> 
> The controllers ID as:
> 
>   Marvell 88SX6081 SATA300 controller
> 
> They're SuperMicro 8 PORT PCI-X SATA controllers, 'AOC-SAT2-MV8' - and they 
> definitely show as 'adX'
> 
> > Such Supermicro controllers were recently
> > discussed on freebsd-stable (or was it -hardware?), and no one was able
> > to come to a concise decision as to whether or not they were decent or
> > even remotely trusted.  Supermicro provides a few different SATA HBAs.
> 
> Well, I've tested these cards for a number of months now, and they seem 
> fine  here - at least with the WD drives I'm currently running (not saying 
> they're 'perfect' - but for my setup, I've not seen any issues). I didn't 
> notice any 'bad behaviour' when testing them under UFS, and when running 
> under ZFS they've picked up no checksum errors (or console messages) for 
> the duration the box has been running.
> 
> > I can see the usefulness in Solaris's FMA thing.  My big concern is
> > whether or not FMA actually pulls the disk off the channel, or if it
> > just leaves the disk/channel connected and simply informs kernel pieces
> > not to use it.  If it pulls the disk off the channel, I have serious
> > qualms with it.
> 
> I don't think it pulls it - I think it looks at its policies, and does 
> what they say, which would seem to be the equivalent of 'zpool offline dev' 
> by default (which, again, doesn't pull it off any buses - it just notifies 
> ZFS not to send I/O to that device).
> 

This is consistent with my understanding of the behavior under Solaris,
and is similar to how Sun Solstice DiskSuite has worked for years... the
drive will get "kicked out" of the array, but will remain attached to
the bus and fully visible to the system.  The array gets marked as
degraded and no further I/O is performed against the "submirror" until
it is either manually resynced, or replaced and resynced.
Occasionally, DiskSuite is somewhat overly aggressive about this, but
the box stays responsive and resilient when a drive goes down.
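
For reference, the manual recovery in that model usually comes down to a
couple of commands; the metadevice and disk names below are made up purely
for illustration, with a rough ZFS equivalent shown alongside:

  # Solstice DiskSuite: find the submirror in the "Needs maintenance" state
  metastat | grep -i maint

  # re-enable the failed component and let it resync into the mirror
  metareplace -e d10 c1t2d0s0

  # ZFS equivalent: stop I/O to the device, then bring it back (or swap
  # in a replacement) and let the pool resilver
  zpool offline tank ad4
  zpool online tank ad4        # or: zpool replace tank ad4 ad12
  zpool status -x tank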

> I'll have to do a test using da / CAM-driven disks (or ask someone who 
> worked on the port ;) - but I'd guess, unless something has been added 
> to CAM to tell ZFS to offline the disk, it'll do the same - i.e. ZFS will 
> continue to issue I/O requests to disks as it needs - as, at least in 
> OpenSolaris, it's deemed *not* to be ZFS's job to detect failed disks, or 
> do anything about them - other than what it's told.
> 
> ZFS under FreeBSD still works despite this (and works wonderfully well) - 
> it just means that if any of your drives 'go out to lunch' - unless they 
> fail in such a way that I/O requests are returned immediately as 'failed' 
> (i.e. I guess if the device node has gone) - ZFS will keep issuing I/O 
> requests to the failed drives (and potentially stalling while waiting on 
> them), because it doesn't know, doesn't care - and hasn't been told to do 
> otherwise.
> 

Unfortunately this means that a single failed or failing drive can make
the entire I/O subsystem (and likely the machine with it) unresponsive,
which is a bad failure mode from a high-availability standpoint.  There
probably needs to be some sort of "middleware" that monitors the
responses to the commands sent to the individual drives and, according
to a configurable policy, offlines a misbehaving drive from the zpool in
order to preserve availability.
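
As a very rough sketch of the sort of thing I mean (this is not an
existing tool - the pool name, device list, and timeout are placeholders,
and a real implementation would want to hook driver-level error reporting
rather than poll), something like the following could watch the members
of a pool and offline any drive that stops responding:

  #!/bin/sh
  # Hypothetical watchdog sketch: probe each pool member with a small
  # timed read, and offline any disk that fails to answer in time.
  POOL=tank
  DISKS="ad4 ad6 ad8 ad10"      # members of $POOL
  TIMEOUT=30                    # seconds before a disk is declared dead

  while sleep 60; do
      for d in $DISKS; do
          # read a single sector straight from the device, in the background
          dd if=/dev/$d of=/dev/null bs=512 count=1 >/dev/null 2>&1 &
          pid=$!
          waited=0
          while kill -0 $pid 2>/dev/null; do
              sleep 1
              waited=$((waited + 1))
              if [ $waited -ge $TIMEOUT ]; then
                  # the probe is stuck: stop sending this disk I/O
                  logger "zfs-watchdog: $d unresponsive, offlining from $POOL"
                  zpool offline $POOL $d
                  kill $pid 2>/dev/null
                  break
              fi
          done
      done
  done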

-Nathanael

> -Kp
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"



