Date:      Thu, 01 Nov 2012 09:28:51 +0000
From:      Paul Wootton <paul-freebsd@fletchermoorland.co.uk>
To:        Zaphod Beeblebrox <zbeeble@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS RaidZ-2 problems
Message-ID:  <509240D3.7070607@fletchermoorland.co.uk>
In-Reply-To: <CACpH0MeJpSg3ti-QUgT=XwaC0jkEo5JeBAfRGPTFfUE6eLJFJg@mail.gmail.com>
References:  <508F98F9.3040604@fletchermoorland.co.uk> <1351598684.88435.19.camel@btw.pki2.com> <508FE643.4090107@fletchermoorland.co.uk> <op.wmz1vtrd8527sy@ronaldradial.versatec.local> <5090010A.4050109@fletchermoorland.co.uk> <op.wm1axoqv8527sy@ronaldradial.versatec.local> <CACpH0MeJpSg3ti-QUgT=XwaC0jkEo5JeBAfRGPTFfUE6eLJFJg@mail.gmail.com>

On 10/31/12 17:58, Zaphod Beeblebrox wrote:
> I'd start off by saying "smart is your friend."  Install smartmontools
> and study the somewhat opaque "smartctl -a /dev/mydisk" output
> carefully.  Try running a short and/or long test, too.  Many times the
> disk can tell you what the problem is.  If too many blocks are being
> replaced, your drive is dying.  If the drive sees errors in commands
> it receives, the cable or the controller are at fault.   ZFS itself
> does _exceptionally_ well at trying to use what it has.

I already run smartmontools regularly. I do have one drive that is
starting to go bad.
The drive that keeps disconnecting actually looks OK on SMART (when it's
connected).

I normally also run a periodic scrub every few days (I've been caught out
a few times before).

> I'll also say that bad power supplies make for bad disks.  Replacing a
> power supply has often been the solution to bad disk problems I've
> had.  Disks are sensitive to under voltage problems.  Brown-outs can
> exacerbate this problem.  My parents live out where power is very
> flaky.  Cheap UPSs didn't help much ... but a good power supply can
> make all the difference.
Maybe... I will not rule out a bad power supply.
> But I've also had bad controllers of late, too.  My most recent
> problem had my 9-disk raidZ1 array lose a disk.  Smartctl said that
> it was losing blocks fast, so I RMA'd the disk.  When the new disk
> came, the array just wouldn't heal... it kept losing the disks
> attached to a certain controller.  Now it's possible the controller
> was bad before the disk had died ... or that it died during the first
> attempt at resilver ... or that FreeBSD drivers don't like it anymore
> ... I don't know.
>
> My solution was to get two more 4 drive "pro box" SATA enclosures.
> They use a 1-to-4 SATA breakout and the 6 motherboard ports I have are
> a revision of the ICH11 intel chipset that supports SATA port
> replication (I already had two of these boxes).  In this manner I
> could remove the defective controller and put all disks onto the
> motherboard ICH11 (it actually also allowed me to later expand the
> array... but that's not part of this story).
Again maybe... It might be a controller or cable. It could actually be 
the drive.


I am not worried about the hardware side. I can replace the disks,
cables, controllers and power supply without any problems.

As I said before, the issue is that I have a 9-disk RAIDZ-2 pack with only
1 disk showing as offline, yet the pack is showing as faulted.
If the power supply were bouncing and a drive were giving bad data, I
would expect ZFS to report that 2 drives were faulted (1 offline and 1
corrupt).
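
To spell out that expectation, here is a minimal sketch in Python. It is
a simplified model and an assumption on my part, not how ZFS actually
computes vdev state: it only counts how many member disks of a single
RAIDZ vdev are unavailable and compares that with the parity level. The
function name raidz_vdev_state is hypothetical.

```python
# Simplified model (an assumption, not ZFS's real logic): a RAIDZ vdev
# with parity level P can keep serving data while at most P member
# disks are unavailable.

def raidz_vdev_state(parity: int, unavailable: int) -> str:
    """Return the state one would expect for a single RAIDZ vdev."""
    if parity < 1 or unavailable < 0:
        raise ValueError("parity must be >= 1 and unavailable must be >= 0")
    if unavailable == 0:
        return "ONLINE"
    if unavailable <= parity:
        # Redundancy is reduced, but the data can still be reconstructed.
        return "DEGRADED"
    # More disks are missing than the parity level can cover.
    return "FAULTED"


if __name__ == "__main__":
    # RAIDZ-2 means parity = 2.
    for missing in range(4):
        print(missing, raidz_vdev_state(parity=2, unavailable=missing))
```

Under this model a RAIDZ-2 vdev with a single offline disk should only
be degraded, which is why a faulted pack with just 1 disk reported
offline looks wrong to me.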

Is there a way with zdb that I can see why the pool is showing as
faulted? Can it tell me which drives it thinks are bad or have bad data?


Paul


