Date: Thu, 01 Nov 2012 09:28:51 +0000
From: Paul Wootton <paul-freebsd@fletchermoorland.co.uk>
To: Zaphod Beeblebrox <zbeeble@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RaidZ-2 problems
Message-ID: <509240D3.7070607@fletchermoorland.co.uk>
In-Reply-To: <CACpH0MeJpSg3ti-QUgT=XwaC0jkEo5JeBAfRGPTFfUE6eLJFJg@mail.gmail.com>
References: <508F98F9.3040604@fletchermoorland.co.uk> <1351598684.88435.19.camel@btw.pki2.com> <508FE643.4090107@fletchermoorland.co.uk> <op.wmz1vtrd8527sy@ronaldradial.versatec.local> <5090010A.4050109@fletchermoorland.co.uk> <op.wm1axoqv8527sy@ronaldradial.versatec.local> <CACpH0MeJpSg3ti-QUgT=XwaC0jkEo5JeBAfRGPTFfUE6eLJFJg@mail.gmail.com>
On 10/31/12 17:58, Zaphod Beeblebrox wrote:
> I'd start off by saying "smart is your friend." Install smartmontools
> and study the somewhat opaque "smartctl -a /dev/mydisk" output
> carefully. Try running a short and/or long test, too. Many times the
> disk can tell you what the problem is. If too many blocks are being
> replaced, your drive is dying. If the drive sees errors in commands
> it receives, the cable or the controller is at fault. ZFS itself
> does _exceptionally_ well at trying to use what it has.

I already run smartmontools regularly. One of my drives is starting to go bad, but the drive that keeps disconnecting actually looks fine on SMART (when it is connected). I also normally run a periodic scrub every few days (I've been caught out a few times before).

> I'll also say that bad power supplies make for bad disks. Replacing a
> power supply has often been the solution to bad disk problems I've
> had. Disks are sensitive to under-voltage problems. Brown-outs can
> exacerbate this problem. My parents live out where power is very
> flaky. Cheap UPSs didn't help much ... but a good power supply can
> make all the difference.

Maybe... I will not rule out a bad power supply.

> But I've also had bad controllers of late, too. My most recent
> problem had my 9-disk raidZ1 array lose a disk. Smartctl said that
> it was losing blocks fast, so I RMA'd the disk. When the new disk
> came, the array just wouldn't heal... it kept losing the disks
> attached to a certain controller. Now it's possible the controller
> was bad before the disk had died ... or that it died during the first
> attempt at resilver ... or that FreeBSD drivers don't like it anymore
> ... I don't know.
>
> My solution was to get two more 4-drive "pro box" SATA enclosures.
> They use a 1-to-4 SATA breakout and the 6 motherboard ports I have are
> a revision of the ICH11 Intel chipset that supports SATA port
> replication (I already had two of these boxes).
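For anyone following along, the smartctl checks discussed above look roughly like this. This is only a sketch; the device name /dev/ada0 is an example (a typical first SATA disk on FreeBSD), not a device from this thread.

```shell
# Dump all SMART information for a disk.
smartctl -a /dev/ada0

# Kick off the self-tests mentioned above; both run on the drive itself
# in the background.
smartctl -t short /dev/ada0   # usually a couple of minutes
smartctl -t long /dev/ada0    # can take hours on a large drive

# Check the self-test log once the test has finished.
smartctl -l selftest /dev/ada0

# Attributes worth watching on a suspect drive: reallocated, pending
# and offline-uncorrectable sector counts.
smartctl -A /dev/ada0 | egrep 'Reallocated|Pending|Uncorrect'
```

Rising reallocated/pending sector counts point at the drive itself; interface CRC errors in the smartctl output point instead at the cable or controller, which matches the distinction drawn above.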
> In this manner I
> could remove the defective controller and put all disks onto the
> motherboard ICH11 (it actually also allowed me to later expand the
> array... but that's not part of this story).

Again, maybe... It might be a controller or a cable. It could actually be the drive. I am not worried about the hardware side: I can replace the disks, cables, controllers and power supply without any problems.

As I said before, the issue I have is this: I have a 9-disk RAIDZ-2 pool with only 1 disk showing as offline, yet the pool is showing as faulted. If the power supply was bouncing and a drive was giving bad data, I would expect ZFS to report that 2 drives were faulted (1 offline and 1 corrupt).

Is there a way with zdb that I can see why the pool is showing as faulted? Can it tell me which drives it thinks are bad, or which have bad data?

Paul
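A few commands that can help answer the zdb question above. The pool name "tank" and device /dev/ada0 are placeholders, not names from this thread, and the usefulness of each depends on how far gone the pool is.

```shell
# Per-device state and error counters (read/write/checksum), plus the
# reason string ZFS gives for a degraded or faulted pool.
zpool status -v tank

# Dump the ZFS label stored on a disk: pool GUID, vdev tree and each
# child's GUID. A disk whose label no longer matches its siblings
# (stale txg, wrong pool GUID) stands out here.
zdb -l /dev/ada0

# Print the pool configuration as ZFS currently sees it, including
# per-vdev states.
zdb -C tank

# If the pool cannot be imported, ask zdb to read the on-disk
# configuration directly instead of the cache file.
zdb -e tank
```

Comparing `zdb -l` output across all nine disks is probably the quickest way to see whether ZFS considers a second device stale or corrupt even though `zpool status` only lists one as offline.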
