Date: Wed, 05 May 2010 16:56:41 +0200
From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: ZFS (zpool) doesn't detect failed drive
Message-ID: <4BE18729.3050209@omnilan.de>
In-Reply-To: <4BE16784.8050400@omnilan.de>
References: <4BE16784.8050400@omnilan.de>

Harald Schmalzbauer wrote on 05.05.2010 14:41 (localtime):
> Hello,
>
> one drive of my mirror failed today, but 'zpool status' shows it "online".
> Every process using a ZFS mount hangs. Also 'zpool offline /dev/ad1'
> hangs indefinitely.
...

Sorry, I made an error with zpool create. Somehow the little word
"mirror" must have been lost, so the pool wasn't a mirror but a stripe.
Then of course I can't take one vdev offline.
Sorry for the noise.

But I took the opportunity to do some tests with that failing drive and
created a _real_ mirror. Creation works without failures, but using the
mirror again leads to:

    ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
    ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
    ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
    ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
    ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
    ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
    ata3: port is not ready (timeout 10000ms) tfd = 00000080
    ata3: hardware reset timeout
    ad1: FAILURE - device detached

Now zpool still reports the vdev ad1 as online, although it has been
detached and 'atacontrol list' doesn't show it anymore:

zpool status
      pool: URUBAmirrorP1
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are
            unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: none requested
    config:

            NAME           STATE     READ WRITE CKSUM
            URUBAmirrorP1  ONLINE       0     0     0
              mirror       ONLINE       0     0     0
                ad1        ONLINE       3  302K     0
                ad2        ONLINE       0     0     0

    errors: No known data errors

atacontrol list
    ATA channel 2:
        Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
        Slave:       no device present
    ATA channel 3:
        Master:      no device present
        Slave:       no device present
    ATA channel 4:
        Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
        Slave:       no device present
    ATA channel 5:
        Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
        Slave:       no device present

How should such a failure be handled? Do I have to manually mark the
drive offline for zpool?

Thanks,

-Harry
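
The mismatch described in the message (a vdev listed as ONLINE while its
error counters are non-zero) can be spotted mechanically. The following is
only a minimal sketch, not part of the original report: the function name
devices_with_errors and the SAMPLE constant are made up for illustration,
and the parser assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout
seen in the 'zpool status' output quoted in the message. Rows with a
different number of columns are simply skipped.

```python
# Sample rows copied from the 'zpool status' output quoted in the message.
SAMPLE = """\
        NAME           STATE     READ WRITE CKSUM
        URUBAmirrorP1  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            ad1        ONLINE       3  302K     0
            ad2        ONLINE       0     0     0
"""


def devices_with_errors(config_text):
    """Return (name, state, read, write, cksum) for each row with a non-zero counter.

    Counters are compared as strings, so abbreviated values such as '302K'
    are reported as-is and never converted to a number.
    """
    flagged = []
    for line in config_text.splitlines():
        fields = line.split()
        # Assumption: data rows have exactly five whitespace-separated columns.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, state, read, write, cksum = fields
        if any(counter != "0" for counter in (read, write, cksum)):
            flagged.append((name, state, read, write, cksum))
    return flagged


if __name__ == "__main__":
    for name, state, read, write, cksum in devices_with_errors(SAMPLE):
        print(f"{name}: state={state} read={read} write={write} cksum={cksum}")
```

Run against the sample, this flags only ad1, which is the device the
kernel had already detached. In practice the text would come from running
'zpool status' rather than from a hard-coded string; that wiring is left
out here on purpose.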