From owner-freebsd-stable@FreeBSD.ORG  Mon Aug 20 22:38:46 2007
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3411116A417
	for <freebsd-stable@freebsd.org>; Mon, 20 Aug 2007 22:38:46 +0000 (UTC)
	(envelope-from matrix@itlegion.ru)
Received: from corpmail.itlegion.ru (corpmail.itlegion.ru [84.21.226.211])
	by mx1.freebsd.org (Postfix) with SMTP id 7678C13C4DA
	for <freebsd-stable@freebsd.org>; Mon, 20 Aug 2007 22:38:45 +0000 (UTC)
	(envelope-from matrix@itlegion.ru)
Received: (qmail 86233 invoked from network); 21 Aug 2007 02:38:43 +0400
Received: from unknown (HELO Artem) (192.168.0.12)
	by 84.21.226.211 with SMTP; 21 Aug 2007 02:38:43 +0400
X-AntiVirus: Checked by Dr.Web [version: 4.33, engine: 4.33.5.10110,
	virus records: 238958, updated: 20.08.2007]
Message-ID: <028f01c7e37a$d8f441b0$0c00a8c0@Artem>
From: "Artem Kuchin" <matrix@itlegion.ru>
To: <freebsd-stable@freebsd.org>
Date: Tue, 21 Aug 2007 02:38:34 +0400
Organization: IT Legion
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="koi8-r"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
Subject: A little story of failed raid5 (3ware 8000 series)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 20 Aug 2007 22:38:46 -0000

Hello!

Here is my latest story about why one should
never use raid5.

Controller is 8xxx-4LP.
I have had a simple 360GB raid5 with 4 drives since 2004.
Only about a year ago I realized how much speed I had
wasted by saving a lousy 120GB. I should have chosen
bigger drives and set up two mirrors instead.
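
For reference, the capacity trade-off above can be checked with a few
lines of Python. This is only a sketch: the 120GB per-drive size is an
assumption inferred from 360GB spread over 4 drives with one drive's
worth of parity, and usable_gb is a made-up helper name.

```python
def usable_gb(layout: str, drives: int, drive_gb: int) -> int:
    """Usable capacity in GB for the two layouts discussed here."""
    if layout == "raid5":
        # One drive's worth of space is consumed by parity.
        return (drives - 1) * drive_gb
    if layout == "mirrors":
        # Drives are paired; each pair stores a single copy's worth of data.
        return (drives // 2) * drive_gb
    raise ValueError(f"unsupported layout: {layout}")


# Assumed drive size: 120GB (not stated explicitly in the post).
raid5 = usable_gb("raid5", 4, 120)
mirrors = usable_gb("mirrors", 4, 120)
print(raid5, mirrors, raid5 - mirrors)  # 360 240 120
```

So the raid5 layout gains 120GB over two mirrored pairs on the same
four drives, which matches the figure above.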

But that's not the point. A week ago one drive just
totally failed. It fell out of the unit, and when I tried
to rebuild the unit, the rebuild failed. It seemed like the
drive's electronics had failed. Anyhow, I found a new 160GB
Seagate drive as a replacement (half as thick, with very nicely
done electronics on it).

A day ago at 11 am I turned off the server, pulled out the old
drive, installed the new one, turned the server back on, and an hour
later started the rebuild from a remote location via the web interface.
After about 5 minutes the machine became unresponsive. I tried rebooting:
nothing. I went to the machine and figured out that the rebuild had failed
(at 0%) and that some data could not be read because of bad sectors.

Well, hell, I thought. Maybe I could tell the controller to ignore all the
errors and just do the rebuild, then figure out which drive had failed,
replace it, rebuild again and restore the corrupted data from backup.
No way, the controller said.

- I cannot make it ignore read errors
- I cannot figure out which drive has bad sectors
(maybe someone knows how?)

But I don't understand how and why it happened. Only 6 hours earlier (the
night before) all those files were backed up fine without any read errors.
And now, right after replacing the drive and starting the rebuild, it says
that there are bad sectors all over those files.
How come?

Well. Since we have a bunch of full and incremental paranoid backups, no
data was lost and we are in the process of recovering. However, I simply
imagined what would have happened if one more drive had completely failed.
That would mean we had lost all data, since none of the remaining disks
holds a readable copy of any data on its own (unlike a mirror, for example).
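
To illustrate why a degraded raid5 has nothing left to fall back on,
here is a minimal sketch of XOR parity over one stripe. It is a toy
model, not how the 3ware firmware works internally: the block
contents, the helper names and the 3-data-plus-1-parity layout are all
assumptions made for the example.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> Optional[List[bytes]]:
    """Rebuild a stripe (data blocks plus parity); None marks an unreadable block.

    Returns the complete stripe, or None when more than one block is missing.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        # Parity can only solve for a single unknown block per stripe.
        return None
    i = missing[0]
    rebuilt = xor_blocks([block for block in stripe if block is not None])
    return stripe[:i] + [rebuilt] + stripe[i + 1:]


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One drive gone: the missing block is recovered from the other three.
print(reconstruct([None, data[1], data[2], parity]))

# One drive gone plus a bad sector on another drive: the stripe is lost.
print(reconstruct([None, None, data[2], parity]))
```

In a mirror, by contrast, each surviving disk still carries complete,
directly readable blocks.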

So, we are migrating to a mirror config with huge disks.

I am thinking about raid10 for more performance. It seems a lot safer,
since even if one disk of each mirror pair fails the data is still
readable, and even if all disks have bad blocks the data can be easily
recovered from the counterpart by a fairly simple script. But the
problem, however,

So, no raid5 or even raid 6 for me any more. Never!
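
The "fairly simple script" idea for mirrors mentioned above could look
roughly like the sketch below. It works on in-memory lists rather than
real devices, and merge_mirrors and the block layout are hypothetical:
each disk image is modelled as a list of blocks, with None standing
for a block that could not be read.

```python
from typing import List, Optional, Tuple

Image = List[Optional[bytes]]


def merge_mirrors(a: Image, b: Image) -> Tuple[Image, List[int]]:
    """Combine two mirror halves block by block.

    For each block, take the copy from `a` if it is readable, otherwise
    the copy from `b`. Returns the merged image and the indices of blocks
    that are unreadable on both halves.
    """
    if len(a) != len(b):
        raise ValueError("mirror halves must have the same number of blocks")
    merged: Image = []
    lost: List[int] = []
    for index, (block_a, block_b) in enumerate(zip(a, b)):
        block = block_a if block_a is not None else block_b
        if block is None:
            lost.append(index)
        merged.append(block)
    return merged, lost


# Both disks have bad blocks, but mostly in different places.
disk_a: Image = [b"b0", None, b"b2", None]
disk_b: Image = [None, b"b1", b"b2", None]

merged, lost = merge_mirrors(disk_a, disk_b)
print(merged)  # [b'b0', b'b1', b'b2', None]
print(lost)    # [3]
```

Only a block that is bad at the same position on both halves stays
lost; everything else is recovered from the counterpart.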


--
Regards,
Artem