From: "Artem Kuchin"
Delivered-To: freebsd-stable@freebsd.org
Date: Tue, 21 Aug 2007 02:38:34 +0400
Organization: IT Legion
Subject: A little story of failed raid5 (3ware 8000 series)

Hello!

Here is my newest story about why one should never use raid5.

The controller is an 8xxx-4LP. I have had a simple 360GB raid5 with 4 drives since 2004. Only about a year ago I realized how much speed I had wasted by saving a lousy 120GB. I should have chosen bigger drives and set up two mirrors instead. But that's not the point.

A week ago one drive failed completely.
It fell out of the unit, and when I tried to rebuild the unit the rebuild failed. It seemed like the drive electronics had failed. Anyhow, I found a new 160GB Seagate drive as a replacement (twice as thin, with very nicely done electronics on it).

A day ago at 11 am I turned off the server, pulled out the old drive, installed the new one, turned the server back on and started the rebuild an hour later from a remote location via the web interface. After about 5 minutes the machine became unresponsive. I tried rebooting: nothing. I went to the machine and figured out that the rebuild had failed (at 0%) and that some data could not be read because of bad sectors.

Well, hell, I thought. Maybe I could tell the controller to ignore all the errors and just carry on rebuilding, then figure out which drive was failing, replace it, rebuild again and restore the corrupted data from backup. No way, the controller said:

- I cannot make it ignore read errors
- I cannot figure out which drive has the bad sectors (maybe someone knows how?)

But I don't understand how and why it happened. Only 6 hours earlier (the night before) all those files were backed up fine without any read errors. And now, right after replacing the drive and starting the rebuild, it says that there are bad sectors all over those files. How come?

Well. Since we have a bunch of full and incremental paranoid backups, no data was lost and we are in the process of recovering. However, I simply imagined what would happen if one more drive failed completely. That would mean we had lost all the data, since none of the remaining disks holds a complete readable copy of the data (unlike a mirror, for example).

So, we are migrating to a mirror config with huge disks. I am thinking about raid10 for more performance. It seems a lot safer, since if one disk in any pair fails the data is still readable, and even if all disks have bad blocks the data can be recovered fairly easily by a simple script from the counterpart disk.

But the problem, however,

So, no raid5 or even raid6 for me any more. Never!

--
Regards,
Artem
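
P.S. To illustrate why the rebuild could not get past the bad sectors while a mirror could, here is a minimal Python sketch. It assumes plain textbook XOR parity and made-up two-byte blocks; the function names are purely illustrative and are not anything the 3ware firmware or its tools expose.

```python
from functools import reduce
from typing import Optional, Sequence

# None stands for a missing drive or an unreadable sector.
Block = Optional[bytes]


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_rebuild_block(stripe: Sequence[Block]) -> Optional[bytes]:
    """Rebuild the one missing block of a stripe, or return None.

    Parity can recover exactly one missing block per stripe. If a second
    block is unreadable (a bad sector on a surviving drive), or if nothing
    is missing at all, this returns None.
    """
    present = [b for b in stripe if b is not None]
    if len(present) != len(stripe) - 1:
        return None
    return xor_blocks(present)


def mirror_read_block(copies: Sequence[Block]) -> Optional[bytes]:
    """Return the first readable copy of a mirrored block, if any."""
    return next((c for c in copies if c is not None), None)


# Hypothetical stripe: three data blocks plus their parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_blocks([d0, d1, d2])

# One drive replaced, the rest readable: the block comes back.
rebuilt = raid5_rebuild_block([d0, None, d2, parity])

# One drive replaced plus a bad sector elsewhere in the stripe: no recovery.
stuck = raid5_rebuild_block([d0, None, None, parity])

# Mirror pair with a bad sector on one side: the counterpart still has it.
from_mirror = mirror_read_block([None, d1])
```

In this sketch `rebuilt` equals `d1`, `stuck` is `None`, and `from_mirror` equals `d1`, which is the difference between the two layouts that I ran into.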