From: "Artem Kuchin"
To: "Daniel O'Connor"
Cc: Martin Nilsson
Date: Tue, 21 Aug 2007 13:57:25 +0400
Subject: Re: A little story of failed raid5 (3ware 8000 series)
Organization: IT Legion
Message-ID: <00c901c7e3d9$e6134760$0c00a8c0@Artem>
References: <028f01c7e37a$d8f441b0$0c00a8c0@Artem> <46CA7681.3070909@gneto.com> <03bc01c7e3b8$7f9a3a50$0c00a8c0@Artem> <200708211606.00429.doconnor@gsoft.com.au>
List-Id: Production branch of FreeBSD source code

>You can run smartmontools on disks behind 3ware controllers, eg
>/dev/twe0 -d 3ware,0 -a -o on -S on -m root@localhost
>/dev/twe0 -d 3ware,1 -a -o on -S on -m root@localhost

I did this: smartctl /dev/twe0 -d 3ware,1 -a for each
drive on another server. Two drives are pretty old; the drive on port 2 is
less than a month old. However, ALL of the drives have the same values for
this:

  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

How come the numbers are the same? Even more, what does this 100 mean?
Is 100% of the backup sector space free, or are just 100 sectors available?
How many of them are there in total? Why does it say "Pre-fail" if it is
WAY above the threshold? This data seems to be useless.

Now, I did the same for the RAID which failed and got me into so much
trouble and has bad sectors now (some files are unreadable):

smartctl /dev/twe0 -d 3ware,0 -A
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

smartctl /dev/twe0 -d 3ware,1 -A
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       39

smartctl /dev/twe0 -d 3ware,2 -A
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       9

Now this is BS!!! Again, according to SMART I should look at VALUE (100)
and see if it is below THRESH (36). If it is, then I am in trouble. No, it
does not work this way.

Now, if we look at the raw numbers we see 39 for disk 1 and 9 for disk 2.

For disk 1 (the one with 39) there is also:

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       22
  1 Raw_Read_Error_Rate     0x000f   058   055   006    Pre-fail  Always       -       170185544
195 Hardware_ECC_Recovered  0x001a   058   055   000    Old_age   Always       -       170185544
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       524461066

Even for the newly inserted (24 hours ago, absolutely new) drive:

  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       8525167
195 Hardware_ECC_Recovered  0x001a   069   066   000    Old_age   Always       -       8433725

Now, as I understand it, the main indicator is "Offline_Uncorrectable": if
its raw value is any more than 0, REPLACE THE DRIVE ASAP (or maybe by then
it is too late, and it is "replace drive asap" as soon as
Reallocated_Sector_Ct > 0?)

Now, what I don't understand is why Hardware_ECC_Recovered and
Seek_Error_Rate are so high. The first one is maybe related to a cabling
problem.
The drives are all in hot-swap bays of a Supermicro 2U case. Maybe the
backplane is not so good? Seek_Error_Rate is a mystery to me. Any ideas?

--
Artem
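
The rule of thumb proposed in the message (watch the raw counts of Reallocated_Sector_Ct and Offline_Uncorrectable rather than only VALUE against THRESH) can be sketched as a small script. This is only an illustration under stated assumptions: the function names `parse_attributes` and `find_warnings` are hypothetical, the input is assumed to be the attribute rows printed by `smartctl -A` in the ten-column layout shown above, and "any raw count above zero is a warning" is the message's own suggestion, not a vendor rule. The TYPE column (Pre-fail / Old_age) is treated as the attribute's category rather than its current state, which is how smartctl documents it.

```python
import re

# Attributes whose raw value is read as a plain sector count, as the
# message above suggests. This choice is an assumption, not a vendor spec.
COUNT_ATTRIBUTES = {"Reallocated_Sector_Ct", "Offline_Uncorrectable"}

# One attribute row of `smartctl -A`:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]{4})\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def parse_attributes(text):
    """Return one dict per attribute row found in smartctl -A output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers and anything that is not an attribute row
        row = match.groupdict()
        for key in ("id", "value", "worst", "thresh", "raw"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def find_warnings(rows):
    """Flag rows at or below threshold, or with a non-zero raw sector count."""
    warnings = []
    for row in rows:
        if row["thresh"] > 0 and row["value"] <= row["thresh"]:
            warnings.append((row["name"], "normalized value at or below threshold"))
        elif row["name"] in COUNT_ATTRIBUTES and row["raw"] > 0:
            warnings.append((row["name"], "raw count is %d" % row["raw"]))
    return warnings


# Rows for disk 1, copied from the message.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       39
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       22
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       524461066
"""

if __name__ == "__main__":
    for name, reason in find_warnings(parse_attributes(SAMPLE)):
        print(name + ": " + reason)
```

With the disk 1 rows, the sketch flags both sector-count attributes even though their normalized VALUE is still 100. It deliberately ignores the large raw numbers for Seek_Error_Rate and Hardware_ECC_Recovered, since the message leaves their meaning as an open question.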