From owner-freebsd-geom@FreeBSD.ORG Sat Apr 24 13:48:07 2010
From: "Lister" <lister@kawashti.org>
To: "GEOM" <freebsd-geom@freebsd.org>
Date: Sat, 24 Apr 2010 15:47:52 +0200
Subject: gmirror of 2 H/W RAID5s and nVidia SATARAID
Message-ID: <8848B2F8F5AC4BBF9341A2E60A4328A2@neo>

Hello all,

This posting is somewhat related to my earlier one titled "OCE and GPT".

In a production environment, I have these 2 systems: SH1 & SH2. They're both 7.1-REL and AMD64. I built them to achieve "RAID for the doubly paranoid": together they hold 3 RAID5's of the same 4TB of data. To lower the risks even more, I've intentionally used different brands of everything but the hard drives, CPUs and RAM.
Even the chassis and redundant PSUs are different.

SH1 has 2 3Ware 9550SXU HBAs, each doing a RAID5. The 2 resultant RAID disks (da0 & da1) are gmirrored to get RAID51; they're mirrored as /dev/mirror/RAID51.

SH2 has 1 HighPoint RocketRAID 2520 connected to an external enclosure (Norco DS-1500). It has the 3rd RAID5, which is rsync'd to SH1's mirror.

Despite the different motherboards (Asus and Gigabyte), they both feature "nVidia Media Shield", which is used to set up a RAID1 from which either FreeBSD boots. In other words, the mobo's RAID1 is entirely for the OS, and the RAID5's are entirely and strictly data-only. SH2's RAID1 was cloned from SH1's using a LiveCD and dump/restore over ssh; then host-specific files were patched.

I used gpt to partition /dev/mirror/RAID51 on SH1 and da0 on SH2 (I didn't know about gpart then). Partitions are, therefore, /dev/mirror/RAID51p1~n on SH1 and /dev/da0p1~n on SH2.

Both systems have been functioning satisfactorily in production for over 2 months now, and still are. However, since the very beginning, with every system boot, SH1's kernel reports the secondary GPT is corrupt or invalid for both da0 & da1. Additionally, for some reason it thinks it has both ar0 and ar1 (the mobo's nVidia RAID1) and that both are degraded. Obviously, there's only ar0; it's the one I installed FreeBSD onto and is the only one in SH1's /etc/fstab.

Here's an excerpt from the syslogs of both SH1 & SH2 for 1 such incident. To keep the lines shorter, I've removed the timestamp, host and source, and kept the latter 2 as headers. I've also attached a screenshot of the text for better readability.
sh1 kernel:
------------
ad8: 238475MB at ata4-master SATA150
ad10: 238474MB at ata5-master SATA150
da0 at twa0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 100.000MB/s transfers
da0: 3814656MB (7812415488 512 byte sectors: 255H 63S/T 486300C)
da1 at twa1 bus 0 target 0 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 3814656MB (7812415488 512 byte sectors: 255H 63S/T 486300C)
ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
ar0: 238475MB status: DEGRADED
ar0: disk0 READY (master) using ad8 at ata4-master
ar0: disk1 DOWN no device found for this subdisk
ar1: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
ar1: 238474MB status: DEGRADED
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 READY (mirror) using ad10 at ata5-master
GEOM: da0: the secondary GPT table is corrupt or invalid.
GEOM: da0: using the primary only -- recovery suggested.
GEOM: da1: the secondary GPT table is corrupt or invalid.
GEOM: da1: using the primary only -- recovery suggested.
GEOM_MIRROR: Device mirror/RAID51 launched (2/2).

sh2 kernel:
------------
ad6: 238475MB at ata3-master SATA150
ad10: 238475MB at ata5-master SATA150
da0 at hptrr0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-0 device
ar0: 238475MB status: READY
ar0: disk0 READY (master) using ad6 at ata3-master
ar0: disk1 READY (mirror) using ad10 at ata5-master
===

Note how SH2 is free of either manifestation.

Now the questions. They all concern SH1:

1 -- I didn't partition either da0 or da1; I only did /dev/mirror/RAID51. Why the messages about corrupt or invalid secondaries, and what can I do to make those messages go away?

2 -- Why does FreeBSD think it has 2 RAID1's, ar0 & ar1, and that both are degraded? What can I do about it?

3 -- Although I don't need to follow the kernel's "suggestion" of recovery, suppose I actually needed to, on a different system -- how would I go about that?
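On question 1, one plausible explanation (an assumption from gmirror's general design, not a confirmed diagnosis of this box) falls out of simple sector arithmetic: gmirror keeps its metadata in the last sector of each consumer, so the mirror/RAID51 provider is one sector smaller than da0/da1, and a GPT created on the mirror places its backup header one sector short of where GEOM looks when it tastes the raw disks. A sketch of that arithmetic, using the sector count from the dmesg above:

```python
# Sketch of why GEOM would complain about da0/da1 but not mirror/RAID51.
# Assumption: gmirror stores its metadata in the last sector of each
# consumer, so the provider it exports is one sector smaller than the disk.

DISK_SECTORS = 7812415488            # da0/da1 size from the dmesg above

mirror_sectors = DISK_SECTORS - 1    # last sector holds gmirror metadata

# GPT puts the backup header in the last LBA of the medium it was created
# on -- here that medium is mirror/RAID51, not the raw disk.
backup_header_lba_on_mirror = mirror_sectors - 1

# Tasting da0/da1 directly, GEOM expects the backup header in the raw
# disk's last LBA instead:
expected_backup_lba_on_disk = DISK_SECTORS - 1

print(backup_header_lba_on_mirror)   # 7812415486
print(expected_backup_lba_on_disk)   # 7812415487
# One sector apart: whatever GEOM finds at the disk's last LBA fails GPT
# validation, while the primary at LBA 1 sits at the same offset in both
# views and is found fine.
```

If that is indeed the cause, the messages are cosmetic as long as the GPT is only ever consumed via mirror/RAID51.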
Although there was a lot on this topic under my previous thread "OCE and GPT", what can a layman like myself, who's not willing to read hundreds of pages of specs, do?

To that end, I've already done some quick probing using dd & hd on my own 'healthy' system (the subject of the thread "OCE and GPT") and found that I can directly copy the 32-sector table from primary to secondary and vice versa, because the tables were identical as evidenced by cmp & hd. However, I found the headers to be different. Is this normal? If so, then I'd appreciate a quick pointer to how the 2 headers are constructed, to save some precious time.

--
Hatem Kawashti
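On the differing headers: that is expected. Per the GPT layout in the UEFI specification, the primary and backup headers swap their MyLBA/AlternateLBA fields and point at different partition-entry arrays, so their CRC32s differ even when the 32-sector entry tables are byte-identical. A minimal parser sketch of the fixed 92-byte header (field offsets taken from the spec; any sample bytes fed to it would be synthetic):

```python
import struct
import zlib

# Field offsets follow the GPT header layout from the UEFI specification.
# The fixed portion is 92 bytes; it lives at LBA 1 (primary) and at the
# last LBA of the medium (backup).

def parse_gpt_header(raw: bytes) -> dict:
    """Decode the interesting fields of a 92-byte GPT header."""
    (sig, revision, header_size, header_crc32, _reserved,
     my_lba, alternate_lba, first_usable, last_usable) = \
        struct.unpack_from("<8sIIIIQQQQ", raw, 0)
    disk_guid = raw[56:72]                       # mixed-endian GUID, left raw
    entries_lba, num_entries, entry_size, entries_crc32 = \
        struct.unpack_from("<QIII", raw, 72)
    return {
        "signature": sig,                # must be b"EFI PART"
        "my_lba": my_lba,                # LBA where this header lives
        "alternate_lba": alternate_lba,  # LBA of the other header
        "entries_lba": entries_lba,      # start of this copy's entry array
        "header_crc32": header_crc32,
        "entries_crc32": entries_crc32,
    }

def header_crc(raw: bytes) -> int:
    """CRC32 of a header, computed with its own CRC field (offset 16) zeroed."""
    zeroed = raw[:16] + bytes(4) + raw[20:92]
    return zlib.crc32(zeroed) & 0xFFFFFFFF

# Between the primary and backup headers, only these fields should differ:
#   offset 16  HeaderCRC32        (a consequence of the fields below)
#   offset 24  MyLBA              (1 for primary, last LBA for backup)
#   offset 32  AlternateLBA       (the mirror image of MyLBA)
#   offset 72  PartitionEntryLBA  (2 for primary; just before the backup
#                                  header for the backup copy)
# Everything else, and the 32-sector entry array itself, should match.
```

So copying the entry table between the two locations is fine, but the headers must not be copied verbatim: the three LBA fields have to be swapped/adjusted and the CRC recomputed (zero the CRC field, CRC32 the 92 bytes, write the result back at offset 16).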