From: "Joshua Dunham"
To: freebsd-geom@freebsd.org
Date: Thu, 25 Sep 2008 00:54:18 -0400
Subject: After crash geom::graid5 may not resync correctly. Geli complains missing metadata

Hi Guys,

So, I'm a graid5 / geli newbie and having some trouble. Here's the scoop.

Last week I had a perfectly running system which included 6x 500GB WD SATA drives in a geom::graid5 array, with geli managing the encryption layer. The 2x SATA controllers are the same as when the system was working 100%, so I assume there should be no conflict now.

My transfer rates started crawling, so I popped into ssh and immediately read that one of the discs was throwing UDMA xfer errors. I rebooted, changed the BIOS settings for the drive, and tried to boot back into FreeBSD. The system then started freezing a few seconds after starting a resync. I called the drive junk, shut down the system, and ordered a spare.

Once it came I replaced the drive, booted into FreeBSD, formatted the disc using the raid filesystem, and rebooted. The system recognized the new drive as one that was out of sync and immediately started to re-sync it.

After it was done resynchronizing I went to attach the encryption layer, and here is where the trouble started. I get an error message about missing metadata.

NAS:~# geli attach /dev/raid5/storage
Cannot read metadata from /dev/raid5/storage: Invalid argument.

In troubleshooting this problem I have learned about the geli dump command, but it's probably too late now. I also heard from various sources that it is very bad to swap the SATA cables on a raid system. I don't think I swapped any cables, but the BIOS has a section where I can define the HDs of the system for booting etc. When you remove a drive, the one below it (ad4, let's say) will shift up (to ad6, let's say). This has probably happened, but does it really kill the array?
I'd really assume since it's raid5 the metadata for graid5 would keep the discs in the correct order as it stores the Disk number in the metadata. Also, since geli is on a raid5 I'd also assume that the geli metadata would not go missing if the raid5 rebuilds correctly.

Please Help!

Here are some stats about the geom::graid5 array if it's helpful.

NAS:~# graid5 list
Geom name: storage
State: COMPLETE CALM
Status: Total=6, Online=6
Type: AUTOMATIC
Pending: (wqp 0 // 0)
Stripesize: 131072
MemUse: 0 (msl 0)
Newest: -1
ID: 68917578
Providers:
1. Name: raid5/storage
   Mediasize: 2500539187200 (2.3T)
   Sectorsize: 512
   Mode: r0w0e0
Consumers:
1. Name: ad20
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   DiskNo: 4
   Error: No
2. Name: ad18
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   DiskNo: 1
   Error: No
3. Name: ad16
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   DiskNo: 0
   Error: No
4. Name: ad14
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   DiskNo: 5
   Error: No
5. Name: ad12
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   DiskNo: 3
   Error: No
6. Name: ad10
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   DiskNo: 2
   Error: No

NAS:~# graid5 dump ad10
Metadata on ad10:
Magic string: GEOM::RAID5
Metadata version: 2
Device name: storage
Device ID: 68917578
Disk number: 2
Total number of disks: 6
Provider Size: 500107862016
Verified: -1
State: 0
Stripe size: 131072
Newest: 4294967295
NoHot: No
Hardcoded provider:

## graid5 dump adXX output looks exactly the same besides the 'Disk number:' output for all devices.

Any advice you guys can give to rescue the data would be soooo appreciated, you have no idea.

-Joshua
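
P.S. As a sanity check on the numbers above, here is a minimal Python sketch. It assumes, without confirmation from the graid5 source or documentation, that the array size is (number of disks - 1) times the per-disk size rounded down to a whole number of stripes. The function names are hypothetical; the constants are copied from the graid5 list and graid5 dump output in this post.

```python
# Values copied from the `graid5 list` / `graid5 dump` output in this post.
DISK_SIZE = 500107862016             # Mediasize of each consumer, in bytes
STRIPE_SIZE = 131072                 # Stripesize, in bytes
TOTAL_DISKS = 6
REPORTED_ARRAY_SIZE = 2500539187200  # Mediasize of raid5/storage
DISK_NUMBERS = {"ad20": 4, "ad18": 1, "ad16": 0,
                "ad14": 5, "ad12": 3, "ad10": 2}


def expected_array_size(disk_size, stripe_size, total_disks):
    # ASSUMPTION (not confirmed against graid5): each disk contributes a
    # whole number of stripes, and one disk's worth of space holds parity.
    usable_per_disk = (disk_size // stripe_size) * stripe_size
    return usable_per_disk * (total_disks - 1)


def disk_numbers_complete(disk_numbers, total_disks):
    # Every DiskNo from 0 to total_disks - 1 should appear exactly once.
    return sorted(disk_numbers.values()) == list(range(total_disks))


def as_uint32(value):
    # Reinterpret a signed value as an unsigned 32-bit integer.
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    size = expected_array_size(DISK_SIZE, STRIPE_SIZE, TOTAL_DISKS)
    print("size matches:", size == REPORTED_ARRAY_SIZE)
    print("disk numbers complete:",
          disk_numbers_complete(DISK_NUMBERS, TOTAL_DISKS))
    print("Newest -1 as uint32:", as_uint32(-1))
```

Under that assumption the reported array size is consistent with six equal disks, the six disk numbers cover 0 through 5 with no duplicates regardless of how the adXX names shifted, and "Newest: -1" in the list output is the same value as "Newest: 4294967295" in the dump. None of this says anything about whether the geli metadata on the provider is still intact.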