From: Yarema
To: freebsd-stable@freebsd.org, NYCBUG Talk
Cc: Soren Schmidt
Date: Mon, 24 Sep 2007 20:28:50 -0400
Subject: FreeBSD PseudoRAID RAID0 array broken on atapci1:
Message-ID: <866CEC2FB789142D3C0AAFCB@[192.168.1.72]>

Hi,

I need some help recovering from this. First some back story. Running
6.2-STABLE i386 from Sep 17, 2007.
My /home slice is mounted from /dev/ar0s1e. When all is good, the
relevant kernel messages look like this:

atapci1:
ata2: on atapci1
ata3: on atapci1
ad4: 381554MB at ata2-master SATA150
ad6: 381554MB at ata3-master SATA150
ar0: 763108MB status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Today this server crashed with the following logged:

ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=144888320
ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=143390319
ad4: FAILURE - device detached
ar0: FAILURE - RAID0 array broken
subdisk4: detached
ad4: detached
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=146831867904, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147024330752, length=16384)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5

Now the kernel messages read:

ar0: FAILURE - RAID0 array broken
ar0: 763108MB status: BROKEN
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 DOWN no device found for this subdisk
ar1: 763108MB status: BROKEN
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 READY using ad6 at ata3-master

For some reason the second disk in the array now shows up as ar1
instead of being part of ar0. I suspect there has to be some way to
force the two drives to show up as part of the same array, perhaps by
editing the PseudoRAID metadata on disk, without putting any of the
UFS2 data in "jeopardy".

Any pointers on where to start poking around for the relevant metadata
structures on disk, or what to search for? I figure if I can dd the
metadata off the disks, tweak a field or two, and then dd the whole
mess back, I stand a chance of either hosing the array irrevocably or
getting it all back. ;)

Or maybe atacontrol could be used to re-create the metadata without
destroying the UFS2 on the array?

I have a coredump of the kernel from this crash if that helps analyze
things any.

-- 
Yarema
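
P.S. To see at a glance which regions of ar0s1e had writes fail (error = 5 is EIO), the g_vfs_done lines above can be tallied per offset. The following is only a minimal sketch: the regular expression is my assumption about the line format, based solely on the lines quoted in this message, and `summarize` and `LOG` are hypothetical names. `LOG` holds just a few of the lines copied from above, not the full log.

```python
import re
from collections import Counter

# A handful of lines copied from the crash log above (not the full list).
LOG = """\
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
"""

# Assumed line format, inferred only from the messages quoted in this post.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<errno>\d+)"
)


def summarize(log_text):
    """Count failed requests per (device, operation, offset, length)."""
    counts = Counter()
    for m in PATTERN.finditer(log_text):
        key = (m["dev"], m["op"], int(m["offset"]), int(m["length"]))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Print one line per distinct failed request, ordered by byte offset.
    for key, n in sorted(summarize(LOG).items(), key=lambda kv: kv[0][2]):
        dev, op, offset, length = key
        print(f"{dev} {op} offset={offset} length={length} failures={n}")
```

Feeding it the complete log instead of the excerpt would show which offsets were retried repeatedly; it only reads text and touches nothing on disk.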