From owner-freebsd-amd64@FreeBSD.ORG Thu Dec 22 09:06:41 2005
Date: Thu, 22 Dec 2005 02:06:37 -0700
From: Ken Gunderson
To: Scott Long
Cc: freebsd-amd64@freebsd.org
Subject: Re: Tyan TA26, LSI 320-1, and FBSD6.0 Strangeness
Message-Id: <20051222020637.6d099d40.kgunders@teamcool.net>
In-Reply-To: <43AA2BFB.3050408@samsco.org>
References: <20051220160752.0f6dcc43.kgunders@teamcool.net>
 <20051220231018.5b383a39.kgunders@teamcool.net>
 <20051221173311.1fb1670b.kgunders@teamcool.net>
 <43AA2BFB.3050408@samsco.org>
Organization: Teamcool Networks
List-Id: Porting FreeBSD to the AMD64 platform

On Wed, 21 Dec 2005 21:30:51 -0700
Scott Long wrote:

> Ken Gunderson wrote:
>
> > On Tue, 20 Dec 2005 23:10:18 -0700
> > Ken Gunderson wrote:
> >
> >
> >>On
Tue, 20 Dec 2005 16:07:52 -0700
> >>Ken Gunderson wrote:
> >>
> >>
> >>>Hello List:
> >>>
> >>>I'm having a tough time w/a Tyan TA26, 320-1 and 6.0-RELEASE that
> >>>I'm hoping y'all may be able to shed some light on. I create logical
> >>>drives and install FBSD just fine. Then cvsup, buildworld,
> >>>buildkernel, installkernel. Upon reboot the system drives (mirror) are
> >>>in degraded mode and the raid0 drive (swap) is offline. MegaRAID is
> >>>unable to rebuild the arrays. I've called LSI support and they're
> >>>mystified as well.
> >>
> >>[big snippage]
> >>
> >>
> >>>E) Present Status:
> >>>
> >>>Interestingly enough, I am able to FORCE Physical Drive 1 back online
> >>>and then "Check Consistency". Presently 21% complete so don't know if
> >>>it will choke on an error or not yet.
> >>
> >>Update-
> >>
> >>The consistency check did complete w/o any errors and rebooting all
> >>logical drives are once again in "Optimal" state. For sake of
> >>completeness here's the dmesg:
> >
> >
> > [more snippage]
> >
> > Yet another follow up on my own post...
> >
> > Update Redux:
> >
> > 1) Using the amr driver from 7-CURRENT yields same results.
> >
> > 2) Did some testing playing musical hard drive slots. IF I do NOT
> > use slot 1 (# on Tyan Backplane starts w/1) and use the EXACT same raid
> > config for the mirror using, e.g. slots 2 & 3, then all works as
> > normally expected.
> >
> > So it would seem that Tyan and/or LSI have something Foobarred? Or
> > that for some reason FBSD is overwriting directly to disk on slot 1
> > (i.e. da0) even though it's not technically there?
> >
> > Bizarre hardware issues. My raison d'etre...
> >
>
> There is no way for FreeBSD to directly access disks attached to the
> RAID controller. All reads and writes to the array are bounded by the
> controller, and there simply is no way to get around that.
> With a certain amount of advanced hacking it would be possible to
> corrupt the disks with the amr_cam module, but even that is disabled
> with 7-CURRENT. What I'd actually suspect is that the backplane and/or
> slot connector is bad, so bad that simple parity detection cannot
> catch it.

Well, I told y'all it was BIZARRE ....

The backplane and/or connector issue was the conclusion last time
around. So that machine was RMA'd by Tyan. The replacement was
reportedly double checked by a Tyan tech prior to being shipped. Now
I'm seeing the same with the 2nd machine. And to make matters even
more interesting... I've subsequently confirmed it on yet a 3rd.

I've done some additional testing w/the 7-CURRENT amr driver w/one of
the mirrored hd's back in slot #1. If I just grab amr from cvs and
build an SMP kernel I can boot into the new kernel just fine. If I
then buildworld and reboot w/o proceeding any further then I get a
degraded array that I can't rebuild, e.g.:

$ dmesg |grep amr
amr0: mem 0xff4f0000-0xff4fffff irq 29 at device 4.0 on pci1
amr0: delete logical drives supported by controller
amr0: Firmware 1L37, BIOS G119, 64MB RAM
amr0: delete logical drives supported by controller
amrd0: on amr0
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: on amr0
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: on amr0
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
amrd1: I/O error - 0x1
Trying to mount root from ufs:/dev/amrd0s1a

So this would indicate there _might_ be something amiss w/the amr
driver that only pops up under a bit of I/O load, e.g. buildworld. But
if this were the case then why would it only show up when using Slot 1?

The other possibility is that there is something just plain broken at
the hardware/firmware level with either the LSI card or the Tyan unit.
I'd lean more towards the latter since the LSI 320-1 has been on the
market for a long time now and is widely deployed. Especially compared
to the Tyan TA-26.
So it seems like the odds alone would point more towards the Tyan.

The good news is that LSI seems quite interested in further
investigation (wish I could say the same for Tyan). Bad news is that
their lab is undergoing remodeling. Or so I am told.

> Some controllers allow you to run scans on individual disks from within
> a controlled environment, like the BIOS. I don't recall if the LSI
> cards have this feature, but if they do then they could almost certainly
> verify this.

The 320-1 does not. Or at least not that I've found. Maybe there's
some top secret procedure somewhere I don't know about... I can only
do consistency checks at the logical drive level.

-- 
Best regards,

Ken Gunderson

Q: Because it reverses the logical flow of conversation.
A: Why is putting a reply at the top of the message frowned upon?
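
For anyone trying to narrow down which build step triggers the failure, here is a minimal Python sketch that flags logical drives not reported as "optimal" in dmesg output like the excerpt in this message. It assumes the amrd line format shown there (size in MB, sector count, RAID level, state in parentheses). The names logical_drive_states, unhealthy, and SAMPLE are hypothetical helpers made up for illustration, not part of FreeBSD or of any LSI tool.

```python
import re

# Matches logical-drive lines in the format seen in the dmesg excerpt, e.g.
# "amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)".
# Assumption: other firmware or driver versions may print a different format.
DRIVE_RE = re.compile(
    r"(amrd\d+): (\d+)MB \((\d+) sectors\) RAID (\d+) \((\w+)\)"
)


def logical_drive_states(dmesg_text):
    """Return {device: (raid_level, state)} for every matching amrd line."""
    return {
        m.group(1): (int(m.group(4)), m.group(5))
        for m in DRIVE_RE.finditer(dmesg_text)
    }


def unhealthy(states):
    """Return the sorted devices whose state is anything but 'optimal'."""
    return sorted(dev for dev, (_, state) in states.items() if state != "optimal")


# The three logical-drive lines copied from the dmesg excerpt in this message.
SAMPLE = """\
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
"""

if __name__ == "__main__":
    for dev in unhealthy(logical_drive_states(SAMPLE)):
        print(dev, "is not optimal")
```

Feeding it the saved output of dmesg after each step (new kernel, buildworld, reboot) would show at which point a drive first leaves the optimal state. It only reads text and cannot say whether the driver, the card, or the backplane is at fault.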