From: Scott Long
Date: Thu, 22 Dec 2005 08:47:54 -0700
To: Ken Gunderson
Cc: freebsd-amd64@freebsd.org
Subject: Re: Tyan TA26, LSI 320-1, and FBSD6.0 Strangeness

Ken Gunderson wrote:
> On Wed, 21 Dec 2005 21:30:51 -0700
> Scott Long wrote:
>
>>Ken Gunderson wrote:
>>
>>>On Tue, 20 Dec 2005 23:10:18 -0700
>>>Ken Gunderson wrote:
>>>
>>>>On Tue, 20 Dec 2005 16:07:52 -0700
>>>>Ken Gunderson wrote:
>>>>
>>>>>Hello List:
>>>>>
>>>>>I'm having a tough time w/a Tyan TA26, 320-1 and 6.0-RELEASE that
>>>>>I'm hoping y'all may be able to shed some light on.  I create logical
>>>>>drives and install FBSD just fine.  Then cvsup, buildworld,
>>>>>buildkernel, installkernel.  Upon reboot the system drives (mirror) are
>>>>>in degraded mode and the raid0 drive (swap) is offline.  MegaRAID is
>>>>>unable to rebuild the arrays.  I've called LSI support and they're
>>>>>mystified as well.
>>>>
>>>>[big snippage]
>>>>
>>>>>E) Present Status:
>>>>>
>>>>>Interestingly enough, I am able to FORCE Physical Drive 1 back online
>>>>>and then "Check Consistency".  Presently 21% complete so don't know if
>>>>>it will choke on an error or not yet.
>>>>
>>>>Update-
>>>>
>>>>The consistency check did complete w/o any errors and after rebooting
>>>>all logical drives are once again in "Optimal" state.  For the sake of
>>>>completeness here's the dmesg:
>>>
>>>[more snippage]
>>>
>>>Yet another follow up on my own post...
>>>
>>>Update Redux:
>>>
>>>1) Using the amr driver from 7-CURRENT yields the same results.
>>>
>>>2) Did some testing playing musical hard drive slots.  IF I do NOT
>>>use slot 1 (# on Tyan Backplane starts w/1) and use the EXACT same raid
>>>config for the mirror using, e.g., slots 2 & 3, then all works as
>>>normally expected.
>>>
>>>So it would seem that Tyan and/or LSI have something Foobarred?  Or
>>>that for some reason FBSD is overwriting directly to disk on slot 1
>>>(i.e. da0) even though it's not technically there?
>>>
>>>Bizarre hardware issues.  My raison d'etre...
>>>
>>
>>There is no way for FreeBSD to directly access disks attached to the
>>RAID controller.  All reads and writes to the array are bounded by the
>>controller, and there simply is no way to get around that.
>>With a certain amount of advanced hacking it would be possible to
>>corrupt the disks with the amr_cam module, but even that is disabled
>>with 7-CURRENT.  What I'd actually suspect is that the backplane and/or
>>slot connector is bad, so bad that simple parity detection cannot
>>catch it.
>
> Well, I told y'all it was BIZARRE ....
>
> The backplane and/or connector issue was the conclusion last time
> around.  So that machine was RMA'd by Tyan.  The replacement was
> reportedly double checked by a Tyan tech prior to being shipped.  Now
> I'm seeing the same with the 2nd machine.  And to make matters even
> more interesting... I've subsequently confirmed it on yet a 3rd.
>
> I've done some additional testing w/7-CURRENT amr driver w/one of the
> mirrored hd's back in slot #1.  If I just grab amr from cvs and
> build an SMP kernel I can boot into the new kernel just fine.
>
> If I then buildworld and reboot w/o proceeding any further then I get
> a degraded array that I can't rebuild, e.g.:
>
> $ dmesg |grep amr
> amr0: mem 0xff4f0000-0xff4fffff irq 29 at device 4.0 on pci1
> amr0: delete logical drives supported by controller
> amr0: Firmware 1L37, BIOS G119, 64MB RAM
> amr0: delete logical drives supported by controller
> amrd0: on amr0
> amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
> amrd1: on amr0
> amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
> amrd2: on amr0
> amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
> amrd1: I/O error - 0x1
> Trying to mount root from ufs:/dev/amrd0s1a
>
> So this would indicate there _might_ be something amiss w/the amr driver
> that only pops up under a bit of I/O load, e.g. buildworld.  But if this
> were the case then why would it only show up when using Slot 1?
>

The driver in 7-CURRENT was tested under extreme I/O load for 2 weeks
before being committed to the tree.

> Other possibility is that there is something just plain broken at the
> hardware/firmware level with either the LSI card or the Tyan unit.
> I'd lean more towards the latter since the LSI 320-1 has been on the
> market for a long time now and is widely deployed.  Especially compared
> to the Tyan TA-26.  So it seems like the odds alone would point more
> towards the Tyan.
>
> The good news is that LSI seems quite interested in further
> investigation (wish I could say the same for Tyan).  Bad news is that
> their lab is undergoing remodeling.  Or so I am told.
>
>>Some controllers allow you to run scans on individual disks from within
>>a controlled environment, like the BIOS.  I don't recall if the LSI
>>cards have this feature, but if they do then they could almost certainly
>>verify this.
>
> The 320-1 does not.  Or at least not that I've found.  Maybe there's
> some top secret procedure somewhere I don't know about...  I can only
> do consistency checks at logical drive level.
>

Would it be at all possible to substitute a different controller card,
even a plain SCSI one, and hook the backplane up to it?

Scott