Date: Thu, 22 Dec 2005 08:47:54 -0700
From: Scott Long <scottl@samsco.org>
To: Ken Gunderson <kgunders@teamcool.net>
Cc: freebsd-amd64@freebsd.org
Subject: Re: Tyan TA26, LSI 320-1, and FBSD6.0 Strangeness
Message-ID: <43AACAAA.40501@samsco.org>
In-Reply-To: <20051222020637.6d099d40.kgunders@teamcool.net>
References: <20051220160752.0f6dcc43.kgunders@teamcool.net> <20051220231018.5b383a39.kgunders@teamcool.net> <20051221173311.1fb1670b.kgunders@teamcool.net> <43AA2BFB.3050408@samsco.org> <20051222020637.6d099d40.kgunders@teamcool.net>

Ken Gunderson wrote:
> On Wed, 21 Dec 2005 21:30:51 -0700
> Scott Long <scottl@samsco.org> wrote:
>
>>Ken Gunderson wrote:
>>
>>>On Tue, 20 Dec 2005 23:10:18 -0700
>>>Ken Gunderson <kgunders@teamcool.net> wrote:
>>>
>>>>On Tue, 20 Dec 2005 16:07:52 -0700
>>>>Ken Gunderson <kgunders@teamcool.net> wrote:
>>>>
>>>>>Hello List:
>>>>>
>>>>>I'm having a tough time w/a Tyan TA26, 320-1 and 6.0-RELEASE that
>>>>>I'm hoping y'all may be able to shed some light on. I create logical
>>>>>drives and install FBSD just fine. Then cvsup, buildworld,
>>>>>buildkernel, installkernel. Upon reboot the system drives (mirror) are
>>>>>in degraded mode and the raid0 drive (swap) is offline. MegaRAID is
>>>>>unable to rebuild the arrays. I've called LSI support and they're
>>>>>mystified as well.
>>>>
>>>>[big snippage]
>>>>
>>>>>E) Present Status:
>>>>>
>>>>>Interestingly enough, I am able to FORCE Physical Drive 1 back online
>>>>>and then "Check Consistency". Presently 21% complete so don't know if
>>>>>it will choke on error or not yet.
>>>>
>>>>Update-
>>>>
>>>>The consistency check did complete w/o any errors and upon rebooting all
>>>>logical drives are once again in "Optimal" state. For sake of
>>>>completeness here's the dmesg:
>>>
>>>[more snippage]
>>>
>>>Yet another follow up on my own post...
>>>
>>>Update Redux:
>>>
>>>1) Using the amr driver from 7-CURRENT yields same results.
>>>
>>>2) Did some testing playing musical hard drive slots. IF I do NOT
>>>use slot 1 (# on Tyan Backplane starts w/1) and use the EXACT same raid
>>>config for the mirror using, e.g., slots 2 & 3, then all works as
>>>normally expected.
>>>
>>>So it would seem that Tyan and/or LSI have something Foobarred? Or
>>>that for some reason FBSD is overwriting directly to disk on slot 1
>>>(i.e. da0) even though it's not technically there?
>>>
>>>Bizarre hardware issues. My raison d'etre...
>>>
>>
>>There is no way for FreeBSD to directly access disks attached to the
>>RAID controller. All reads and writes to the array are bounded by the
>>controller, and there simply is no way to get around that. With a
>>certain amount of advanced hacking it would be possible to corrupt the
>>disks with the amr_cam module, but even that is disabled with 7-CURRENT.
>>What I'd actually suspect is that the backplane and/or slot connector is
>>bad, so bad that simple parity detection cannot catch it.
>
> Well, I told y'all it was BIZARRE ....
>
> The backplane and/or connector issue was the conclusion last time
> around. So that machine was RMA'd by Tyan. The replacement was
> reportedly double checked by Tyan tech prior to being shipped. Now I'm
> seeing same with 2nd machine. And to make matters even more
> interesting... I've subsequently confirmed on yet a 3rd.
>
> I've done some additional testing w/7-CURRENT amr driver w/one of the
> mirrored hd's back in slot #1. If I just grab amr from cvs and
> build an SMP kernel I can boot into the new kernel just fine.
>
> If I then buildworld and reboot w/o proceeding any further then I get
> degraded array that I can't rebuild, e.g:
>
> $ dmesg |grep amr
> amr0: <LSILogic MegaRAID 1.53> mem 0xff4f0000-0xff4fffff irq 29 at
> device 4.0 on pci1
> amr0: delete logical drives supported by controller
> amr0: <LSILogic MegaRAID SCSI 320-1> Firmware 1L37, BIOS G119, 64MB RAM
> amr0: delete logical drives supported by controller
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
> amrd1: <LSILogic MegaRAID logical drive> on amr0
> amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
> amrd2: <LSILogic MegaRAID logical drive> on amr0
> amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
> amrd1: I/O error - 0x1
> Trying to mount root from ufs:/dev/amrd0s1a
>
> So this would indicate there _might_ be something amiss w/the amr driver
> that only pops up under a bit of I/O load, e.g. buildworld. But if this
> were the case then why would it only show up when using Slot 1?
>

The driver in 7-CURRENT was tested under extreme I/O load for 2 weeks
before being committed to the tree.

> Other possibility is that there is something just plain broken at the
> hardware/firmware level with either the LSI card or the Tyan unit. I'd
> lean more towards the latter since the LSI 320-1 had been on the market
> for a long time now and widely deployed. Especially compared to the
> Tyan TA-26. So it seems like the odds alone would point more towards
> the Tyan.
>
> The good news is that LSI seems quite interested in further
> investigation (wish I could say the same for Tyan). Bad news is that
> their lab is undergoing remodeling. Or so I am told.
>
>>Some controllers allow you to run scans on individual disks from within
>>a controlled environment, like the BIOS. I don't recall if the LSI
>>cards have this feature, but if they do then they could almost certainly
>>verify this.
>
> The 320-1 does not. Or at least not that I've found. Maybe there's
> some top secret procedure somewhere I don't know about... I can only
> do consistency checks at logical drive level.
>

Would it be at all possible to substitute a different controller card,
even a plain SCSI one, and hook the backplane up to it?

Scott
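
For anyone repeating the slot-swapping tests, the following is a minimal sketch (not part of the original message) of how the logical drive states in `dmesg` output like that quoted above could be checked automatically after each reboot. It assumes the `amrd` lines keep exactly the format shown in this thread; the function name `non_optimal_drives` and the `SAMPLE` text are illustrative only.

```python
import re

# Matches lines such as:
#   amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
# The format is assumed from the dmesg output quoted in this thread.
DRIVE_RE = re.compile(
    r"^(amrd\d+): (\d+)MB \((\d+) sectors\) RAID (\d+) \((\w+)\)",
    re.MULTILINE,
)


def non_optimal_drives(dmesg_text):
    """Return (device, raid_level, state) for every logical drive
    whose reported state is anything other than "optimal"."""
    return [
        (m.group(1), int(m.group(4)), m.group(5))
        for m in DRIVE_RE.finditer(dmesg_text)
        if m.group(5) != "optimal"
    ]


# Sample input copied from the dmesg lines quoted in the message.
SAMPLE = """\
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 66036MB (135241728 sectors) RAID 1 (degraded)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 8198MB (16789504 sectors) RAID 0 (offline)
amrd2: <LSILogic MegaRAID logical drive> on amr0
amrd2: 140270MB (287272960 sectors) RAID 5 (optimal)
"""

if __name__ == "__main__":
    for device, level, state in non_optimal_drives(SAMPLE):
        print(f"{device}: RAID {level} is {state}")
```

In practice the text would come from the output of `dmesg` rather than from `SAMPLE`; the sketch only reports what the controller says about each logical drive and cannot tell whether the backplane, the controller, or the driver is at fault.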