Date:      Mon, 8 Jan 2001 08:53:10 -0500 
From:      "Robinson, Kimberly" <kimberly_robinson@adaptec.com>
To:        "Salyzyn, Mark" <mark_salyzyn@adaptec.com>, "'Micah Anderson'" <micah@indymedia.org>, "'Chris Snell'" <chris@bikeworld.com>
Cc:        "'Mike Smith'" <msmith@freebsd.org>, "'noah'" <noah@indymedia.org>, "'freebsd-scsi@freebsd.org'" <freebsd-scsi@freebsd.org>
Subject:   RE: update
Message-ID:  <50DB155AD0CED411988E009027D61DB30FFFD6@otcexc01.otc.adaptec.com>


Hello All,

I think that what we are seeing in this instance is an issue with the
firmware (microcode) on the drives.  There is a known issue with these
drives: when they are in a RAID configuration, they drop out at random on
reboots.

I have been told by IBM that all DDYS/DPSS drives manufactured after
December 15th need to be updated to microcode S96H. I have also been told
that the IBM DDYS/DPSS development team has people standing by in case the
S96H microcode fails to fix the spin-down issue.  The easiest way to update
the drives' microcode is to connect all of the drives to another Adaptec
controller, such as a 2940 or 29160, and create a bootable disk with the
ASPI driver to run the microcode update program.

IBM's support information is as follows:

IBM TG Technical Support Center
888.426.5214
drive@us.ibm.com

Nothing else is required after the firmware update, and no one else has
reported problems after updating.

Thanks,

Kimberly Robinson




> -----Original Message-----
> From: Salyzyn, Mark 
> Sent: Friday, January 05, 2001 1:18 PM
> To: 'Micah Anderson'; Chris Snell
> Cc: 'Mike Smith'; noah; freebsd-scsi@freebsd.org; Robinson, Kimberly
> Subject: RE: update
> 
> 
> I feel this *must* be a controller firmware issue. To resolve
> this, our technical support department is going to need to
> duplicate the problem so our firmware engineers can
> understand why the drives are going offline. It feels like
> the combination of hardware is making the firmware `brittle',
> where subtle changes cause the issues to come and go. The
> controller firmware contains *all* the smarts associated with
> the SCSI bus communications.
>
> Technical support may be able to supply you with a card of a
> different hardware version that has a debugging port (UART,
> 115200 baud) installed to capture the information needed to
> resolve this. IMHO, this is the *best* way to see this through
> to a resolution. You have effectively swapped enough things around
> to have ruled out the hardware domain validation possibilities.
>
> In any case, I am escalating this to Adaptec Technical Support to
> see if they have any other practical ideas.
> 
> Sincerely -- Mark Salyzyn
> 
> -----Original Message-----
> From: Micah Anderson [mailto:micah@indymedia.org]
> Sent: Friday, January 05, 2001 12:45 PM
> To: Chris Snell
> Cc: Micah Anderson; Salyzyn, Mark; 'Mike Smith'; noah;
> freebsd-scsi@freebsd.org
> Subject: Re: update
> 
> 
> Chris,
> 
> I would be interested in seeing your kernel config so I could compare it to
> the one I made. When I tried the card in a different machine, that machine
> had a totally different motherboard and BIOS. I have been finding that
> Debian does work, but it is sensitive. For example, if I try to boot from
> the RAID I get the same behavior, but if I boot from a separate IDE drive
> and just mount the RAID partitions, things are fine. I have a feeling that
> perhaps there is a RAID header at the beginning of the logical volume that
> can be overwritten by a master boot record or a boot loader like LILO,
> GRUB, or the FreeBSD loader...?
> 
> Micah
> 
> On Thu, 04 Jan 2001, Chris Snell wrote:
> 
> > 
> > Micah,
> > 
> > Would it be of any help if I sent you the kernel config for 
> our server that 
> > has one of these cards in it?  As I said earlier, it's been 
> working great 
> > for us.  Also, when you tried this card in a different 
> machine, did that 
> > machine have the same motherboard and BIOS?  You mentioned 
> that Debian 
> > works on your setup.  Did you try installing it (Debian) 
> and then hammering 
> > on the disks or did you just verify that it installed?
> > 
> > Chris
> > 
> > At 03:25 PM 12/26/2000 -0800, Micah Anderson wrote:
> > >So I have tried pretty much everything, the alarm still 
> goes off at the same
> > >time during boot up, at asr0: major=154. I am trying a 
> last experiment
> > >today, if it doesn't work, I am sad to say that I am going 
> to have to use
> > >Debian since it works fine there. I have had this server 
> for over a month
> > >trying everything on the planet to get it to work, we need 
> this server
> > >running in a bad way and although I want to go with 
> FreeBSD we unfortunately
> > >are going to have to go with what works.
> > >
> > >Right now I am trying to recompile the kernel by pulling everything out of
> > >the config file except what is needed. It seems as if the problem has to do
> > >with the FreeBSD SCSI or asr driver, because that's when things go wrong,
> > >and if I can boot off the CD without this happening, then something is funky.
> > >
> > >I was called by Ida at Adaptec to follow up on the call 
> that I originally
> > >placed, ID #2843, but I was given the wrong number to call 
> her back.
> > >
> > >I've done practically everything in my power, besides 
> getting a job at
> > >adaptec or delving into the FreeBSD driver code, neither 
> of which I can do
> > >at this point. Do you guys have any other ideas, or 
> suggestions where to go
> > >next?
> > >
> > >Just a reminder, this is an Adaptec 3200S, using FreeBSD 4.2, with four IBM
> > >9 GB 10,000 RPM LVD drives making up a RAID-5, on a nice Intel motherboard
> > >(which has another Adaptec on-board controller, but I've tried the card in a
> > >different machine with the drives, same results)....
> > >
> > >Micah
> > >
> > >
> > >
> > >On Mon, 18 Dec 2000, Salyzyn, Mark wrote:
> > >
> > > > Although I figure Adaptec's Tech Support would be the 
> best to know about
> > > > generic issues with drive access, the possibilities for 
> this issue 
> > > could be:
> > > >
> > > > 1) No cable and/or drive cabinet domain validation, so 
> one might have to
> > > > roll the SCSI speed down a bit to compensate for cable 
> and/or drive
> > > > combination issues.
> > > > 2) Some drives are more comfortable with either over 
> (more than just 
> > > the two
> > > > endpoints) or under (only the last drive or controller) 
> termination.
> > > > 3) Contact tech support for a later Firmware release, 
> there may be known
> > > > issues with your drives, cabinets and/or drive 
> combinations that might have
> > > > been addressed with either drive firmware, or 
> controller firmware updates.
> > > > Currently the customer has better access to Technical 
> Support than I do at
> > > > this moment :-( even though I virtually end up driving 
> over top them each
> > > > morning as I head to the parking lot ...
> > > >
> > > > In any case, I will report this to the Firmware 
> engineers to see if they
> > > > have any additional comments to add about this issue.
> > > >
> > > > Keep in mind that at initial negotiation, the speed is 
> lower, the transfers
> > > > less stressful, than at operating system time. Edge 
> issues may surface as a
> > > > result, sometimes appearing different from OS to OS. 
> For instance, I 
> > > believe
> > > > the ASR driver can request up to 58 (~4KB) 
> scatter/gather elements in one
> > > > request, allowing up to 256 requests/device. NT's 
> scsiport driver, on the
> > > > other hand, limits request to 64KB/each and only 16 
> requests/controller.
> > > > Stresses vary.
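
To put the figures quoted above side by side, here is a rough sketch of the
arithmetic. It takes the numbers in the message at face value and treats
"~4KB" as exactly 4 KB, which is an assumption. The function and constant
names are made up for illustration and do not come from either driver. Note
that the asr limit is stated per device and the scsiport limit per
controller, so the two totals are not strictly like for like.

```python
# Rough upper bounds on data in flight, using only the figures quoted above.
KB = 1024

# asr driver figures as stated in the message (taken at face value).
ASR_SG_ELEMENTS = 58           # scatter/gather elements per request
ASR_SG_ELEMENT_KB = 4          # "~4KB" per element, assumed to be exactly 4 KB
ASR_REQUESTS_PER_DEVICE = 256  # outstanding requests per device

# NT scsiport figures as stated in the message.
NT_REQUEST_KB = 64               # maximum size of one request
NT_REQUESTS_PER_CONTROLLER = 16  # outstanding requests per controller


def asr_max_request_bytes():
    """Largest single asr request: every scatter/gather element full."""
    return ASR_SG_ELEMENTS * ASR_SG_ELEMENT_KB * KB


def asr_max_outstanding_bytes():
    """Most data one device could have queued under the asr figures."""
    return asr_max_request_bytes() * ASR_REQUESTS_PER_DEVICE


def nt_max_outstanding_bytes():
    """Most data one controller could have queued under the scsiport figures."""
    return NT_REQUEST_KB * KB * NT_REQUESTS_PER_CONTROLLER


if __name__ == "__main__":
    print("asr, one request (KB):", asr_max_request_bytes() // KB)
    print("asr, per device (KB):", asr_max_outstanding_bytes() // KB)
    print("scsiport, per controller (KB):", nt_max_outstanding_bytes() // KB)
```

Under these assumptions a single asr request can be several times larger
than a scsiport request, and far more data can be queued at once, which is
one way to read "stresses vary".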
> > > >
> > > > However, OS issues do not typically affect drive 
> failures, which is 
> > > curious.
> > > > I have an issue that comes up in FreeBSD, for instance, 
> with the array
> > > > performance in an impacted (read failures do not fail 
> an array since data
> > > > can be reconstructed) state since the requests take 
> much longer to fulfill
> > > > than in the genuine failed state. Impacted means every 
> request still tries
> > > > to be fulfilled by first trying to talk to the not-yet 
> failed component.
> > > > This has the catch-22 effect of not being able to mount 
> the array head due
> > > > to the protracted responses on some failed drive 
> scenarios before the
> > > > adapter has considered the component to be marked as 
> failed. Pulling the
> > > > errant drive might be the only way. Later adapter 
> Firmware may deal with
> > > > this through careful consideration of request response 
> time. Tech support
> > > > > may supply a select fail-on-read firmware/NVRAM, or one can choose to
> > > > > bump up the timeout in the SCSI layer. This issue, for instance, does
> > > > > not occur under Solaris because their SCSI layer is set to 2-minute
> > > > > timeouts.
> > > >
> > > > Sincerely -- Mark Salyzyn
> > > >
> > > > -----Original Message-----
> > > > From: Mike Smith [mailto:msmith@freebsd.org]
> > > > Sent: Monday, December 18, 2000 5:37 AM
> > > > To: Micah Anderson
> > > > Cc: noah; freebsd-scsi@freebsd.org; mark_salyzyn@adaptec.com
> > > > Subject: Re: update
> > > >
> > > >
> > > >
> > > > Mark; I miscopied you on my previous reply to this 
> message, sorry about
> > > > that.  Do you have any ideas?
> > > >
> > > > > On Sat, 16 Dec 2000, Mike Smith wrote:
> > > > >
> > > > > > > At "asr0: major=154" the raid card begins a high 
> pitched beep
> > > > indicating
> > > > > > > that two of the drives have failed and that a 
> rebuild of the raid is
> > > > > > > required, but we've tested all of the drives and 
> replaced the raid
> > > > card
> > > > > > > with a new one, and still get the same problem. 
> The reason I'm asking
> > > > > > > about possible software issues is that other OS's 
> have worked on this
> > > > raid
> > > > > > > setup.
> > > > > >
> > > > > > > I've copied Mark at Adaptec, who is the author and principal
> > > > > > > maintainer of the 'asr' driver, since he's going to have the best
> > > > > > > idea of what's actually going on here.  Without your saying which
> > > > > > > OSes you've used, it's tough to know whether they simply aren't
> > > > > > > enabling the card alarm, though.
> > > > >
> > > > > We have gone through exhaustive troubleshooting 
> lengths to try to
> > > > determine
> > > > > what the problem is. I have swapped RAID cards, 
> swapped cables, tried a
> > > > > different motherboard, different powersupply in every 
> possible 
> > > combination
> > > > > of configuration. Each time I have to start from the 
> beginning, 
> > > destroying
> > > > > the RAID configuration and then creating a new one, 
> which takes over an
> > > > > hour, so this process has taken literally three weeks 
> to try all the
> > > > > potential configurations.
> > > > >
> > > > > The RAID alarm goes off on the card during the 
> FreeBSD boot process, the
> > > > OS
> > > > > continues to boot, but the alarm continues. Rebooting 
> and going into the
> > > > > Adaptec setup tells us that a drive has failed, it is 
> not the same drive
> > > > > every time. During bootup after the RAID POST when 
> the SMOR utility is
> > > > > loading it will usually show the RAID-5 drive as well 
> as the single 
> > > drive.
> > > > > It is almost as if one of the drives of the RAID is 
> pushed out of the
> > > > RAID.
> > > > > Individually, each drive works fine. If I install 
> FreeBSD on a single
> > > > drive,
> > > > > without a RAID constructed things act as normal.  
> These are IBM 10k RPM
> > > > LVD
> > > > > drives and I ran IBM's drive test utility on each one 
> of them and it came
> > > > > back with no errors.
> > > > >
> > > > > I have been able to install Debian Linux and use the 
> card/drives without
> > > > > this problem. I have called Adaptec to ask them about 
> this and was 
> > > told to
> > > > > try changing the drive speed from Ultra 3 to Ultra as 
> well as change the
> > > > > delay from the default to 30 seconds, all of these do 
> not change the
> > > > > behavior whatsoever.
> > > > >
> > > > > I have spoken with one other person who had a similar 
> type of problem,
> > > > > except what was happening to him was he was loading 
> some DOS drivers, one
> > > > of
> > > > > which would wipe the RAID card configuration when it 
> was loaded (ASAPI? I
> > > > > can't recall right now)... I am wondering if there 
> are some other drivers
> > > > > that are being probed in the generic FreeBSD kernel 
> that are doing a
> > > > similar
> > > > > thing to the config.
> > > > >
> > > > > >
> > > > > > Have you tried running the Adaptec management 
> software to check the
> > > > > > status of the card?
> > > > >
> > > > > > In FreeBSD? If there is such a thing it would be interesting to know
> > > > > > where one could get it. The CD that was included with the card has no
> > > > > > FreeBSD anything on it - the website has no FreeBSD information or
> > > > > > downloads on it (except for the brief mention that it is supported,
> > > > > > but if you call for support you can't get it). Or are you talking
> > > > > > about the SMOR utility that you can access from the BIOS?
> > > > >
> > > > > Thanks for any help that you can offer.
> > > > >
> > > > > Micah
> > > > >
> > > > >
> > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > On 12/15, Mike Smith wrote:
> > > > > > > >
> > > > > > > > > Hi, I'm working on trying to install FreeBSD 
> 4.2 on a dual p3 700
> > > > with
> > > > > > > > > an Adaptec 3200S raid card. From what I can 
> tell everyone 
> > > that has
> > > > tried
> > > > > > > > > this card has had good luck. When we install 
> FreeBSD (booting off
> > > > cd) it
> > > > > > > > > recognizes the card and installs on it 
> perfectly, but when it
> > > > loads the OS
> > > > > > > > > off the raid it does something to damage the 
> hardware raid,
> > > > requiring us
> > > > > > > > > to rebuild the RAID in the 3200S' bios. We're 
> pretty sure that
> > > > this isn't
> > > > > > > > > a hardware problem.
> > > > > > > >
> > > > > > > > You haven't actually included anything that 
> suggests that 
> > > there's a
> > > > > > > > problem occurring, so it's somewhat difficult 
> to guess what's going
> > > > on.
> > > > > > > >
> > > > > > > > > However, I don't lend much credibility to the suggestion that
> > > > > > > > > "FreeBSD does something to damage the hardware raid" - things
> > > > > > > > > just don't happen like that.
> > > > > > > >
> > > > > > > > I would be inclined to suspect that you 
> probably have a suspect
> > > > disk, or
> > > > > > > > cabling/enclosure problems, but without more 
> details it's hard 
> > > to be
> > > > sure.
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > ... every activity meets with opposition, 
> everyone who acts has his
> > > > > > > > rivals and unfortunately opponents also.  But 
> not because people
> > > > want
> > > > > > > > to be opponents, rather because the tasks and 
> relationships force
> > > > > > > > people to take different points of view.  [Dr. 
> Fritz Todt]
> > > > > > > >            V I C T O R Y   N O T   V E N G E A N C E
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > noah .. email for pgp/gpg key
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > ... every activity meets with opposition, everyone 
> who acts has his
> > > > > > rivals and unfortunately opponents also.  But not 
> because people want
> > > > > > to be opponents, rather because the tasks and 
> relationships force
> > > > > > people to take different points of view.  [Dr. Fritz Todt]
> > > > > >            V I C T O R Y   N O T   V E N G E A N C E
> > > > > >
> > > > > >
> > > > >
> > > >
> > > > --
> > > > ... every activity meets with opposition, everyone who 
> acts has his
> > > > rivals and unfortunately opponents also.  But not 
> because people want
> > > > to be opponents, rather because the tasks and 
> relationships force
> > > > people to take different points of view.  [Dr. Fritz Todt]
> > > >            V I C T O R Y   N O T   V E N G E A N C E
> > > >
> > >
> > >
> > >To Unsubscribe: send mail to majordomo@FreeBSD.org
> > >with "unsubscribe freebsd-scsi" in the body of the message
> > 
> 




