Date:      Fri, 5 Jan 2001 13:18:21 -0500 
From:      "Salyzyn, Mark" <mark_salyzyn@adaptec.com>
To:        "'Micah Anderson'" <micah@indymedia.org>, Chris Snell <chris@bikeworld.com>
Cc:        "'Mike Smith'" <msmith@freebsd.org>, noah <noah@indymedia.org>, freebsd-scsi@freebsd.org, "Robinson, Kimberly" <kimberly_robinson@adaptec.com>
Subject:   RE: update
Message-ID:  <50DB155AD0CED411988E009027D61DB31D816D@otcexc01.otc.adaptec.com>

I feel this *must* be a controller firmware issue. To resolve this, our
technical support department is going to need to duplicate this problem so
our Firmware engineers can understand why the drives are going offline. It
feels like the combination of Hardware is making the Firmware `brittle'
where subtle changes cause the issues to come and go. The controller
Firmware contains *all* the smarts associated with the SCSI bus
communications.

Technical support may be able to supply you with a different hardware revision
of the card with a debugging port (UART, 115200 baud) installed to capture the
information needed to resolve this. IMHO, this is the *best* way to see this
through to resolution. You have swapped enough things around to have effectively
ruled out the hardware domain validation possibilities.

In any case, I am escalating this to Adaptec Technical Support to see if they
have any other practical ideas.

Sincerely -- Mark Salyzyn

-----Original Message-----
From: Micah Anderson [mailto:micah@indymedia.org]
Sent: Friday, January 05, 2001 12:45 PM
To: Chris Snell
Cc: Micah Anderson; Salyzyn, Mark; 'Mike Smith'; noah;
freebsd-scsi@freebsd.org
Subject: Re: update


Chris,

I would be interested in seeing your kernel config so I could compare it to
the one I made. When I tried the card in a different machine that machine
had a totally different motherboard and BIOS. I have been finding that
Debian does work, but it is sensitive. For example, if I try to boot from
the RAID I get the same behavior, but if I boot from a separate IDE drive
and just mount the raid partitions things are fine. I have a feeling that
perhaps there is a RAID header at the beginning of the logical volume that
can be overwritten by a master boot record or a boot loader like Lilo, or
Grub, or the FreeBSD loader...?

Micah

On Thu, 04 Jan 2001, Chris Snell wrote:

> 
> Micah,
> 
> Would it be of any help if I sent you the kernel config for our server that
> has one of these cards in it?  As I said earlier, it's been working great
> for us.  Also, when you tried this card in a different machine, did that
> machine have the same motherboard and BIOS?  You mentioned that Debian
> works on your setup.  Did you try installing it (Debian) and then hammering
> on the disks or did you just verify that it installed?
> 
> Chris
> 
> At 03:25 PM 12/26/2000 -0800, Micah Anderson wrote:
> >So I have tried pretty much everything, the alarm still goes off at the same
> >time during boot up, at asr0: major=154. I am trying a last experiment
> >today, if it doesn't work, I am sad to say that I am going to have to use
> >Debian since it works fine there. I have had this server for over a month
> >trying everything on the planet to get it to work, we need this server
> >running in a bad way and although I want to go with FreeBSD we unfortunately
> >are going to have to go with what works.
> >
> >Right now I am trying to recompile the kernel by pulling everything out of
> >the config file, except what is needed. It seems as if the problem has to do
> >with the FreeBSD scsi or asr driver. Because that's when things go, and if I
> >can boot off the CD without this happening, then something is funky.
> >
> >I was called by Ida at Adaptec to follow up on the call that I originally
> >placed, ID #2843, but I was given the wrong number to call her back.
> >
> >I've done practically everything in my power, besides getting a job at
> >adaptec or delving into the FreeBSD driver code, neither of which I can do
> >at this point. Do you guys have any other ideas, or suggestions where to go
> >next?
> >
> >Just a reminder, this is an adaptec 3200s, using freebsd 4.2, 4 IBM 9 gig
> >10,000 RPM LVD drives making up a Raid-5, using a nice Intel motherboard
> >(which has another adaptec on board controller, but I've tried the card in a
> >different machine with the drives, same results)....
> >
> >Micah
> >
> >
> >
> >On Mon, 18 Dec 2000, Salyzyn, Mark wrote:
> >
> > > Although I figure Adaptec's Tech Support would be the best to know about
> > > generic issues with drive access, the possibilities for this issue could be:
> > >
> > > 1) No cable and/or drive cabinet domain validation, so one might have to
> > > roll the SCSI speed down a bit to compensate for cable and/or drive
> > > combination issues.
> > > 2) Some drives are more comfortable with either over (more than just the two
> > > endpoints) or under (only the last drive or controller) termination.
> > > 3) Contact tech support for a later Firmware release, there may be known
> > > issues with your drives, cabinets and/or drive combinations that might have
> > > been addressed with either drive firmware, or controller firmware updates.
> > > Currently the customer has better access to Technical Support than I do at
> > > this moment :-( even though I virtually end up driving over top them each
> > > morning as I head to the parking lot ...
> > >
> > > In any case, I will report this to the Firmware engineers to see if they
> > > have any additional comments to add about this issue.
> > >
> > > Keep in mind that at initial negotiation, the speed is lower, the transfers
> > > less stressful, than at operating system time. Edge issues may surface as a
> > > result, sometimes appearing different from OS to OS. For instance, I believe
> > > the ASR driver can request up to 58 (~4KB) scatter/gather elements in one
> > > request, allowing up to 256 requests/device. NT's scsiport driver, on the
> > > other hand, limits requests to 64KB each and only 16 requests/controller.
> > > Stresses vary.
> > >
> > > However, OS issues do not typically affect drive failures, which is curious.
> > > I have an issue that comes up in FreeBSD, for instance, with the array
> > > performance in an impacted (read failures do not fail an array since data
> > > can be reconstructed) state since the requests take much longer to fulfill
> > > than in the genuine failed state. Impacted means every request still tries
> > > to be fulfilled by first trying to talk to the not-yet failed component.
> > > This has the catch-22 effect of not being able to mount the array head due
> > > to the protracted responses on some failed drive scenarios before the
> > > adapter has considered the component to be marked as failed. Pulling the
> > > errant drive might be the only way. Later adapter Firmware may deal with
> > > this through careful consideration of request response time. Tech support
> > > may supply a select fail-on-read firmware/NVRAM, or one can choose to bump up
> > > the timeout in the SCSI layer. This issue, for instance, does not occur
> > > under Solaris because their SCSI layer is set to 2 minute timeouts.
> > >
> > > Sincerely -- Mark Salyzyn
> > >
> > > -----Original Message-----
> > > From: Mike Smith [mailto:msmith@freebsd.org]
> > > Sent: Monday, December 18, 2000 5:37 AM
> > > To: Micah Anderson
> > > Cc: noah; freebsd-scsi@freebsd.org; mark_salyzyn@adaptec.com
> > > Subject: Re: update
> > >
> > >
> > >
> > > Mark; I miscopied you on my previous reply to this message, sorry about
> > > that.  Do you have any ideas?
> > >
> > > > On Sat, 16 Dec 2000, Mike Smith wrote:
> > > >
> > > > > > At "asr0: major=154" the raid card begins a high pitched beep indicating
> > > > > > that two of the drives have failed and that a rebuild of the raid is
> > > > > > required, but we've tested all of the drives and replaced the raid card
> > > > > > with a new one, and still get the same problem. The reason I'm asking
> > > > > > about possible software issues is that other OS's have worked on this raid
> > > > > > setup.
> > > > >
> > > > > I've copied Mark at Adaptec, who is the author and principal maintainer
> > > > > of the 'asr' driver, since he's going to have the best idea of what's
> > > > > actually going on here.  Without saying which OS' you've used, it's tough
> > > > > to know whether they simply aren't enabling the card alarm though.
> > > >
> > > > We have gone through exhaustive troubleshooting lengths to try to determine
> > > > what the problem is. I have swapped RAID cards, swapped cables, tried a
> > > > different motherboard, different power supply in every possible combination
> > > > of configuration. Each time I have to start from the beginning, destroying
> > > > the RAID configuration and then creating a new one, which takes over an
> > > > hour, so this process has taken literally three weeks to try all the
> > > > potential configurations.
> > > >
> > > > The RAID alarm goes off on the card during the FreeBSD boot process, the OS
> > > > continues to boot, but the alarm continues. Rebooting and going into the
> > > > Adaptec setup tells us that a drive has failed, it is not the same drive
> > > > every time. During bootup after the RAID POST when the SMOR utility is
> > > > loading it will usually show the RAID-5 drive as well as the single drive.
> > > > It is almost as if one of the drives of the RAID is pushed out of the RAID.
> > > > Individually, each drive works fine. If I install FreeBSD on a single drive,
> > > > without a RAID constructed things act as normal.  These are IBM 10k RPM LVD
> > > > drives and I ran IBM's drive test utility on each one of them and it came
> > > > back with no errors.
> > > >
> > > > I have been able to install Debian Linux and use the card/drives without
> > > > this problem. I have called Adaptec to ask them about this and was told to
> > > > try changing the drive speed from Ultra 3 to Ultra as well as change the
> > > > delay from the default to 30 seconds, all of these do not change the
> > > > behavior whatsoever.
> > > >
> > > > I have spoken with one other person who had a similar type of problem,
> > > > except what was happening to him was he was loading some DOS drivers, one of
> > > > which would wipe the RAID card configuration when it was loaded (ASAPI? I
> > > > can't recall right now)... I am wondering if there are some other drivers
> > > > that are being probed in the generic FreeBSD kernel that are doing a similar
> > > > thing to the config.
> > > >
> > > > >
> > > > > Have you tried running the Adaptec management software to check the
> > > > > status of the card?
> > > >
> > > > In FreeBSD? If there is such a thing it would be interesting to know where
> > > > one could get it. The CD that was included with the card has no FreeBSD
> > > > anything on it - the website has no FreeBSD information or downloads on it
> > > > (except for the brief mention that it is supported, but if you call for
> > > > support you can't get it). Or are you talking about the SMOR utility that
> > > > you can access from the BIOS?
> > > >
> > > > Thanks for any help that you can offer.
> > > >
> > > > Micah
> > > >
> > > >
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > On 12/15, Mike Smith wrote:
> > > > > > >
> > > > > > > > Hi, I'm working on trying to install FreeBSD 4.2 on a dual p3 700 with
> > > > > > > > an Adaptec 3200S raid card. From what I can tell everyone that has tried
> > > > > > > > this card has had good luck. When we install FreeBSD (booting off cd) it
> > > > > > > > recognizes the card and installs on it perfectly, but when it loads the OS
> > > > > > > > off the raid it does something to damage the hardware raid, requiring us
> > > > > > > > to rebuild the RAID in the 3200S' bios. We're pretty sure that this isn't
> > > > > > > > a hardware problem.
> > > > > > >
> > > > > > > You haven't actually included anything that suggests that there's a
> > > > > > > problem occurring, so it's somewhat difficult to guess what's going on.
> > > > > > >
> > > > > > > However, I don't lend much credibility to the suggestion that "FreeBSD
> > > > > > > does something to damage the hardware raid" - things just don't happen
> > > > > > > like that.
> > > > > > >
> > > > > > > I would be inclined to suspect that you probably have a suspect disk, or
> > > > > > > cabling/enclosure problems, but without more details it's hard to be sure.
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > ... every activity meets with opposition, everyone who acts has his
> > > > > > > rivals and unfortunately opponents also.  But not because people want
> > > > > > > to be opponents, rather because the tasks and relationships force
> > > > > > > people to take different points of view.  [Dr. Fritz Todt]
> > > > > > >            V I C T O R Y   N O T   V E N G E A N C E
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > noah .. email for pgp/gpg key
> > > > > >
> > > > >
> > > > > --
> > > > > ... every activity meets with opposition, everyone who acts has his
> > > > > rivals and unfortunately opponents also.  But not because people want
> > > > > to be opponents, rather because the tasks and relationships force
> > > > > people to take different points of view.  [Dr. Fritz Todt]
> > > > >            V I C T O R Y   N O T   V E N G E A N C E
> > > > >
> > > > >
> > > >
> > >
> > > --
> > > ... every activity meets with opposition, everyone who acts has his
> > > rivals and unfortunately opponents also.  But not because people want
> > > to be opponents, rather because the tasks and relationships force
> > > people to take different points of view.  [Dr. Fritz Todt]
> > >            V I C T O R Y   N O T   V E N G E A N C E
> > >
> >
> >
> >To Unsubscribe: send mail to majordomo@FreeBSD.org
> >with "unsubscribe freebsd-scsi" in the body of the message
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50DB155AD0CED411988E009027D61DB31D816D>