Date: Mon, 8 Jan 2001 08:53:10 -0500
From: "Robinson, Kimberly" <kimberly_robinson@adaptec.com>
To: "Salyzyn, Mark" <mark_salyzyn@adaptec.com>, "'Micah Anderson'" <micah@indymedia.org>, "'Chris Snell'" <chris@bikeworld.com>
Cc: "'Mike Smith'" <msmith@freebsd.org>, "'noah'" <noah@indymedia.org>, "'freebsd-scsi@freebsd.org'" <freebsd-scsi@freebsd.org>
Subject: RE: update
Message-ID: <50DB155AD0CED411988E009027D61DB30FFFD6@otcexc01.otc.adaptec.com>

Hello All,

I think that what we are seeing in this instance is an issue with the firmware (microcode) on the drives. There is a known issue with these drives in a RAID configuration: they frustratingly drop out at random on reboots. I have been told by IBM that all DDYS/DPSS drives manufactured after December 15th need to be updated to microcode S96H. I have also been told that the IBM DDYS/DPSS development team has a team standing by if the S96H microcode fails to fix the spin-down issue.

The easiest way to update the drives' microcode is to connect all drives to another Adaptec controller, such as a 2940 or 29160, and create a bootable disk with the ASPI driver to run their update program.

IBM's support info is as follows:

IBM TG Technical Support Center
888.426.5214
drive@us.ibm.com

There is nothing else required after the firmware update, and no one else has reported problems after updating.

Thanks,
Kimberly Robinson

> -----Original Message-----
> From: Salyzyn, Mark
> Sent: Friday, January 05, 2001 1:18 PM
> To: 'Micah Anderson'; Chris Snell
> Cc: 'Mike Smith'; noah; freebsd-scsi@freebsd.org; Robinson, Kimberly
> Subject: RE: update
>
> I feel this *must* be a controller firmware issue. To resolve this, our technical support department is going to need to duplicate this problem so our Firmware engineers can understand why the drives are going offline. It feels like the combination of hardware is making the Firmware `brittle', where subtle changes cause the issues to come and go. The controller Firmware contains *all* the smarts associated with the SCSI bus communications.
>
> Technical support may be able to supply you a different hardware versioned card with a debugging (UART 115200 Baud) port installed to capture the needed information to resolve this. IMHO, this is the *best* way to resolve this to fruition. You have effectively swapped enough things around to have isolated away the hardware domain validation possibilities.
>
> In any case, I am escalating this to Adaptec Technical Support to see if they have any other practical ideas.
>
> Sincerely -- Mark Salyzyn
>
> -----Original Message-----
> From: Micah Anderson [mailto:micah@indymedia.org]
> Sent: Friday, January 05, 2001 12:45 PM
> To: Chris Snell
> Cc: Micah Anderson; Salyzyn, Mark; 'Mike Smith'; noah; freebsd-scsi@freebsd.org
> Subject: Re: update
>
> Chris,
>
> I would be interested in seeing your kernel config so I could compare it to the one I made. When I tried the card in a different machine, that machine had a totally different motherboard and BIOS. I have been finding that Debian does work, but it is sensitive. For example, if I try to boot from the RAID I get the same behavior, but if I boot from a separate IDE drive and just mount the RAID partitions, things are fine. I have a feeling that perhaps there is a RAID header at the beginning of the logical volume that can be overwritten by a master boot record or a boot loader like Lilo, or Grub, or the FreeBSD loader...?
>
> Micah
>
> On Thu, 04 Jan 2001, Chris Snell wrote:
>
> > Micah,
> >
> > Would it be of any help if I sent you the kernel config for our server that has one of these cards in it? As I said earlier, it's been working great for us. Also, when you tried this card in a different machine, did that machine have the same motherboard and BIOS? You mentioned that Debian works on your setup. Did you try installing it (Debian) and then hammering on the disks, or did you just verify that it installed?
> >
> > Chris
> >
> > At 03:25 PM 12/26/2000 -0800, Micah Anderson wrote:
> > > So I have tried pretty much everything; the alarm still goes off at the same time during boot up, at asr0: major=154. I am trying a last experiment today. If it doesn't work, I am sad to say that I am going to have to use Debian since it works fine there.
> > > I have had this server for over a month trying everything on the planet to get it to work. We need this server running in a bad way, and although I want to go with FreeBSD, we unfortunately are going to have to go with what works.
> > >
> > > Right now I am trying to recompile the kernel by pulling everything out of the config file, except what is needed. It seems as if the problem has to do with the FreeBSD scsi or asr driver, because that's when things go, and if I can boot off the CD without this happening, then something is funky.
> > >
> > > I was called by Ida at Adaptec to follow up on the call that I originally placed, ID #2843, but I was given the wrong number to call her back.
> > >
> > > I've done practically everything in my power, besides getting a job at Adaptec or delving into the FreeBSD driver code, neither of which I can do at this point. Do you guys have any other ideas, or suggestions where to go next?
> > >
> > > Just a reminder, this is an Adaptec 3200S, using FreeBSD 4.2, 4 IBM 9 gig 10,000 RPM LVD drives making up a RAID-5, using a nice Intel motherboard (which has another Adaptec on-board controller, but I've tried the card in a different machine with the drives, same results)....
> > >
> > > Micah
> > >
> > > On Mon, 18 Dec 2000, Salyzyn, Mark wrote:
> > >
> > > > Although I figure Adaptec's Tech Support would be the best to know about generic issues with drive access, the possibilities for this issue could be:
> > > >
> > > > 1) No cable and/or drive cabinet domain validation, so one might have to roll the SCSI speed down a bit to compensate for cable and/or drive combination issues.
> > > > 2) Some drives are more comfortable with either over (more than just the two endpoints) or under (only the last drive or controller) termination.
> > > > 3) Contact tech support for a later Firmware release; there may be known issues with your drives, cabinets and/or drive combinations that might have been addressed with either drive firmware or controller firmware updates. Currently the customer has better access to Technical Support than I do at this moment :-( even though I virtually end up driving over top of them each morning as I head to the parking lot ...
> > > >
> > > > In any case, I will report this to the Firmware engineers to see if they have any additional comments to add about this issue.
> > > >
> > > > Keep in mind that at initial negotiation, the speed is lower, and the transfers less stressful, than at operating system time. Edge issues may surface as a result, sometimes appearing different from OS to OS. For instance, I believe the ASR driver can request up to 58 (~4KB) scatter/gather elements in one request, allowing up to 256 requests/device. NT's scsiport driver, on the other hand, limits requests to 64KB each and only 16 requests/controller. Stresses vary.
> > > >
> > > > However, OS issues do not typically affect drive failures, which is curious. I have an issue that comes up in FreeBSD, for instance, with the array performance in an impacted (read failures do not fail an array since data can be reconstructed) state, since the requests take much longer to fulfill than in the genuine failed state. Impacted means every request still tries to be fulfilled by first trying to talk to the not-yet-failed component. This has the catch-22 effect of not being able to mount the array head due to the protracted responses on some failed drive scenarios before the adapter has considered the component to be marked as failed. Pulling the errant drive might be the only way.
> > > > Later adapter Firmware may deal with this through careful consideration of request response time. Tech support may supply a select fail-on-read firmware/NVRAM, or one can choose to bump up the timeout in the SCSI layer. This issue, for instance, does not occur under Solaris because their SCSI layer is set to 2-minute timeouts.
> > > >
> > > > Sincerely -- Mark Salyzyn
> > > >
> > > > -----Original Message-----
> > > > From: Mike Smith [mailto:msmith@freebsd.org]
> > > > Sent: Monday, December 18, 2000 5:37 AM
> > > > To: Micah Anderson
> > > > Cc: noah; freebsd-scsi@freebsd.org; mark_salyzyn@adaptec.com
> > > > Subject: Re: update
> > > >
> > > > Mark; I miscopied you on my previous reply to this message, sorry about that. Do you have any ideas?
> > > >
> > > > > On Sat, 16 Dec 2000, Mike Smith wrote:
> > > > >
> > > > > > > At "asr0: major=154" the raid card begins a high pitched beep indicating that two of the drives have failed and that a rebuild of the raid is required, but we've tested all of the drives and replaced the raid card with a new one, and still get the same problem. The reason I'm asking about possible software issues is that other OS's have worked on this raid setup.
> > > > > >
> > > > > > I've copied Mark at Adaptec, who is the author and principal maintainer of the 'asr' driver, since he's going to have the best idea of what's actually going on here. Without saying which OS' you've used, it's tough to know whether they simply aren't enabling the card alarm though.
> > > > >
> > > > > We have gone through exhaustive troubleshooting lengths to try to determine what the problem is.
> > > > > I have swapped RAID cards, swapped cables, tried a different motherboard, different power supply in every possible combination of configuration. Each time I have to start from the beginning, destroying the RAID configuration and then creating a new one, which takes over an hour, so this process has taken literally three weeks to try all the potential configurations.
> > > > >
> > > > > The RAID alarm goes off on the card during the FreeBSD boot process; the OS continues to boot, but the alarm continues. Rebooting and going into the Adaptec setup tells us that a drive has failed, but it is not the same drive every time. During bootup, after the RAID POST, when the SMOR utility is loading, it will usually show the RAID-5 drive as well as the single drive. It is almost as if one of the drives of the RAID is pushed out of the RAID. Individually, each drive works fine. If I install FreeBSD on a single drive, without a RAID constructed, things act as normal. These are IBM 10k RPM LVD drives and I ran IBM's drive test utility on each one of them and it came back with no errors.
> > > > >
> > > > > I have been able to install Debian Linux and use the card/drives without this problem. I have called Adaptec to ask them about this and was told to try changing the drive speed from Ultra 3 to Ultra as well as change the delay from the default to 30 seconds; none of these changes the behavior whatsoever.
> > > > >
> > > > > I have spoken with one other person who had a similar type of problem, except what was happening to him was he was loading some DOS drivers, one of which would wipe the RAID card configuration when it was loaded (ASAPI? I can't recall right now)...
> > > > > I am wondering if there are some other drivers that are being probed in the generic FreeBSD kernel that are doing a similar thing to the config.
> > > > >
> > > > > > Have you tried running the Adaptec management software to check the status of the card?
> > > > >
> > > > > In FreeBSD? If there is such a thing it would be interesting to know where one could get it. The CD that was included with the card has no FreeBSD anything on it - the website has no FreeBSD information or downloads on it (except for the brief mention that it is supported, but if you call for support you can't get it). Or are you talking about the SMOR utility that you can access from the BIOS?
> > > > >
> > > > > Thanks for any help that you can offer.
> > > > >
> > > > > Micah
> > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > On 12/15, Mike Smith wrote:
> > > > > > > >
> > > > > > > > > Hi, I'm working on trying to install FreeBSD 4.2 on a dual p3 700 with an Adaptec 3200S raid card. From what I can tell everyone that has tried this card has had good luck. When we install FreeBSD (booting off cd) it recognizes the card and installs on it perfectly, but when it loads the OS off the raid it does something to damage the hardware raid, requiring us to rebuild the RAID in the 3200S' bios. We're pretty sure that this isn't a hardware problem.
> > > > > > > >
> > > > > > > > You haven't actually included anything that suggests that there's a problem occurring, so it's somewhat difficult to guess what's going on.
> > > > > > > >
> > > > > > > > However, I don't lend much credibility to the suggestion that "FreeBSD does something to damage the hardware raid" - things just don't happen like that.
> > > > > > > >
> > > > > > > > I would be inclined to suspect that you probably have a suspect disk, or cabling/enclosure problems, but without more details it's hard to be sure.
> > > > > > > >
> > > > > > > > --
> > > > > > > > ... every activity meets with opposition, everyone who acts has his
> > > > > > > > rivals and unfortunately opponents also. But not because people want
> > > > > > > > to be opponents, rather because the tasks and relationships force
> > > > > > > > people to take different points of view. [Dr. Fritz Todt]
> > > > > > > > V I C T O R Y N O T V E N G E A N C E
> > > > > > >
> > > > > > > --
> > > > > > > noah .. email for pgp/gpg key
> > > > > >
> > > > > > --
> > > > > > ... every activity meets with opposition, everyone who acts has his
> > > > > > rivals and unfortunately opponents also. But not because people want
> > > > > > to be opponents, rather because the tasks and relationships force
> > > > > > people to take different points of view. [Dr. Fritz Todt]
> > > > > > V I C T O R Y N O T V E N G E A N C E
> > > >
> > > > --
> > > > ... every activity meets with opposition, everyone who acts has his
> > > > rivals and unfortunately opponents also. But not because people want
> > > > to be opponents, rather because the tasks and relationships force
> > > > people to take different points of view. [Dr. Fritz Todt]
> > > > V I C T O R Y N O T V E N G E A N C E
> > >
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-scsi" in the body of the message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
