Date: Sun, 13 Aug 2000 13:59:43 -0700
From: Joe Modjeski <jmodjeski@ms1.northlink.com>
To: "'Bernd Walter '" <ticso@mail.cicely.de>
Cc: "''freebsd-scsi@freebsd.org' '" <freebsd-scsi@freebsd.org>
Subject: RE: to Vinum or not to Vinum
Message-ID: <00101B7A7FDDD311A89500A0CC56C79048BE@MS1>
-----Original Message-----
From: Bernd Walter
To: Joe Modjeski
Cc: 'Bernd Walter'; 'freebsd-scsi@freebsd.org'
Sent: 8/12/00 3:20 PM
Subject: Re: to Vinum or not to Vinum

On Sat, Aug 12, 2000 at 01:11:39PM -0700, Joe Modjeski wrote:
>
> > On Thu, Aug 10, 2000 at 12:09:47PM -0700, Joe Modjeski wrote:
> > > Currently we have 3 Compaq Proliant 1600R servers with 6 9.1 Ultra3
> > > drives in each. We are attempting (very unsuccessfully) to do Raid5
> > > with vinum. We get fatal trap 12 errors very regularly and after a
> > > few reboots the vinum volume is so chewed up that we end up having
> > > to rebuild the system. I tracked down the majority of the problems
> > > to the /etc/security script. I believe it is about the 6th or 7th
> > > line down where it starts the find run. The box starts off fine but
> > > after about 1 minute it starts to hit all the drives at once then
> > > BLAM!! It gives me the error.
> >
> > Are your fatal trap 12 errors kernel panics?
> > If yes do you see some SCSI error messages directly before
> > this happens?
>
> Yes they are kernel panics. And Yes there are always SCSI errors.
>
> BAD DSA ( SOME_HEX_NUMBER ) in queue
> SCSI BUS RESET DETECTED sym0:0:-1:-1
>
> The above isn't exact. The message conveniently misses the logs. I can get
> the exact messages if you would like. I am trying to avoid crashing the box
> as much as possible. :)

The exact error including the hex codes is important to distinguish
between a bus error or something in the code.

> The drives are Hotswap and it does appear that they get "Disconnected" when
> the error happens. It is however not specific. In my original vinum setup I
> was spanning the raid across all 6 drives. Then it was consistent with
> drive 0. I thought that was the reason for the trouble so I changed the
> configuration to the one included in the previous message.
>
> I have compiled a debug kernel in an effort to get a dump and now the fatal
> trap 12 kernel panics are less the SCSI errors that go along with them are
> more consistent.

You mean you get SCSI errors sometimes without panics directly after?
Are you still using the sym controller or is that behaviour with the ahc
card you mentioned?

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de


Yes, that is correct. I get the SCSI errors without the panics directly
after. This is actually where the activity on the box gets strange.

Sometimes it will repeat the SCSI BUS RESET DETECTED error over and over on
the console. If I log in from a remote session (telnet or ssh) it will print
the header and login prompt, but when I try to log in the console starts to
try to reboot. I get "syncing disks..." printed over and over until I
hard reboot.

The other scenario is that I get the SCSI BUS RESET DETECTED error once and
I am able to establish a remote connection to the box, and everything seems
fine except for a zombie "find" process, which irks the heck out of me, so I
reboot the box.

In either scenario the console session hangs and you are unable to switch
vtys or use the keyboard.

I will get the exact messages for you tomorrow if everything goes well. I
have had a newly installed FreeBSD proxy that has been acting up, and the
resort we installed it in is having a conference this week. The only thing
that has comforted the execs at the resort has been my pager.

Joe


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message