Date: Sat, 02 Dec 2000 23:49:24 +0000
From: Peter Gradwell <peter@gradwell.com>
To: Mike Smith <msmith@freebsd.org>
Cc: freebsd-scsi@freebsd.org
Subject: Re: Mylex DAC960 Driver "online/offline"
Message-ID: <5.0.0.25.0.20001202233356.0366b2d8@pop3.gradwell.net>
In-Reply-To: <200012022339.eB2NdWF21371@mass.osd.bsdi.com>
References: <Your message of "Fri, 01 Dec 2000 21:32:54 GMT." <5.0.0.25.0.20001201212649.03798548@pop3.gradwell.net>
Hi Mike,
At 15:39 02/12/2000 -0800, Mike Smith wrote:
> > What does this message really mean?
>
>It means that the controller is telling us that the drive is offline.
>Then that it's online. Then that it's offline again.
>
>You don't say what the time intervals between these messages are; you can
>get the 'drive offline' message from either the status poll (once per
>second) or if an I/O operation is sent to a drive that the controller
>reports as offline. The 'drive online' message only comes from the
>status poll though.
It was occurring without any apparent activity, about once per second,
so I would guess it was from the status poll.
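To illustrate the behaviour described above, here is a minimal sketch of a once-per-second status poll that logs a message only when a drive's reported state changes. It is not the real mlx driver code: the function name, the drive names and the dictionary layout are all hypothetical, and it covers only the poll path, not the "I/O sent to an offline drive" path.

```python
# Hypothetical sketch; not the real Mylex driver logic.
def poll_transitions(previous, current):
    """Return log messages for drives whose reported state changed.

    previous/current map a (made-up) drive name to "online" or
    "offline", as reported on two consecutive status polls.
    """
    messages = []
    for drive, state in sorted(current.items()):
        if previous.get(drive) != state:
            messages.append(f"{drive}: drive {state}")
    return messages


# Four consecutive one-second polls in which the controller's
# report for one drive flaps while the other stays online.
polls = [
    {"drive0": "online", "drive1": "online"},
    {"drive0": "online", "drive1": "offline"},
    {"drive0": "online", "drive1": "online"},
    {"drive0": "online", "drive1": "offline"},
]

log = []
for before, after in zip(polls, polls[1:]):
    log.extend(poll_transitions(before, after))

for line in log:
    print(line)
```

With a controller report that flaps like this, the poll alone produces an offline/online/offline sequence at roughly one message per second, with no I/O needed.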
>Can you describe your configuration? I can try to reproduce the
>situation here and see if it's not possible that there's a bug in the
>driver confusing the status between your two drives. I have to say,
>though, that the fact that the controller thinks that one of your system
>drives is offline when you claim it's a mirror is a bit troubling.
OK, an update on the situation though: I was able to get to the
Mylex BIOS (there are 250 miles between me and the machine, you see!)
via a serial console and discovered that it had marked two drives offline.
We have:
3 x 18 gig disks, of which two are bonded in a raid 1 pack
and one is a hot spare
2 x 36 gig disks, bonded in a raid 0 pack.
Everything apart from /var/spool/news is on the raid 1 pack. (Yeah, it's
a news server.)
One of the 18 gig disks and one of the 36 gig disks were marked offline.
I believe that when the 18 gig disk was marked offline the RAID card
rebuilt its redundancy data onto the hot spare disk and carried on,
'cos the 18 gig which is offline was part of the raid 1 pack and there
is now no hot spare. *So, that's good.*
So, we hard reset the machine and it booted. However, the symptoms
described previously prevailed. We couldn't login via ssh or on the console
as it was unresponsive.
* This worries me. I would hope the machine would take the loss of
/v/s/news gracefully, and carry on.
So, when I accessed the bios this morning, I tried, as an "experiment"
to put the 36 gig disk back online and rebooted. After running fsck
a bit (is there a journaling file system for freebsd?!) the machine is
now running ok.
I have yet to schedule a reboot to mark the currently off line 18 gig
disk as the hot spare. I think I will be able to do this.
I am worried that the controller randomly marks the drives offline. Mylex
tell me this happens when it loses contact with the drives.
They are internal drives, well screwed into a big case, nicely racked
into a locked cabinet in Telehouse Europe. From what I can gather, no
one accessed the rack. It appears they aren't disconnected anyway
because I can mark them online and we're go again.
I'd be happy to help with more information if it helps. Directed questions
work best!
thanks
peter
--
peter gradwell; online @ http://www.gradwell.com/peter/
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
