Date:      Thu, 13 Apr 2006 21:08:32 +0100
From:      Alex Zbyslaw <xfb52@dial.pipex.com>
To:        matthew@digitalstratum.com
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: FreeBSD Crash without Errors, Warnings, or Panics
Message-ID:  <443EAFC0.6040308@dial.pipex.com>
In-Reply-To: <443EA32B.408@digitalstratum.com>
References:  <443E95C1.4030404@digitalstratum.com> <443E9C38.709@dial.pipex.com> <443EA32B.408@digitalstratum.com>

Matthew Hagerty wrote:

> Alex Zbyslaw wrote:
>
>> Matthew Hagerty wrote:
>>
>>> Can anyone shed some light on this, give me some options to try?  
>>> What happened to kernel panics and such when there were serious 
>>> errors going on?  The only glimmer of information I have is that 
>>> *one* time there was an error on the console about there not being 
>>> any RAID controller available.  I did purchase a spare controller 
>>> and I'm about to swap it out and see if it helps, but for some 
>>> reason I doubt it.  If a controller like that was failing, I would 
>>> certainly hope to see some serious error messages or panics going on.
>>>
>>> I have been running FreeBSD since version 1.01 and have never had a 
>>> box so unstable in the last 12 or so years, especially one that is 
>>> supposed to be "server" quality instead of the make-shift ones I put 
>>> together with desktop hardware.  And last, I'm getting sick of my 
>>> Linux admin friends telling me "told you so!  should have run 
>>> Linux...", please give me something to stick in their pie holes!
>>
>>
>> Several times now I have had Linux servers (and production quality 
>> ones, not built by me ones :-)) die in a somewhat similar fashion.  
>> In every case the cause has been either a flaky disk or a flaky disk 
>> controller, or some combination.
>>
>> What seems to happen is that the disk is entirely "lost" by the OS.  
>> At that point any process which never accesses the disk (i.e. is 
>> already in memory) is able to run but the moment any process tries to 
>> access the disk it locks up.  So you can't ssh in to the server, but 
>> if you happen to be logged in, your shell is probably cached and keeps 
>> working.  If you typed ls recently, you can run ls (but see nothing 
>> or get a cryptic error message like I/O Error), for example.
>
>
> Hmm, that just seems odd that a disk controller just vanishing would 
> not cause some sort of console message?  Even if the disk device is 
> gone, /dev/console should still be intact to display an error, no?  
> Also, a disk device that is all of a sudden missing seems pretty 
> serious to me, since a disk is one of the main devices that modern 
> OSes cannot run without (generally speaking.)  I would think *some* 
> console message should be warranted.

Not if syslogd tries to access the disk :-(  All I can say is that I have 
seen three Linux boxes go this way; I've never had this kind of failure 
on a BSD box (touch wood) so all I can do is speculate about the 
similarities.  Also, you did get a console message once, didn't you?
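
If you want to catch this state next time, one option is a small probe that 
touches the disk from a child process, so that the parent never blocks on 
the lost disk itself.  The following is only a minimal sketch, assuming 
Python is installed and the checking process is already resident in memory; 
the name disk_responds and the 5 second timeout are my own inventions, not 
anything that ships with FreeBSD or Linux.

```python
import subprocess
import sys
import tempfile

# Script run in the child: create a small file, force it to disk, remove it.
PROBE = (
    "import os, sys, tempfile\n"
    "fd, name = tempfile.mkstemp(dir=sys.argv[1])\n"
    "os.write(fd, b'probe')\n"
    "os.fsync(fd)\n"
    "os.close(fd)\n"
    "os.unlink(name)\n"
)


def disk_responds(path, timeout=5.0):
    """Return True if a write+fsync under `path` completes within `timeout` seconds.

    Hypothetical helper: the I/O happens in a child so that a hung disk
    blocks the child, not the caller.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit status 0 means the write and fsync both succeeded.
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # SIGKILL is only acted on once the child leaves its disk wait,
        # so do not wait for it here; just report the failure.
        child.kill()
        return False
```

Something long-running could call disk_responds("/var/log") every minute 
and write straight to the console on failure rather than going through 
syslogd.  Starting the child may itself need the disk if the interpreter is 
not cached, in which case the timeout still fires and the probe reports 
failure, which is the answer you want anyway.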

>
> I'll see if there are any diag programs for the controller and I'll go 
> ahead and swap the controller out.  I wonder if the RAID configuration 
> is stored in the controller or on the disks?  I'd hate to have to 
> rebuild the server install...

I believe both, and the RAID controller will compare what it thinks it 
should see with what it actually finds on the disks.

If you are moving to a new, identical controller I would have thought 
that the worst you would have to do is to reconfigure it to accept the 
disks you give it as your specified configuration without it trying to 
rebuild anything.

hth,

--Alex



