Date:      Thu, 29 Jan 2004 09:12:04 +0100 (CET)
From:      Per von Zweigbergk <pvz@e.kth.se>
To:        Tony Holmes <tony@crosswinds.net>
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: Signal 10?
Message-ID:  <Pine.LNX.4.58.0401290857350.21584@quetzalcoatlite.e.kth.se>
In-Reply-To: <20040128121913.A54789@crosswinds.net>
References:  <20040128121913.A54789@crosswinds.net>

On Wed, 28 Jan 2004, Tony Holmes wrote:

> Quick question.
>
> I am getting occasional processes dying from Sig 10 and 11.
> It has been a long time since I saw these and to narrow down
> where I start my debugging, wanted to ask what the usual source
> of these signals (problems) are from?
>
> IIRC sig11 is bad memory, but sig 10?

Signal 11 is Segmentation Fault (SIGSEGV). This happens when a program
tries to read from or write to memory it is not allowed to access. This is
quite common if the program dereferences uninitialized pointers, or in
case of a buffer overflow. The most common source of this problem is quite
simply a bug in the software in question. But if it is happening across
many different programs, it could be a sign of a hardware error, quite
probably a memory error.

Signal 10 is Bus Error (SIGBUS). This is much rarer, but could still
plausibly be caused by incorrectly written software. (I think I've seen
Netscape 4 crash with this message once or twice -- but it's rare.) Below
is the definition from FOLDOC:

bus error

<processor> A fatal failure in the execution of a machine language
instruction resulting from the processor detecting an anomalous condition
on its bus. Such conditions include invalid address alignment (accessing a
multi-byte number at an odd address), accessing a physical address that
does not correspond to any device, or some other device-specific hardware
error. A bus error triggers a processor-level exception which Unix
translates into a "SIGBUS" signal which, if not caught, will terminate the
current process.

This can quite plausibly be caused by hardware error, or memory problems
in particular.
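
To narrow down which signal is actually killing a process, it can help to
run the program under a small wrapper and decode its exit status. Below is
a minimal sketch in Python, assuming a POSIX system; the helper name
describe_exit and the self-crashing child are made up for illustration and
are not part of any existing tool. Note that signal numbers are not the
same everywhere: 10 is SIGBUS on FreeBSD, but on Linux/x86 SIGBUS is 7 and
10 is SIGUSR1, so it is safer to look at the name than the number.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a short description of how a child process ended."""
    if returncode >= 0:
        return f"exited normally with status {returncode}"
    # subprocess reports death-by-signal as a negative return code.
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"


# Hypothetical child: a Python one-liner that sends SIGSEGV to itself,
# standing in for whatever program is crashing on your system.
child = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
result = subprocess.run(child)
print(describe_exit(result.returncode))
```

Replace the child command with the real program to see whether it dies
from SIGSEGV, SIGBUS, or something else entirely.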

But note that random Signal 11's are only a symptom of a problem, and
their appearance alone isn't enough to make a diagnosis. To check for
possible memory problems, I suggest you download and run the excellent
utility MemTest86 (www.memtest86.com).

If you're too lazy or too poor to have the memory replaced (either under
warranty or out of pocket), and the system is not particularly mission
critical, there is a kernel patch for Linux called BadRAM or something
along those lines, which allows you to simply not use the parts of the
memory which are bad. There are no comparable patches for FreeBSD as far
as I am aware.

If I had to put my money on what the problem with your setup was, I'd most
likely bet it was the memory -- but don't take my word for it -- use
memtest86!

Hope this helps.

-- 
Per von Zweigbergk <pvz@e.kth.se>


