Date:      Tue, 28 Jul 2015 21:45:03 +0200
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        Josh Paetzel <jpaetzel@FreeBSD.org>, freebsd-hardware@freebsd.org
Subject:   Re: L2 cache errors???
Message-ID:  <55B7DBBF.2090009@digiware.nl>
In-Reply-To: <55B7D24B.5060709@FreeBSD.org>
References:  <55B7B8FA.2060800@digiware.nl> <55B7C059.5020701@sentex.net> <55B7CCA1.4020906@digiware.nl> <55B7D24B.5060709@FreeBSD.org>

On 28/07/2015 21:04, Josh Paetzel wrote:
> 
> 
> On 07/28/2015 13:40, Willem Jan Withagen wrote:
>> On 28/07/2015 19:48, Mike Tancsa wrote:
>>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Are these what I think they are?
>>>> Errors in the CPU L2 cache?
>>>>
>>>> Are the ECC corrected?
>>>> Or is the error really "data kaput"?
>>>>
>>>
>>>
>>> Could be. There is also an erratum that triggers these errors on
>>> certain CPUs when running software like VirtualBox. It was fixed in
>>> RELENG_10 some time ago. What are you running?
>>>
>>>
>>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>>
>>> has some details.
>>
>> 'mmm,
>> Not running Haswell stuff, but rather older hardware.
>>
>> I looked in older log files, and there are a few more...
>> All with the same data, except that they were detected on different CPUs.
>>
>> And it occurs when running:
>>         mbuffer -4 -m 1000M -I 6666 | \
>>             zfs receive -F -d -v zfs
>> to receive a full backup from my fileserver.
>>
>> --WjW
>>
> 
> You can tell ECC corrected the error because on FreeBSD if ECC can't fix
> the error the system will panic.  Other systems (Solaris and HP-UX being
> the two I have direct experience with) can detach subsystems that have
> sustained uncorrectable errors in some cases. (Yes, even CPUs!)

Offlining CPUs, cool.
No, the system does not panic, but I do get reports from 'zfs receive'
that the data stream is invalid, and it then aborts.
So I'll have to do more digging to see what is up.
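A first step in that digging could be to check the raw Status value that the FreeBSD MCA handler logs for each bank ("MCA: Bank N, Status 0x..."). Following Josh's point, the key question is whether the hardware corrected the error. The sketch below decodes that value using Intel's architectural IA32_MCi_STATUS layout: UC (uncorrected) is bit 61, and cache-hierarchy errors use the compound code `000F 0001 RRRR TTLL`, where LL = 10 means L2. It assumes an Intel CPU and ignores model-specific bits. The function name and the example value are hypothetical, not taken from the logs in this thread.

```python
# Sketch only: decode an IA32_MCi_STATUS value (Intel architectural layout).
# decode_mci_status() and the example value are hypothetical illustrations.

def decode_mci_status(status: int) -> dict:
    """Return the architectural flag bits and, for cache errors, the cache level."""
    info = {
        "valid": bool((status >> 63) & 1),        # VAL: register holds a logged error
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was lost
        "uncorrected": bool((status >> 61) & 1),  # UC: hardware did NOT correct it
        "enabled": bool((status >> 60) & 1),      # EN: error reporting was enabled
        "misc_valid": bool((status >> 59) & 1),   # MISCV: MCi_MISC holds extra info
        "addr_valid": bool((status >> 58) & 1),   # ADDRV: MCi_ADDR holds an address
        "pcc": bool((status >> 57) & 1),          # PCC: processor context corrupt
    }
    code = status & 0xFFFF  # MCA error code, bits 15:0
    # Compound cache-hierarchy code 000F 0001 RRRR TTLL: bits 15:13 are zero
    # and bits 11:8 are 0001 (bit 12, the filter bit F, may be either value).
    if (code & 0xEF00) == 0x0100:
        info["cache_level"] = ("L0", "L1", "L2", "generic")[code & 0b11]
    return info


# Hypothetical value: VAL + EN set, UC clear, code 0x0176 (L2 data-cache eviction).
example = decode_mci_status(0x9000000000000176)
print(example["uncorrected"], example["cache_level"])  # False L2
```

If `uncorrected` comes back False for every logged event, the cache errors themselves were fixed in hardware, which would point the 'zfs receive' failures toward another cause.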

> If a system is generating hundreds or thousands of MCAs a minute you are
> dealing with a hardware issue.
> 
> If you are getting spurious MCAs to the tune of a few a day there's
> nothing abnormal or broken there it's just the system doing what it's
> supposed to.

I never had them before, and now about 6 this week,
and in the L2 cache of all places.
So it got me worried.
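Josh's rule of thumb can be put into rough numbers. The sketch below assumes one timestamp per MCA event has already been extracted from the logs. The thresholds (100 per minute for "hundreds a minute", 5 per day for "a few a day") and the function names are illustrative assumptions, not values defined by FreeBSD or any CPU vendor.

```python
# Sketch only: classify an MCA event rate using assumed, illustrative thresholds.
from datetime import datetime, timedelta

HARDWARE_PER_MINUTE = 100  # assumed reading of "hundreds or thousands a minute"
BENIGN_PER_DAY = 5         # assumed reading of "a few a day"


def max_in_window(times: list[datetime], window: timedelta) -> int:
    """Largest number of events inside any sliding window of the given length."""
    times = sorted(times)
    best, start = 0, 0
    for end, t in enumerate(times):
        # Drop events from the front until the span fits inside the window.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def classify(times: list[datetime]) -> str:
    if max_in_window(times, timedelta(minutes=1)) >= HARDWARE_PER_MINUTE:
        return "likely hardware fault"
    if max_in_window(times, timedelta(days=1)) <= BENIGN_PER_DAY:
        return "within normal background rate"
    return "elevated: keep watching"


# About six events spread over one week, one per day (made-up timestamps).
week = [datetime(2015, 7, 21) + timedelta(days=d) for d in range(6)]
print(classify(week))  # within normal background rate
```

A sliding window is used rather than calendar buckets, so a burst that straddles midnight or a minute boundary is still counted as one burst.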

> Given the amount of data that flies around inside modern computers I'm
> surprised there aren't more MCAs than there are in most systems.

Perhaps not enough alpha particles hitting the cells. :)

Thanx,
--WjW



