Date: Tue, 28 Jul 2015 21:45:03 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
To: Josh Paetzel <jpaetzel@FreeBSD.org>, freebsd-hardware@freebsd.org
Subject: Re: L2 cache errors???
Message-ID: <55B7DBBF.2090009@digiware.nl>
In-Reply-To: <55B7D24B.5060709@FreeBSD.org>
References: <55B7B8FA.2060800@digiware.nl> <55B7C059.5020701@sentex.net> <55B7CCA1.4020906@digiware.nl> <55B7D24B.5060709@FreeBSD.org>

On 28/07/2015 21:04, Josh Paetzel wrote:
>
>
> On 07/28/2015 13:40, Willem Jan Withagen wrote:
>> On 28/07/2015 19:48, Mike Tancsa wrote:
>>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Are these what I think they are?
>>>> Errors in the CPU L2 cache?
>>>>
>>>> Are the ECC corrected?
>>>> Or is error really "data kaput"?
>>>>
>>>
>>>
>>> Could be. There is also an erratum issue that triggers these errors on
>>> certain CPUs when running software like virtualbox. It was fixed in
>>> RELENG_10 some time ago. What are you running ?
>>>
>>>
>>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>>
>>> has some details.
>>
>> 'mmm,
>> Not running Haswell stuff, but rather older hardware.
>>
>> Looked in older logfiles, and there are a few more...
>> All with the same data, except that it is detected on different CPUs
>>
>> And it occurs when running:
>> mbuffer -4 -m 1000M -I 6666 | \
>> zfs receive -F -d -v zfs
>> to receive a full backup from my fileserver.
>>
>> --WjW
>>
>
> You can tell ECC corrected the error because on FreeBSD if ECC can't fix
> the error the system will panic. Other systems (Solaris and HP-UX being
> the two I have direct experience with) can detach subsystems that have
> sustained uncorrectable errors in some cases. (Yes, even CPUs!)

Offlining CPUs, cool.
No, the system does not panic, but I do get reports from 'zfs receive'
that the datastream is invalid. And it then aborts.
So I'll have to do more digging, to see what is up.

> If a system is generating hundreds or thousands of MCAs a minute you are
> dealing with a hardware issue.
>
> If you are getting spurious MCAs to the tune of a few a day there's
> nothing abnormal or broken there it's just the system doing what it's
> supposed to.

Never had them before, and now about 6 this week.
Let alone in L2 cache. So it got me worried.

> Given the amount of data that flies around inside modern computers I'm
> surprised there aren't more MCAs than there are in most systems.

Perhaps not enough alpha particles hitting the cells. :)

Thanx,
--WjW