Subject: Re: L2 cache errors???
To: Josh Paetzel, freebsd-hardware@freebsd.org
References: <55B7B8FA.2060800@digiware.nl> <55B7C059.5020701@sentex.net> <55B7CCA1.4020906@digiware.nl> <55B7D24B.5060709@FreeBSD.org>
From: Willem Jan Withagen
Message-ID: <55B7DBBF.2090009@digiware.nl>
Date: Tue, 28 Jul 2015 21:45:03 +0200
In-Reply-To: <55B7D24B.5060709@FreeBSD.org>

On 28/07/2015 21:04, Josh Paetzel wrote:
>
>
> On 07/28/2015 13:40, Willem Jan Withagen wrote:
>> On 28/07/2015 19:48, Mike Tancsa wrote:
>>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Are these what I think they are?
>>>> Errors in the CPU L2 cache?
>>>>
>>>> Are the ECC corrected?
>>>> Or is error really "data kaput"?
>>>>
>>>
>>>
>>> Could be. There is also an erratum issue that triggers these errors on
>>> certain CPUs when running software like virtualbox. It was fixed in
>>> RELENG_10 some time ago. What are you running ?
>>>
>>>
>>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>>
>>> has some details.
>>
>> 'mmm,
>> Not running Haswell stuff, but rather older hardware.
>>
>> Looked in older logfiles, and there are a few more...
>> All with the same data, except that it is detected on different CPUs
>>
>> And it occurs when running:
>> mbuffer -4 -m 1000M -I 6666 | \
>> zfs receive -F -d -v zfs
>> to receive a full backup from my fileserver.
>>
>> --WjW
>>
>
> You can tell ECC corrected the error because on FreeBSD if ECC can't fix
> the error the system will panic.
> Other systems (Solaris and HP-UX being
> the two I have direct experience with) can detach subsystems that have
> sustained uncorrectable errors in some cases. (Yes, even CPUs!)

Offlining CPUs, cool.
No, the system does not panic, but I do get reports from 'zfs receive'
that the datastream is invalid, and it then aborts.
So I'll have to do more digging to see what is up.

> If a system is generating hundreds or thousands of MCAs a minute you are
> dealing with a hardware issue.
>
> If you are getting spurious MCAs to the tune of a few a day there's
> nothing abnormal or broken there it's just the system doing what it's
> supposed to.

Never had them before, and now about 6 this week, let alone in L2 cache.
So it got me worried.

> Given the amount of data that flies around inside modern computers I'm
> surprised there aren't more MCAs than there are in most systems.

Perhaps not enough alpha particles hitting the cells. :)

Thanx,
--WjW