Date: Fri, 20 Jul 2012 15:22:44 -0700
From: James Snow <snow@teardrop.org>
To: Dr Josef Karthauser <joe@tao.org.uk>
Cc: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject: Re: Checksum errors across ZFS array
Message-ID: <20120720222244.GA18627@teardrop.org>
In-Reply-To: <BC2AD7AE-4D82-4989-9D51-F1F2329C00EB@tao.org.uk>
References: <20120719152909.GL32960@teardrop.org> <002D6A20-D2A4-4909-B2EA-3DB562326050@tao.org.uk> <20120719171548.GM32960@teardrop.org> <BC2AD7AE-4D82-4989-9D51-F1F2329C00EB@tao.org.uk>

On Fri, Jul 20, 2012 at 04:09:28PM +0100, Dr Josef Karthauser wrote:
> Take care though, my system which had been working fine for about
> a year when I noticed the ZFS rot (which all appears to be recent
> in time). I ran memcheck+ on it for 8 hours or so, and it showed no
> errors at all. However, when I replaced the memory with a different
> vendor the problems went away. (Reboots and power off/on restarts
> hadn't fixed the problem before!).
>
> So, take care if the memory doesn't report any failures, it might
> still be faulty.

I've run memtest for about 20 hours now (13 hours in one pass, 7 and
counting on the second) and seen no errors. Hrm.

> p.s. It was my fault that I wasn't running ECC memory on the system!

I am running ECC memory though. If you'd had ECC memory to start do you
think you might have seen a different result?

In my case, replacing all the RAM and getting a 2nd controller are
almost the same cost. Since a second controller will give me the best
visibility - or long-term expandability if it turns out not to be the
controller - I've gone ahead and ordered one.

If I move half the disks to the new controller and continue to see the
problems only on the old controller, I know it's the controller or the
slot on the motherboard. If the problem continues without any change, I
can replace RAM, and then the motherboard.

-Snow
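
The split-the-disks test described above could be tallied with something like the sketch below. It is only an illustration, not anything from the thread: the device names (da0..da3), the controller labels (mps0, mps1), and the helper names are all hypothetical, and it assumes the usual `zpool status` layout with a NAME/STATE/READ/WRITE/CKSUM header and plain integer counters (abbreviated counts such as "1.2K" are simply skipped).

```python
from collections import defaultdict


def cksum_by_controller(status_text, controller_of):
    """Sum the CKSUM column of `zpool status` output per controller.

    controller_of maps a device name to a controller label; this mapping
    is supplied by hand and is an assumption of the sketch.
    """
    totals = defaultdict(int)
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # device rows follow this header
            continue
        if not in_config or len(fields) < 5:
            continue
        name, cksum = fields[0], fields[4]
        # Only count known leaf devices with a plain integer counter.
        if name in controller_of and cksum.isdigit():
            totals[controller_of[name]] += int(cksum)
    return dict(totals)


def diagnose(totals, old, new):
    """Apply the decision rule from the message to per-controller totals."""
    old_errs, new_errs = totals.get(old, 0), totals.get(new, 0)
    if old_errs and not new_errs:
        return "suspect the old controller or its motherboard slot"
    if old_errs and new_errs:
        return "no change: replace RAM next, then the motherboard"
    if new_errs:
        return "errors only on the new controller: not covered by the plan"
    return "no checksum errors seen yet"


# Hypothetical sample output, for illustration only.
SAMPLE = """
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da0     ONLINE       0     0     3
            da1     ONLINE       0     0     5
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors
"""

MAPPING = {"da0": "mps0", "da1": "mps0", "da2": "mps1", "da3": "mps1"}

if __name__ == "__main__":
    totals = cksum_by_controller(SAMPLE, MAPPING)
    print(totals, "->", diagnose(totals, old="mps0", new="mps1"))
```

In practice the counters would need to be compared after a scrub (or after `zpool clear` and some load), since errors logged before the disks were moved would otherwise be attributed to the wrong controller.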