Date: Fri, 27 Sep 2002 16:00:48 -0700 (PDT)
From: Don Lewis <dl-freebsd@catspoiler.org>
To: mb@imp.ch
Cc: current@FreeBSD.ORG, hardware@FreeBSD.ORG
Subject: memory/filesystem corruption, a cautionary tale (was: Re: Crashdumps available for download (solved I think))
Message-ID: <200209272300.g8RN0mvU002155@gw.catspoiler.org>
In-Reply-To: <20020919111219.U52781-100000@levais.imp.ch>
On 19 Sep, Martin Blapp wrote:
>
> Hi all,
>
> With help of http://www.memtest86.com/memtest86-3.0.iso I've tracked
> it down to three 3 ! bad DRAMS.

Thanks for the pointer. I have continued to see transient filesystem damage that would disappear with a reboot, which made me suspect that the filesystem data cached in RAM was being corrupted. Over the last few weeks it seemed to migrate from the /usr/src tree to the .depend files in /usr/obj. A small section of a file would be corrupted with binary garbage, but most characters within the damaged section would be untouched. The machine in question has an Athlon XP 1900+ processor and PC2100 ECC RAM.

Last night I downloaded and ran memtest86, and after several passes I saw a burst of errors in Test #5. An entire byte of data was being flipped from 0xff to 0x00 or vice versa at intervals of 8 or 16 bytes over a small range of addresses. This would seem to indicate an error caused by one 8-bit-wide chip on the 64-bit-wide (72-bit with ECC) DIMM.

The memtest86 documentation says that errors in Tests #5 and #8 are not uncommon on Athlon systems. It suggests that some cases can be fixed by relaxing the memory timing, while others require replacing the RAM with higher-quality RAM. Since my RAM was from a reputable maker, I decided to try adjusting the memory timing.

The BIOS allows a large number of tweaks to the memory timing, and I did not relish the idea of blindly twiddling all the knobs. One thing that caught my eye was that the CAS Latency was set to 2 clocks. That sounded aggressive, since not much RAM is rated for that timing. I bumped the CAS Latency to 2.5 clocks and the errors appeared to go away. Later I went back to check the specifications for the RAM that I bought, and it turns out that it is rated for a CAS Latency of 2.5 clocks!
I tweaked the BIOS some more and tried both the failsafe settings and the "optimized" settings, and in both cases the automagic RAM configuration in the BIOS set the CAS Latency to 2. It looks like either my motherboard BIOS is incorrectly sensing the RAM speed, or it senses the RAM speed correctly and is incorrectly configuring the RAM timing, or the RAM that I purchased is advertising the wrong speed. If you've got an Athlon system, you might want to double-check this.

I've been running memtest86 since last night with the CAS Latency set to 2.5 clocks and haven't seen any errors.