To: bgingery@gtcs.com
Cc: freebsd-doc@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: Re: Recommended addition to FAQ (Troubleshooting)
In-reply-to: Your message of "Fri, 18 Feb 2000 09:59:20 MST." <200002181659.JAA28578@home.gtcs.com>
Date: Fri, 18 Feb 2000 11:10:38 -0800
Message-ID: <79456.950901038@zippy.cdrom.com>
From: "Jordan K. Hubbard"

The situation here, I hate to say, is that you were simply very lucky in
having a software memory tester show you anything at all. If your
experience had been more typical, you would have run memtest86 and it
would have declared your memory to be free of errors. Then you'd have
gone right on having problems and losing more hair until you finally
just came back to the memory and swapped it out on suspicion. Bingo, the
problems are suddenly fixed and you're dragging memtest86 to KDE's
trashcan with a resolve to never trust it again.

The reason why software memory testers are so generally ineffectual is
that there's a whole bunch of things getting in their way. Leaving aside
for the moment the nasty problem of having your memory checker loaded
into the bad memory in question, the cache also seriously gets in your
way (and I'll bet you never even thought to turn both L1 and L2 caches
off, did you? :-) by masking errors in a way which is transparent to the
program.
How's it supposed to know its accesses are getting cached or how much
cache it has to "defeat" to get meaningful access to main memory? It
can't, really, at least not in a way that's truly foolproof or workable
across the entire range of Intel/AMD CPUs it might be run on, and that's
why serious bench techs use hardware memory testers exclusively.

I've used all kinds of software memory checkers, from "CheckIt" to
highly proprietary packages that cost even more money, and the only
thing they managed to convince me of is that swapping in known-good
memory is the best and fastest way out of these situations. Unless you
have a hardware memory tester available, trying to test it yourself is
just too likely to give you a false sense of security and send you down
more blind alleys. I've even put known BAD memory into boxes and had
CheckIt tell me "looks good to me, boss!", just to verify my suspicion
that it had lied to me before. It's also very slow to run a software
memory tester without the caches enabled, and swapping the memory is
generally a whole lot faster than that. I'm impatient. :)

So, to summarize, I am actually somewhat against the idea of including
tools like this on the grounds that they can also help to convince
people of the wrong things while they're debugging a problem. I also
don't look forward to having to argue with users who've just run such
tests and are still getting signal 11's but now refuse to believe that
the memory could be bad because "they checked it." If I then turn around
and tell them not to trust the tool I also stuck on the CD for them,
they're going to ask why I put it there in the first place, and a nice
long argument will then ensue instead of us just replacing that damn
memory and moving on. :-)

- Jordan
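
For anyone who wants to see the cache-masking point above in miniature,
here is a rough toy simulation. It is only a sketch under loud
assumptions: ToyMemory, ToyCache, naive_test and sweep_test are all
hypothetical names made up for this illustration, the "cache" is a
simple write-back LRU model, and none of it reflects how memtest86,
CheckIt, or any real CPU cache actually works. It just shows why a
write-then-read-back check can pass on a cell with a stuck bit.

```python
from collections import OrderedDict


class ToyMemory:
    """Hypothetical main memory with one stuck-at-zero bit at bad_addr."""

    def __init__(self, size, bad_addr, stuck_mask=0x01):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.stuck_mask = stuck_mask

    def write(self, addr, value):
        if addr == self.bad_addr:
            # The faulty cell silently loses the stuck bit.
            value &= ~self.stuck_mask & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


class ToyCache:
    """Hypothetical write-back cache (LRU eviction) in front of ToyMemory."""

    def __init__(self, memory, lines):
        self.memory = memory
        self.lines = lines
        self.data = OrderedDict()  # addr -> cached value, oldest first

    def write(self, addr, value):
        self.data[addr] = value
        self.data.move_to_end(addr)
        self._evict()

    def read(self, addr):
        if addr not in self.data:
            # Miss: only now does the value really come from main memory.
            self.data[addr] = self.memory.read(addr)
        self.data.move_to_end(addr)
        self._evict()
        return self.data[addr]

    def _evict(self):
        # Write the least recently used entries back to main memory.
        while len(self.data) > self.lines:
            old_addr, old_value = self.data.popitem(last=False)
            self.memory.write(old_addr, old_value)


def naive_test(bus, addrs, pattern=0xFF):
    """Write each cell and read it straight back; return failing addresses."""
    bad = []
    for addr in addrs:
        bus.write(addr, pattern)
        if bus.read(addr) != pattern:
            bad.append(addr)
    return bad


def sweep_test(bus, addrs, pattern=0xFF):
    """Write the whole region first, then read the whole region back."""
    for addr in addrs:
        bus.write(addr, pattern)
    return [addr for addr in addrs if bus.read(addr) != pattern]


if __name__ == "__main__":
    # Caches "off": the tester talks to memory directly and sees the fault.
    print(naive_test(ToyMemory(64, bad_addr=5), range(64)))
    # Caches on: every read-back is a cache hit, so the fault is masked.
    print(naive_test(ToyCache(ToyMemory(64, bad_addr=5), 16), range(64)))
    # A sweep only helps if the region is bigger than the cache.
    print(sweep_test(ToyCache(ToyMemory(64, bad_addr=5), 16), range(10)))
    print(sweep_test(ToyCache(ToyMemory(64, bad_addr=5), 16), range(64)))
```

In this toy model the sweep only catches the bad cell once the region
being tested is larger than the cache, and a real tester has no
foolproof way to know how large that is on whatever box it lands on.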