From owner-freebsd-alpha@FreeBSD.ORG Wed Dec 14 14:08:50 2005
X-Original-To: freebsd-alpha@freebsd.org
Delivered-To: freebsd-alpha@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 95C2716A41F
	for ; Wed, 14 Dec 2005 14:08:50 +0000 (GMT)
	(envelope-from unknown@abac.com)
Received: from smtp2.abac.com (smtp2.abac.com [216.55.128.210])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2BBC043D49
	for ; Wed, 14 Dec 2005 14:08:50 +0000 (GMT)
	(envelope-from unknown@abac.com)
Received: from 02-030.143.popsite.net ([66.248.82.30] helo=unknown)
	by smtp2.abac.com with smtp id 1EmXJG-0006E4-5R
	for freebsd-alpha@freebsd.org; Wed, 14 Dec 2005 06:08:49 -0800
From: J.C. Roberts
To: freebsd-alpha@freebsd.org
Date: Wed, 14 Dec 2005 06:07:49 -0800
Organization: None
X-Mailer: Forte Agent 1.93/32.576 English (American)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Subject: Re: SRM memtest
X-BeenThere: freebsd-alpha@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Alpha
X-List-Received-Date: Wed, 14 Dec 2005 14:08:50 -0000

On Tue, 13 Dec 2005 16:00:02 -0800, J.C. Roberts wrote:

>I noticed something odd on an Alpha Personal Workstation 433 that I got
>off of eBay. The ARC/AlphaBIOS would occasionally report 256MB rather
>than the usual 384MB. This weirdness was intermittent. I have reseated
>everything in the system to make sure there are no connection/connector
>issues but I think it would be prudent to actually test the memory
>itself.
>
>I kicked the system into SRM Console mode and I've been trying to run
>memtest to no avail. I believe *I* am the real problem since I don't
>know what the heck I'm doing in SRM in spite of the fact that I've read
>the SRM Console user guide.
>http://ftp.digital.com/pub/Digital/info/semiconductor/literature/srmcons.pdf
>
>The SRM version is v7.2-1 Mar 6, 2000
>
>Running even the most simple tests seems to basically lock up the system
>since the command fails to ever exit even if you let it run for a couple
>hours to try completing two passes.
>
> >>> memtest -rb -p 2
>
>If you background the memtest process and run show_status, it seems to
>pass at least once?
>
> >>> memtest -rb -p 2 &
> >>> show_status
> ID       Program   Device   Pass   Hard/Soft   Written   Read
> -------- --------- -------- ------ ----------- --------- ------
> 00000001 idle      system   0      0    0      0         0
> 0000004F memtest   memory   1      0    0      0         0
>
>
>Using >>>kill_diags afterwards only locks up the system.
>
>I've searched around for more detailed instructions on the web. I found
>a cryptic post to the DebianAlpha list
>http://lists.debian.org/debian-alpha/2004/11/msg00064.html
>
>It mentions using
>
>>>>dynamic -r
>
>to figure out values to use with memtest switches but I still don't
>understand what was meant. The whole "zone" thing is a mystery. Worse
>yet, the SRM Console user guide doesn't even mention "dynamic" as a
>command and the man/help pages in the SRM itself are useless.
>
>I've reduced the system memory to 128MB (two DIMS) so I can test the
>pairs and by accident I figured out which pair is bad (i.e. running
>"dynamic -h" by mistake resulted in errors with one pair).
>
>When you guys use memtest properly, how do you do it?
>
>Thanks,
>JCR

My apologies for replying to myself, but I've had a few people ask me
off list to make the answer public if I ever managed to figure it out.

I've been working on this for a week, reading docs, searching the web
and asking around on OpenVMS, FreeBSD, OpenBSD, NetBSD and Linux lists
and groups. With the help of Graham Burley on comp.os.vms, an answer
has been found for the problem of the SRM MEMTEST and MEMORY commands
failing to run.
The WRITTEN and READ columns of the SHOW_STATUS output (above) were
both zero, which was telling us that the tests were not actually
running.

This system probably came out of a "secure" site (i.e. government), so
it was sold to me without a hard drive. Though I had installed a
replacement disk, there was no OS or bootable partition on it (it was
an old 4.5GB data drive with an NTFS partition, which becomes relevant
later), so there was nothing in the system for the SRM to boot from.
When booting to SRM I got the expected error messages:

CPU 0 booting
(boot dka0.0.0.1009.0 -flags A)
block 0 of dka0.0.0.1009.0 is not a valid boot block
bootstrap failure
Retrying, type ^C to abort...

Basically, it's an endless loop of trying to boot from the disk, so I
had always just been following instructions and using ^C to get into
the SRM console to run the memory tests. This ^C is the main cause of
the memory testing problems I mentioned above, because after aborting
this way the system/SRM is _not_ fully initialized.

If you're having problems with either MEMORY or MEMTEST, do a ps (or
CTRL-T) and look at the status of the MEMTEST lines. If you see them
stuck with "WAITING ON", you know your system/SRM was not completely
initialized.

If you run INIT at this point, you just end up with the same bootstrap
failures and ^C issue as before, so you need to change how the system
boots before running INIT.

>>>set auto_action halt
>>>init

This gets you to a nice, clean SRM console that's been fully
initialized. At this point the MEMORY and MEMTEST commands should work
properly. You can tell they are working because the WRITTEN and READ
columns of the SHOW_STATUS output are no longer stuck at zero. If the
-p switch has a value of zero, the memory tests will run until you
tell them to stop with the KILL_DIAGS command.

By the way, if you want to see what the "normal" switches are for
running MEMTEST, you can look at the MEMORY script.

>>>cat memory

So the system passed its memory tests and all was well until I
rebooted the system.
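
For anyone capturing the console over a serial line, the SHOW_STATUS check described above can be sketched in a few lines of Python. This is only an illustration, not part of the SRM: the row layout (ID, Program, Device, Pass, Hard, Soft, Written, Read, all counters as plain decimal numbers) is an assumption taken from the sample output quoted earlier in this message, and the names `parse_show_status` and `stalled_tests` are made up for the example.

```python
import re

# Assumed row layout, based only on the SHOW_STATUS sample in this thread:
#   ID  Program  Device  Pass  Hard  Soft  Written  Read
ROW = re.compile(
    r"^\s*([0-9A-Fa-f]{8})\s+(\S+)\s+(\S+)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_show_status(text):
    """Return one dict per process row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, dashes, prompts, blank lines
        pid, program, device, *counters = match.groups()
        passes, hard, soft, written, read = (int(c) for c in counters)
        rows.append({
            "id": pid, "program": program, "device": device,
            "pass": passes, "hard": hard, "soft": soft,
            "written": written, "read": read,
        })
    return rows


def stalled_tests(rows):
    """Names of non-idle programs whose Written and Read counters are both zero."""
    return [
        row["program"]
        for row in rows
        if row["program"] != "idle" and row["written"] == 0 and row["read"] == 0
    ]


# The output from the original report, where memtest was not really running.
SAMPLE = """\
 ID       Program   Device   Pass   Hard/Soft   Written   Read
 -------- --------- -------- ------ ----------- --------- ------
 00000001 idle      system   0      0    0      0         0
 0000004F memtest   memory   1      0    0      0         0
"""

print(stalled_tests(parse_show_status(SAMPLE)))
```

Run against the sample from the original report, this flags memtest as stalled. A row with nonzero Written and Read values would not be flagged.
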
The reboot put me into AlphaBIOS/ARC for some strange reason. I didn't
think it was a big deal, so I did the usual to switch back to SRM:

  F2 (Setup)
  CMOS Setup
  F6 (Advanced)
  Console Selection: "UNIX Console SRM" (or "OPENVMS Console SRM")
  F10 (save)
  F10 (save)
  ESC (exit)
  power cycle

For some strange reason I ended up in AlphaBIOS/ARC again. This was
weird, so I did the steps again, cold booted again, and sure enough, it
_still_ came up in AlphaBIOS/ARC mode.

The reason the darn thing refused to go into SRM mode was that old
NTFS partition on the disk. Once I deleted that partition through the
AlphaBIOS, I could finally reset the "Console Selection" to SRM and
have it work.

Hopefully this information will help the next person trying to figure
out why their memtest isn't working as expected.

Kind Regards,
JCR

-- 
| Patches to developers are like lights to moths;
| "Ooohhh PATCHES! Look at the pretty patches..."
| You can expect them to just circle for a while and even if
| they never commit, you'll definitely have their attention.