From: Bill Moran
Date: Sun, 06 Jul 2003 14:10:21 -0400
To: Adam
cc: FreeBSD-Questions
Subject: Re: More hardware problems (advice needed)
Message-ID: <3F08660D.7010105@potentialtech.com>
In-Reply-To: <1057511651.581.27.camel@elwood>

Adam wrote:
> My main FreeBSD (4.8) box has died on me again, and I'm 99% certain it's
> due to hardware failure. However, I'm having a very hard time
> determining what hardware is going bad, due to the nature of the crash.
>
> Let me describe the scenario.
>
> I was working on the machine, not doing anything out of the ordinary.
> All of a sudden, my mouse stopped responding. I thought maybe moused had
> crashed, so I did 'ps -aux |fgrep moused'. This caused ps to segfault,
> which caused me to nearly soil myself.
> So, I decided to quickly kill all
> my apps and exit X so I could reboot. When I closed X, I noticed a lot
> of errors on my console about dc0 (my Linksys NIC interface, external)
> having underruns, and that ad2 was timed out. I also noticed that my LAN
> connection to my other box was dead. I tried to reboot, and all went
> well until it got to the 'Rebooting...', at which point it hung. I
> waited for 10+ minutes, thinking it might eventually reboot, but it was
> stuck, so I turned it off.
>
> When I powered back up, I got tons of errors that the kernel couldn't be
> loaded, and I couldn't even get into single-user mode. So, I made a
> fixit floppy and fired up the fixit shell, and started poking around to
> see what happened. I was able to mount ad3 and ad2 just fine, but
> mounting ad0 caused fixit to panic and the machine to reboot.
>
> So, this is where I am now. For those of you that remember, I had
> another crash & burn experience on that machine a couple months ago,
> where the machine just suddenly froze completely and my ad0 was trashed
> when I booted back up. That time, I didn't have backups. This time, I do.
> But, before I work on that computer again, I think I need to replace
> some hardware.
>
> I've heard pretty good arguments for both the ad0 drive (Western Digital
> 120gb, 2mb cache), and for the motherboard/cpu (Asus A7V266-E, Athlon
> 1600+). I used memtest86 to test the RAM, which came up clean.
>
> I doubt if it's a power problem, since I've got a very nice case (Antec
> 1080, 400+ watts). Also, I've got another machine in my apartment that
> hasn't experienced any weird problems like this.
>
> The CPU might be overheating, but it's hard to tell. Roughly 5 minutes
> after the crash, I checked the CPU temperature from the BIOS, which
> registered 63C for the CPU. I have no idea how hot the CPU was at the
> time of the crash, but it definitely had to have cooled off a bit in
> those 5 minutes.

Sounds like a HDD going ...
I had a similar scenario a few months ago and it was the HDD.
You could get a FreeSBIE CD, boot it and run cpuburn to test the CPU.

> I don't have enough $$ to replace all the hardware, so I'd like some
> expert advice as to what is the most likely culprit. I don't know if
> I'll be able to convince any of Asus, AMD, or Western Digital to give me
> an RMA number, but I can try (also would like some advice on this to
> maximize my chances).

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com