Date: Wed, 7 May 2003 07:30:30 -0400 (EDT)
From: "Andrew Bogecho"
To: "Lowell Gilbert"
cc: freebsd-questions@freebsd.org
Subject: Re: 4.8-RELEASE problems
Message-ID: <1152.65.94.115.61.1052307030.squirrel@mail.cs.mcgill.ca>
In-Reply-To: <44n0hzptq0.fsf@be-well.ilk.org>
References: <1860.132.206.2.68.1051886051.squirrel@mail.cs.mcgill.ca> <44n0hzptq0.fsf@be-well.ilk.org>

Hello all,

After very little sleep and many wasted hours, this turned out to be a memory problem. We tracked it down by leaving only one stick installed at a time and running make installworld; the machine rebooted with every stick other than the first.
We have since replaced the memory, and the machine has been running fine for 2 days now.

Thank you for your input, Lowell.

A nice day to all.

Andrew.

> "Andrew Bogecho" writes:
>
>> I had initially suspected bad interaction with the new raid card so I
>> removed the card, but still had the same problem when using a local
>> disk.
>
> Hmm. Is there any kind of power-saving functionality enabled in the BIOS?
>
>> I then ran memtest from the ports and got the following errors on the
>> "first" run:
>>
>> Test 15: Walking Ones: Testing... 47
>> FAILURE: 0x00020000 != 0x00010000 at offset 0x01efcb30.
>> Skipping to next test...
>> Test 16: Walking Zeroes: Testing... 52
>> FAILURE: 0xffffefff != 0xfffff7ff at offset 0x0101bbc0.
>> Skipping to next test...
>>
>> But, no errors for any of the continuing runs. Is memory a problem here?
>
> That does look suspicious, all right. You could try running memtest
> again from time to time.
>
>> I had initially installed FreeBSD 5.0-RELEASE, that ran very well, but
>> nis and amd would die every 24 hours. As these were "very" necessary
>> services, I decided to go back to 4.x. On 5.0-RELEASE there were no
>> reboots at all.
>
> You could always try -CURRENT, I suppose. A little risky, but
> probably less so for production use than 5.0-RELEASE was.
>
>> How should I proceed now? I am thinking of maybe only running a single
>> CPU kernel to see if that runs better.
>
> Not likely to matter, but worth a try, anyway.
>
> You could run a low-priority CPU-hogging job (or several) to see if
> it's really connected to low usage levels, or if it's actually
> time-sensitive.
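
For anyone reading the memtest output quoted above: the following is only a rough sketch of what "walking ones" and "walking zeroes" patterns look like, plus a small helper that lists which bit positions differ between the two values printed on a FAILURE line. It is not memtest's actual code. The function names are made up for illustration, and the "memory" here is an ordinary Python list, so it cannot detect real hardware faults.

```python
def walking_ones(width=32):
    """Yield each pattern with exactly one bit set, lowest bit first."""
    for bit in range(width):
        yield 1 << bit


def walking_zeroes(width=32):
    """Yield each pattern with exactly one bit cleared, lowest bit first."""
    mask = (1 << width) - 1
    for bit in range(width):
        yield mask & ~(1 << bit)


def differing_bits(a, b):
    """Return the bit positions at which a and b disagree."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def check_buffer(buf, patterns):
    """Write each pattern to every cell, read it back, collect mismatches.

    buf is a plain list standing in for a memory region (an assumption
    made for this sketch; real testers work on raw memory).
    """
    failures = []
    for pattern in patterns:
        for i in range(len(buf)):
            buf[i] = pattern
        for i in range(len(buf)):
            if buf[i] != pattern:
                failures.append((i, pattern, buf[i]))
    return failures


# The two value pairs reported in the quoted memtest run:
print(differing_bits(0x00020000, 0x00010000))  # [16, 17]
print(differing_bits(0xFFFFEFFF, 0xFFFFF7FF))  # [11, 12]
```

In both reported failures the two values differ in exactly two adjacent bit positions, and each value is itself a valid pattern for that test.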