From owner-freebsd-questions@FreeBSD.ORG Fri May 2 07:34:13 2003
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 57EB137B401
	for ; Fri, 2 May 2003 07:34:13 -0700 (PDT)
Received: from relay10.cs.mcgill.ca (relay10.CS.McGill.CA [132.206.3.88])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6E03943FAF
	for ; Fri, 2 May 2003 07:34:12 -0700 (PDT)
	(envelope-from andrewb@cs.mcgill.ca)
Received: from mail.cs.mcgill.ca (mail.CS.McGill.CA [132.206.51.234])
	by relay10.cs.mcgill.ca (Postfix) with ESMTP id 8B61F536FFD
	for ; Fri, 2 May 2003 10:34:11 -0400 (EDT)
Received: from mail.cs.mcgill.ca (localhost [127.0.0.1])
	by mail.cs.mcgill.ca (Postfix) with SMTP id 63AD23A
	for ; Fri, 2 May 2003 10:34:11 -0400 (EDT)
Received: from 132.206.2.68 (SquirrelMail authenticated user andrewb)
	by mail.cs.mcgill.ca with HTTP; Fri, 2 May 2003 10:34:11 -0400 (EDT)
Message-ID: <1860.132.206.2.68.1051886051.squirrel@mail.cs.mcgill.ca>
Date: Fri, 2 May 2003 10:34:11 -0400 (EDT)
From: "Andrew Bogecho"
To: freebsd-questions@freebsd.org
User-Agent: SquirrelMail/1.4.0
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
X-Priority: 3
Importance: Normal
Subject: 4.8-RELEASE problems
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 May 2003 14:34:13 -0000

Hello all,

I am currently running 4.8-RELEASE on a dual AMD Athlon MP 2400+ machine, but it seems to reboot randomly at night. The heavy loads on it are during the day, but strangely the reboots occur when the load is low: evenings and weekend afternoons. No logs or error messages are produced. I have been monitoring the CPU temperature, and it is fine.

The output from vmstat in the last few seconds before reboot was:

 procs      memory              page              disks      faults      cpu
 r b w     avm     fre  flt  re  pi  po  fr  sr ad0 md0   in   sy  cs us sy id
 1 0 0  188088 1270192   19   0   0   0 149   0   0   0  241   94  29  0  1 98
 0 0 0  188088 1270192    5   0   0   0   0   0   0   0  242   40  13  0  1 99
 0 0 0  188088 1270192    7   0   0   0   0   0   0   0  243   71  15  0  2 98
 0 0 0  188088 1270376    5   0   0  23  46   0   0   0  283   26  10  0  2 98
 0 0 0  188088 1270376    5   0   0   0   0   0   0   0  237   53  14  0  1 99
 0 0 0  188088 1270360   10   0   0   0  14   0  24   0  279 1472 455  0  3 97
 1 0 0  188088 1270360    5   0   0   0   0   0   0   0  238   73  16  0  1 99
 0 0 0  188052 1270364    9   0   0   0   6   0   8   0  257  105  21  0  2 98
 0 0 0  188052 1270364    5   0   0   0   0   0   0   0  238   57  12  0  1 98
 0 0 0  187176 1270956   12   0   0   0 149   0   0   0  236   56  18  0  1 99
 0 0 0  187176 1270956    5   0   0   0   4   0   6   0  243   63  11  0  1 99
 0 0 0  187176 1270956    5   0   0   0   0   0   0   0  235   25   8  0  1 99
 0 0 0  187176 1270956    5   0   0   0   0   0   0   0  239   66  11  0  2 98
 0 0 0  187176 1270956    5   0   0   0   0   0   0   0  236   25   7  0  1 99
 0 0 0  263408 1243948 7487   0   0   0 359   0   4   0  356 8521 733 25 24 51
 0 0 0  260424 1243948    5   0   0   0   0   0   0   0  239   61  13  0  1 99

The output from iostat in the last few seconds before reboot was:

 tin tout  KB/t tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s us ni sy in id
   0    0  0.00   0  0.00  9.25   8  0.07  0.00   0  0.00  0  0  1  0 99
   0   71  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00  0  0  1  0 98
   0    0  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00  0  0  1  0 99
   0   70  0.00   0  0.00 16.00   6  0.09  0.00   0  0.00  0  0  1  0 99
   0    0  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00  0  0  1  0 99
   0   71  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00  0  0  1  0 98
   0    0  0.00   0  0.00 10.00   4  0.04  0.00   0  0.00  8  0 15  1 76
   0   71  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00 17  0 10  0 73
   0    0  0.00   0  0.00  0.00   0  0.00  0.00   0  0.00  0  0  1  0 99

I had initially suspected a bad interaction with the new RAID card, so I removed the card, but I still had the same problem when using a local disk. I then ran memtest from the ports and got the following errors on the "first" run:

Test 15: Walking Ones: Testing... 47
FAILURE: 0x00020000 != 0x00010000 at offset 0x01efcb30.
Skipping to next test...
Test 16: Walking Zeroes: Testing... 52
FAILURE: 0xffffefff != 0xfffff7ff at offset 0x0101bbc0.
Skipping to next test...

But there were no errors on any of the subsequent runs. Is memory a problem here?

I had initially installed FreeBSD 5.0-RELEASE, which ran very well, but nis and amd would die every 24 hours. As these were "very" necessary services, I decided to go back to 4.x. On 5.0-RELEASE there were no reboots at all.

How should I proceed now? I am thinking of maybe running only a single-CPU kernel to see if that runs better.

The machine has very high loads throughout the day and has no problems then. It is only during quiet times that it seems to either freeze or reboot. In the frozen state, there are no useful messages at the console, and no keyboard input is recognized (including CTRL-ALT-DELETE). I then have to physically power it off and back on. It has died without fail every night after 11:30 pm, and if the fsck does not fail, it dies again around 3:00 am.

Any help would be appreciated. Let me know if you need more info.

Thank you for your time.

Andrew.
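
A quick way to read the two memtest FAILURE lines quoted above is to XOR each pair of values and list the bit positions where they differ. The following is only a minimal sketch in Python; the helper name differing_bits is made up for illustration, and the values are copied from the memtest output. It says nothing by itself about whether the memory is actually faulty.

```python
# Value pairs copied from the two memtest FAILURE lines in this message.
# memtest reports them as "left != right"; which side is the "correct"
# one is not stated in the output, so no assumption is made here.
failures = {
    "Test 15 (Walking Ones)": (0x00020000, 0x00010000),
    "Test 16 (Walking Zeroes)": (0xFFFFEFFF, 0xFFFFF7FF),
}


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ.

    Hypothetical helper for illustration; assumes 32-bit values.
    """
    diff = a ^ b
    return [i for i in range(32) if (diff >> i) & 1]


for name, (left, right) in failures.items():
    print(name, "differs at bits", differing_bits(left, right))
```

In both pairs exactly two neighbouring bits differ, which means the two values in each pair are adjacent steps of the walking pattern and not the same step with a single bit flipped.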