Date:      Fri, 2 May 2003 10:34:11 -0400 (EDT)
From:      "Andrew Bogecho" <andrewb@cs.mcgill.ca>
To:        freebsd-questions@freebsd.org
Subject:   4.8-RELEASE problems
Message-ID:  <1860.132.206.2.68.1051886051.squirrel@mail.cs.mcgill.ca>

Hello all,

I am currently running 4.8-RELEASE on a dual AMD Athlon MP 2400+. However,
it reboots at random, mostly at night. The heavy loads on it are during the
day, but the reboots strangely occur when the load is low, in the evenings
and on weekend afternoons.

No logs or error messages are produced. I have been monitoring the CPU
temperature, and it is fine. The output from vmstat in the last few seconds
before a reboot was:

 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr ad0 md0   in   sy  cs us sy id
 1 0 0  188088 1270192   19   0   0   0 149   0   0   0  241   94  29  0  1 98
 0 0 0  188088 1270192    5   0   0   0   0   0   0   0  242   40  13  0  1 99
 0 0 0  188088 1270192    7   0   0   0   0   0   0   0  243   71  15  0  2 98
 0 0 0  188088 1270376    5   0   0  23  46   0   0   0  283   26  10  0  2 98
 0 0 0  188088 1270376    5   0   0   0   0   0   0   0  237   53  14  0  1 99
 0 0 0  188088 1270360   10   0   0   0  14   0  24   0  279 1472 455  0  3 97
 1 0 0  188088 1270360    5   0   0   0   0   0   0   0  238   73  16  0  1 99
 0 0 0  188052 1270364    9   0   0   0   6   0   8   0  257  105  21  0  2 98
 0 0 0  188052 1270364    5   0   0   0   0   0   0   0  238   57  12  0  1 98
 0 0 0  187176 1270956   12   0   0   0 149   0   0   0  236   56  18  0  1 99
 0 0 0  187176 1270956    5   0   0   0   4   0   6   0  243   63  11  0  1 99
 0 0 0  187176 1270956    5   0   0   0   0   0   0   0  235   25   8  0  1 99
 0 0 0  187176 1270956    5   0   0   0   0   0   0   0  239   66  11  0  2 98
 0 0 0  187176 1270956    5   0   0   0   0   0   0   0  236   25   7  0  1 99
 0 0 0  263408 1243948 7487   0   0   0 359   0   4   0  356 8521 733 25 24 51
 0 0 0  260424 1243948    5   0   0   0   0   0   0   0  239   61  13  0  1 99
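
To pick out the unusual samples in a log like the one above, a short script can parse each row and flag the ones where idle CPU drops. This is only a sketch: it assumes the 19-column layout shown here with each sample on a single line, and the column names `syscalls` and `cpu_sy` (used to tell the two "sy" columns apart), the helper names, and the 90% threshold are all made up for illustration.

```python
# Sketch: flag vmstat samples whose idle CPU percentage is below a threshold.
# Assumes the 19-column layout from the output above, one sample per line.
# "syscalls" and "cpu_sy" are illustrative names for the two "sy" columns.
COLUMNS = (
    "r b w avm fre flt re pi po fr sr ad0 md0 "
    "in syscalls cs us cpu_sy id"
).split()


def parse_row(line):
    """Turn one whitespace-separated vmstat sample into a dict of ints."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    return dict(zip(COLUMNS, values))


def busy_samples(lines, max_idle=90):
    """Return the parsed samples whose idle percentage is below max_idle."""
    return [row for row in map(parse_row, lines) if row["id"] < max_idle]


# The last three samples from the vmstat output above.
samples = [
    "0 0 0  187176 1270956    5   0   0   0   0   0   0   0  236   25   7  0  1 99",
    "0 0 0  263408 1243948 7487   0   0   0 359   0   4   0  356 8521 733 25 24 51",
    "0 0 0  260424 1243948    5   0   0   0   0   0   0   0  239   61  13  0  1 99",
]

for row in busy_samples(samples):
    print("flt=%d us=%d sy=%d id=%d" % (row["flt"], row["us"], row["cpu_sy"], row["id"]))
```

On these three samples only the second-to-last one is flagged, the one with 7487 page faults and 51% idle. A spike like that says something ran briefly; it does not by itself say what caused the reboot.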

The output from iostat in the last few seconds before reboot was:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0    0  0.00   0  0.00   9.25   8  0.07   0.00   0  0.00   0  0  1  0 99
   0   71  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   0  0  1  0 98
   0    0  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   0  0  1  0 99
   0   70  0.00   0  0.00  16.00   6  0.09   0.00   0  0.00   0  0  1  0 99
   0    0  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   0  0  1  0 99
   0   71  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   0  0  1  0 98
   0    0  0.00   0  0.00  10.00   4  0.04   0.00   0  0.00   8  0 15  1 76
   0   71  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  17  0 10  0 73
   0    0  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   0  0  1  0 99

I had initially suspected a bad interaction with the new RAID card, so I
removed the card, but I still had the same problem when using a local disk.

I then ran memtest from the ports collection and got the following errors
on the first run:

  Test 15:          Walking Ones:  Testing...  47
FAILURE: 0x00020000 != 0x00010000 at offset 0x01efcb30.
Skipping to next test...
  Test 16:        Walking Zeroes:  Testing...  52
FAILURE: 0xffffefff != 0xfffff7ff at offset 0x0101bbc0.
Skipping to next test...

But there were no errors on any of the subsequent runs. Is memory a problem here?
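
To make the two FAILURE lines easier to read, the values in each pair can be XORed to see exactly which bits differ. The following is a minimal sketch; the helper name `differing_bits` is made up for illustration, and the values are copied from the memtest output above.

```python
# Sketch: show which bits differ between the two values in each FAILURE line.
# The value pairs are copied from the memtest output above; the helper name
# is illustrative only.

def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


failures = [
    ("Walking Ones", 0x00020000, 0x00010000),
    ("Walking Zeroes", 0xFFFFEFFF, 0xFFFFF7FF),
]

for name, left, right in failures:
    print("%s: xor=0x%08x differing bits=%s"
          % (name, left ^ right, differing_bits(left, right)))
```

In both failures the two values differ in two adjacent bits (16 and 17 for Walking Ones, 11 and 12 for Walking Zeroes), so each pair looks like two neighbouring steps of the walking pattern rather than a single flipped bit. That is only a reading of the numbers, not something memtest itself reports.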

I had initially installed FreeBSD 5.0-RELEASE, which ran very well, but NIS
and amd would die every 24 hours. As these were very necessary services,
I decided to go back to 4.x. On 5.0-RELEASE there were no reboots at all.

How should I proceed now? I am thinking of maybe only running a single CPU
kernel to see if that runs better.

The machine has very high loads throughout the day and has no problems
then. It is only during quiet times that it seems to either freeze or
reboot. In the frozen state, there are no useful messages at the console,
and no keyboard input is recognized (including CTRL-ALT-DELETE). I then have
to physically power it off and on again. It has died without fail every
night after 11:30 pm, and if the fsck does not fail, it dies again around
3:00 am.

Any help would be appreciated. Let me know if you need more info.

Thank you for your time.

Andrew.


