Date:      Mon, 21 Nov 2005 19:40:22 -0500 (EST)
From:      Charles Sprickman <spork@fasttrackmonkey.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Uwe Doering <gemini@geminix.org>
Subject:   Re: 4.8 "Alternate system clock has died" error
Message-ID:  <Pine.OSX.4.61.0511211907440.529@white.nat.fasttrackmonkey.com>
In-Reply-To: <200511211149.01165.jhb@freebsd.org>
References:  <Pine.OSX.4.61.0511182152380.2298@gee5.nat.fasttrackmonkey.com> <200511182215.04399.jhb@freebsd.org> <437F79F1.5040706@geminix.org> <200511211149.01165.jhb@freebsd.org>

On Mon, 21 Nov 2005, John Baldwin wrote:

> On Saturday 19 November 2005 02:16 pm, Uwe Doering wrote:
>> John Baldwin wrote:
>>> On Friday 18 November 2005 10:05 pm, Charles Sprickman wrote:
>>>> I tried this query on -stable, hoping someone here can help me further
>>>> understand and troubleshoot this.
>>>>
>>>> Reference:
>>>> http://thread.gmane.org/gmane.os.freebsd.stable/32837
>>>>
>>>> In short, top, ps report 0% CPU on all processes as of a few weeks ago.
>>>> "systat -vmstat" hands out the "Alternate system clock has died" error.
>>>>
>>>> Box is running 4.8-p24 and has been up 425 days.  Nothing out of the
>>>> ordinary except for the above symptoms.  In searching the various
>>>> lists/newsgroups, it seems that the other folks with this problem have
>>>> fixed it in various ways:
>>>>
>>>> -early 4.x users referenced a PR that was committed before 4.8
>>>> -some 5.3 users reported this with unknown resolution/cause
>>>> -sending init a HUP was suggested (tried it, no luck)
>>>> -setting kern.timecounter.method: 1 (tried it, no luck)
>>>> -one user seemed to actually have a dead timer
>>>
>>> Actually, there was a patch that was committed in 5.4 and 6.0 for this
>>> issue. You can see the diff here:
>>>
>>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/isa/clock.c.diff?r1=1.213&r2=1.214&f=h
>>>
>>> That patch would probably backport to 4.x fairly easily.
>>
>> I just looked at RELENG_4, and yes, backporting should be easy.  Though
>> I haven't tried it yet on our machines.
>>
>> I wonder, however, what's writing to the RTC on a running server.  Could
>> this event perhaps have been triggered by the recent Daylight Saving
>> Time switch?
>
> Yep.  Also, if you are using ntp, then the adjustments to the time are getting
> pushed back to the RTC as well.

I run ntp everywhere.

So it certainly looks easy enough for me to apply the first two hunks of
the diff referenced above, but I'm having trouble finding where the last
one belongs in cpu_initclocks().  That section looks like it has changed
quite a bit (see v.1.206).

The original PR that references this is against 4.something and only 
patches in one place:

http://www.freebsd.org/cgi/query-pr.cgi?pr=17800

What's my best course of action for fixing this?  It looks like I can
take the first two hunks of that cvsweb diff and then add the one-liner
from the PR, but I have no idea what that change actually does.  My
experience with C is limited to making very small changes to existing
work, and nothing quite as important as this one file appears to be
(judging from the commit logs on it).

Is there any interest in moving this back to 4-STABLE?

And lastly, is there any snippet of code that can twiddle the clock from 
userspace and determine if it's wedged or dead?

Scheduling a reboot of this machine gets much, much more complicated if I 
need to have another box standing by due to a truly dead timer.
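
One possible userspace check, sketched in Python under loud assumptions:
the per-state CPU tick counters are taken to be readable with
"sysctl -n kern.cp_time" as five whitespace-separated integers (user,
nice, sys, intr, idle), which may not hold on 4.8, and the threshold of 5
ticks is only a guess at what counts as "no progress".  The function names
are made up for illustration.  The idea is to sample the counters twice
and see whether they advanced at all; if they did not, the clock that
feeds them is at least not being counted.

```python
import subprocess
import time


def parse_cp_time(text):
    """Parse whitespace-separated tick counters into a list of ints."""
    return [int(field) for field in text.split()]


def ticks_elapsed(before, after):
    """Total ticks accumulated across all CPU states between two samples."""
    return sum(b - a for a, b in zip(before, after))


def statclock_looks_dead(before, after, min_ticks=5):
    """True if fewer than min_ticks ticks were counted between the samples.

    min_ticks=5 is an assumed threshold, not a documented constant.
    """
    return ticks_elapsed(before, after) < min_ticks


def sample_cp_time():
    """Read the counters once.

    ASSUMPTION: `sysctl -n kern.cp_time` exists on the target system and
    prints the counters on one line.  Not verified for FreeBSD 4.8.
    """
    out = subprocess.check_output(["sysctl", "-n", "kern.cp_time"])
    return parse_cp_time(out.decode())


def check(interval=5.0):
    """Sample twice, `interval` seconds apart, and report a stalled clock."""
    before = sample_cp_time()
    time.sleep(interval)
    after = sample_cp_time()
    return statclock_looks_dead(before, after)
```

Calling check() on the affected box and getting True would be consistent
with the 0% CPU readings in top and ps.  It would not distinguish a wedged
RTC interrupt from truly dead hardware, so it does not settle the reboot
question by itself.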

Thanks so much to both of you for your help...

Charles

> -- 
> John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
>
