Date: Sat, 19 Nov 2005 00:56:22 +0100
From: Uwe Doering <gemini@geminix.org>
To: Charles Sprickman <spork@bway.net>
Cc: stable@freebsd.org
Subject: Re: 4.8 "alternate system clock has died" error
Message-ID: <437E6A26.6050407@geminix.org>
In-Reply-To: <Pine.OSX.4.61.0511181729080.709@charles-sprickmans-computer.local>
References: <Pine.OSX.4.61.0511171853010.709@charles-sprickmans-computer.local> <437D91FD.8050809@geminix.org> <Pine.OSX.4.61.0511181729080.709@charles-sprickmans-computer.local>

Charles Sprickman wrote:
> On Fri, 18 Nov 2005, Uwe Doering wrote:
>> Charles Sprickman wrote:
>>
>>> I've been digging through Google for more information on this.  I
>>> have a 4.8 box that's been up for about 430 days.  In the last week
>>> or so, top and ps have started reporting all CPU usage numbers as
>>> zero, and running "systat -vmstat" results in the message "The
>>> alternate system clock has died! Reverting to ``pigs'' display".
>>> [...]
>>
>> We had this once at work, quite a while ago.  The "alternate system
>> clock" is in fact the Real Time Clock (RTC) on the mainboard.  In our
>> case we were lucky in that it was just the quartz device that failed
>> due to an improperly soldered lead which finally came off.  We fixed
>> the soldering and the problem was gone.
>
> Are there any tools to verify that the RTC is working?

"systat -vmstat" will show you the interrupt that it drives.  In our
case it's irq8, which is in fact labeled "rtc".  It is supposed to run
at 128 Hz.  Under load it can drop to a somewhat lower value.  This is
normal.

> I don't exactly
> understand what the RTC is, but would the machine not be suffering some
> other problems if there was an actual hardware failure?  Doesn't the
> system rely on this to time everything from the processors to memory to
> PCI slots and interrupts?

No, the RTC drives only the interrupt that is responsible for collecting
the CPU usage data.  When it fails, the CPU usage in "top", "ps" etc.
just drops to zero, as you've observed, but the server continues to run.

If the failure is permanent, the machine refuses to boot, though.  At
least that's what happened in our case.  Apparently the RTC chip is
essential to the mainboard's boot sequence.  For instance, the initial
date and time information comes from this chip.

On the other hand, if a reset corrects the problem, then the RTC chip
probably got hung, or there is a problem with the interrupt controller
it is connected to.  On a properly working mainboard this shouldn't
happen, of course.

> Is there any simple way to figure out if this is hardware or software?

I don't know of any.  However, we have been running FreeBSD almost
since 4.0, on various mainboards, UP and SMP, and we've never seen
these symptoms except in the one case mentioned above.  So I suppose
it's not a kernel bug.  I haven't looked at the PR database, though.

   Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net
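
The RTC check described in the message (watch the "rtc" interrupt and see whether it ticks at roughly 128 Hz) could be automated along the following lines. This is only a minimal sketch, not something from the original thread: it assumes two snapshots of `vmstat -i` style output taken a known number of seconds apart, and it assumes that each data line ends with a cumulative total followed by a rate column. The function names and the sample counter values are hypothetical, and the exact column layout on a given FreeBSD release may differ.

```python
def parse_interrupt_totals(vmstat_output):
    """Map interrupt name -> cumulative count from `vmstat -i`-style text.

    Assumption: each data line looks like "<name ...> <total> <rate>".
    Lines that do not end in two integers (header, blanks) are skipped.
    """
    totals = {}
    for line in vmstat_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = " ".join(fields[:-2])
        totals[name] = int(fields[-2])
    return totals


def interrupt_rate(before, after, seconds, label="rtc"):
    """Average interrupts per second between two snapshots.

    Returns None if no interrupt whose name contains `label` appears
    in both snapshots.
    """
    t0 = parse_interrupt_totals(before)
    t1 = parse_interrupt_totals(after)
    for name, count in t1.items():
        if label in name and name in t0:
            return (count - t0[name]) / seconds
    return None


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 seconds apart. On a real system they
    # could be captured with subprocess.run(["vmstat", "-i"], ...).
    first = """interrupt      total      rate
clk irq0       1000000    100
rtc irq8       1280000    128
"""
    second = """interrupt      total      rate
clk irq0       1001000    100
rtc irq8       1281280    128
"""
    rate = interrupt_rate(first, second, seconds=10)
    if rate is None:
        print("no rtc interrupt line found")
    elif rate == 0:
        print("rtc interrupt counter is not advancing")
    else:
        print("rtc interrupt rate: %.1f per second" % rate)
```

A counter that stays flat between the two snapshots would match the symptoms described in this thread, while a value somewhat below 128 under load is, per the message, normal.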
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?437E6A26.6050407>