Date: Tue, 25 Jul 2006 09:32:15 -0600 (MDT)
From: "M. Warner Losh"
To: peterjeremy@optushome.com.au
Cc: scheidell@secnap.net, freebsd-hackers@freebsd.org
Subject: Re: FBSD 5.5 and software timers

In message: <20060725075946.GA728@turion.vk2pj.dyndns.org>
            Peter Jeremy writes:
: On Mon, 2006-Jul-24 11:47:41 -0400, Michael Scheidell wrote:
: >This software timer was resetting a 1 second hardware watchdog timer.
: >Every 200ms, I sent a reset to the hardware WDT.
: >Everything worked on 5.4, but I am getting failures on 5.5
:
: Basically, when you ask for a 200msec delay, the kernel sleeps until
: an absolute time. It looks like the handling of absolute time
: sleeps across time steps was changed. Unfortunately, both approaches
: are equally valid in different circumstances.

With libc_r, I've had problems dating back to 3.x with sleeping during
a time step. I've not investigated libthr or libpthread.

: >It fails within 1 second of getting these types of log entries:
: >Jul 23 15:03:42 audit18 ntpd[473]: time reset -2.497234 s
: >Jul 23 16:03:56 audit18 ntpd[473]: time reset +1.532401 s
:
: Rather than focussing on the changed sleep handling, I suggest you
: concentrate on fixing your clock: Your system clock should not be
: stepping.

With time running like the above indicates, you'll never get good,
stable performance and you'll see all kinds of anomalous behavior.
Until you fix that, anything else you do is futile.

I'd suggest using a different timecounter. Quite often this alone
solves problems like these.

: >ntpd using strata 2 ntp server, with 2 other backups.
:
: I presume the servers are all stable (ie not stepping) and have a
: reasonably low delay. If so, I suspect your ntpd PLL has locked up.
: I've seen some versions of ntpd lock at +/-300ppm and just step
: regularly. I'm not exactly sure what triggers it but it seems to be
: exacerbated by noisy time servers (eg via a heavily loaded network
: link). A work-around is to delete ntp.drift and restart ntpd. You
: might like to enable some of the ntpd statistics gathering and see
: if anything anomalous is occurring.

ntpd hates time sources (including the local oscillator) that have a
frequency error of more than about 200ppm, and clamps its correction
to 300ppm. The drift rate above was 1.532 seconds in 3606 seconds, or
about 425ppm.
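The drift arithmetic can be checked with a few lines of Python. This is only a sketch using the figures quoted in this message; the 300ppm clamp is the value stated here, not something read from the ntpd sources.

```python
# Frequency error implied by a clock step, in parts per million (ppm).
def drift_ppm(step_seconds: float, interval_seconds: float) -> float:
    """Return the frequency error (ppm) implied by accumulating
    `step_seconds` of clock error over `interval_seconds` of elapsed time."""
    return abs(step_seconds) / interval_seconds * 1e6


# Figures from this message: a +1.532 s step after roughly 3606 s.
ppm = drift_ppm(1.532, 3606)
print(f"{ppm:.0f} ppm")

# Clamp value as stated in this message (an assumption, not read from ntpd).
NTPD_CLAMP_PPM = 300
# An error larger than the clamp cannot be slewed away, so it builds up
# until ntpd steps the clock.
print("exceeds clamp:", ppm > NTPD_CLAMP_PPM)
```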
If your local oscillator exceeds the 300ppm clamping that ntpd does,
you'll get wild swings like the ones you are seeing.

You might also try starting ntpd with the flag that tells it to step
only once, which will stop the steps above. But it also means that
time won't be well synchronized.

Warner
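For the 200ms watchdog loop, the failure mode described above can be modelled in a few lines of Python. This sketch assumes the sleep behaves as described in the thread: it waits for the wall clock to reach an absolute time. Under that assumption, the -2.497234 s step stretches a 200ms sleep well past the 1 second watchdog timeout. `pet_watchdog()` is a hypothetical stand-in for the write that resets the hardware WDT. The `pet_loop()` pattern takes each deadline from `time.monotonic()`, which is documented as unaffected by system clock updates. However, it still ends in a relative `time.sleep()`, so it only helps where that sleep is itself immune to steps. Whether that holds on 5.5 is exactly what is in question, so fixing the clock remains the real cure.

```python
import time

WDT_TIMEOUT = 1.0  # hardware watchdog fires if not reset within 1 s
PET_PERIOD = 0.2   # the 200 ms reset interval from the thread


def wall_clock_sleep_duration(requested: float, step: float) -> float:
    """Real time that passes during a sleep implemented as 'wait until the
    wall clock reads start + requested' (an assumed model), if the wall clock
    is stepped by `step` seconds while sleeping (negative = set back)."""
    # The wall clock must still advance `requested` seconds of reading, so a
    # backward step adds |step| of real time and a forward step removes it.
    return max(0.0, requested - step)


def pet_watchdog() -> None:
    """Hypothetical stand-in for the write that resets the hardware WDT."""


def pet_loop(period: float, iterations: int) -> list:
    """Reset the watchdog every `period` seconds, taking each deadline from
    time.monotonic() rather than the wall clock. Returns monotonic timestamps
    of each reset."""
    stamps = []
    deadline = time.monotonic()
    for _ in range(iterations):
        pet_watchdog()
        stamps.append(time.monotonic())
        deadline += period                       # absolute deadline, monotonic clock
        remaining = deadline - time.monotonic()  # time left until that deadline
        if remaining > 0:
            time.sleep(remaining)
    return stamps


if __name__ == "__main__":
    for step in (-2.497234, 1.532401):  # the two steps from the ntpd log
        real = wall_clock_sleep_duration(PET_PERIOD, step)
        print(f"step {step:+.6f} s -> 200 ms sleep lasts {real:.6f} s, "
              f"watchdog fires: {real > WDT_TIMEOUT}")
    print(pet_loop(PET_PERIOD, 3))
```

Under this model a forward step only shortens the sleep (an early reset, which is harmless), while a backward step is what starves the watchdog.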