From: Matthew Dillon
Date: Fri, 11 May 2007 15:03:26 -0700 (PDT)
To: freebsd-stable@freebsd.org
Subject: Re: clock problem
Message-Id: <200705112203.l4BM3QqY068012@apollo.backplane.com>
References: <20070510.225643.-713548429.imp@bsdimp.com> <200705111011.l4BABTfh061274@lurza.secnetix.de> <20070511195829.GM826@turion.vk2pj.dyndns.org>

Another idea to help track down timebase problems: port dntpd to FreeBSD. You need something like three sysctls (because the ntp API and the original sysctl API are both insufficient). Alternatively you could probably hack dntpd to run in debug mode without having to implement any new sysctls, as long as you are sure to clear out any active kernel timebase adjustments before you run it.

Here's some sample output:

    http://apollo.backplane.com/DFlyMisc/dntpd.sample01.txt

Dntpd in debug mode prints the results from two staggered, continuously running linear regressions (each resets after 30 samples, and the two are staggered by 15 samples). For anyone who understands how linear regressions work, finding kernel timekeeping bugs is really easy with this sort of output. You get the slope, y-intercept, correlation, and standard deviation, and then you get the calculated frequency drift and time offset based on those numbers. The correlation is accurate after around 10 samples.

Note that frequency drift calculations require longer intervals to get better results. The forced 30 second interval set in the sample output is way too short, hence the errors (it has to be in the 90th percentile to even have a chance of producing a reasonable PPM calculation). But also remember we are talking parts per million here. If you throw away iteration numbers < 15 or so you will get very nice output, and kernel bugs will show up in fairly short order.

Kernel bugs will show up as non-trivial y-intercept calculations over multiple samples, large jumps in the offset, inability to get a good correlation (proviso: the sample interval has to be at least 120 seconds, not the 30 in my example), and so on and so forth.

Also be sure to use a locked ntp source, otherwise running corrections on the source will show up as problems in the debug output. pool.ntp.org is usually good enough.

It's fun checking various time sources with an idle box with a good timebase. hhahahhaha. OMG.

					-Matt
					Matthew Dillon
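
For anyone who wants to see what the regression numbers above mean, here is a minimal sketch in Python. It is not dntpd's code (dntpd is written in C) and the names `regress` and `staggered_windows` are made up for illustration. It assumes each sample is a pair of elapsed time and measured clock offset, both in seconds, that frequency drift is the fitted slope scaled to parts per million, and that the standard deviation is taken over the residuals of the fit. The two windows follow the reset-after-30, staggered-by-15 scheme described above.

```python
import math


def regress(samples):
    """Least-squares fit of offset = slope * t + intercept.

    samples: list of (t, offset) pairs, both in seconds (an assumed layout).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    syy = sum((o - mean_o) ** 2 for _, o in samples)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    if sxx == 0:
        raise ValueError("all samples share one timestamp")
    slope = sxy / sxx
    intercept = mean_o - slope * mean_t
    # Correlation is undefined for a perfectly flat offset; report 0.0 then.
    corr = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    # Standard deviation of the residuals around the fitted line.
    resid = sum((o - (slope * t + intercept)) ** 2 for t, o in samples)
    stddev = math.sqrt(resid / n)
    return {
        "slope": slope,
        "intercept": intercept,
        "corr": corr,
        "stddev": stddev,
        "drift_ppm": slope * 1e6,  # seconds per second -> parts per million
        "offset_now": slope * samples[-1][0] + intercept,
    }


def staggered_windows(samples, window=30, stagger=15):
    """Yield (name, iteration, fit) for two regressions offset by `stagger`."""
    a, b = [], []
    for i, sample in enumerate(samples):
        a.append(sample)
        if i >= stagger:  # the second regression starts `stagger` samples late
            b.append(sample)
        for name, win in (("A", a), ("B", b)):
            if len(win) >= 2:
                yield name, len(win), regress(win)
            if len(win) == window:  # reset after a full window
                win.clear()


if __name__ == "__main__":
    # Synthetic clock: 50 PPM fast, 2 ms initial offset, sampled every 120 s.
    data = [(120.0 * i, 50e-6 * 120.0 * i + 0.002) for i in range(45)]
    for name, iteration, fit in staggered_windows(data):
        if iteration >= 15:  # early iterations are too noisy to trust
            print(name, iteration, fit["drift_ppm"], fit["corr"], fit["stddev"])
```

With real samples, a healthy clock should give a steady slope and a correlation close to 1 once a window has enough samples in it. A window whose correlation never settles, or whose fitted offset jumps between iterations, is the kind of thing worth chasing in the kernel.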