From: Ian Lepore
To: Don Lewis, ohartmann@walstatt.org
Cc: Cy.Schubert@komquats.com, freebsd-current@freebsd.org
Date: Fri, 17 Mar 2017 14:33:13 -0600
Subject: Re: ntpd dies nightly on a server with jails
Message-ID: <1489782793.40576.185.camel@freebsd.org>
In-Reply-To: <201703172027.v2HKQu1c074111@gw.catspoiler.org>
References: <201703172027.v2HKQu1c074111@gw.catspoiler.org>
List-Id: Discussions about the use of FreeBSD-current

On Fri, 2017-03-17 at 13:26 -0700, Don Lewis wrote:
> On 17 Mar, O. Hartmann wrote:
> > Just some strange news:
> >
> > I left the server running the whole day with ntpd disabled and did
> > not see the RTC gain even one second, even while stressing the
> > machine.
> >
> > But soon after restarting ntpd, I immediately noticed it was 30
> > minutes off! This morning, the discrepancy was almost 5 hours - it
> > looked more like a weird adjustment to a time base other than UTC.
> >
> > Over the weekend I'll leave the server with ntpd disabled and only
> > the RTC running. I have the strange feeling that something is
> > intentionally readjusting the ntpd time due to a misconfiguration
> > or a rogue ntp server in the X.CC.pool.ntp.org
>
> ntp should recognize a single bad server and ignore it in favor of
> the other servers that are sane.
>
> It sounds like something is going off the rails once ntpd starts
> calling adjtime().  What is the output of:
>     sysctl kern.clockrate
>
> I'd suggest starting ntpd and running "ntpq -c pe" a few times a
> minute and capturing its output to monitor the status of ntpd as it
> starts up and try to capture things going wrong.  You should probably
> disable iburst in ntp.conf to give more visibility into the early
> startup.
>
> For the first few minutes ntpd should just be getting reliable
> timestamp info and won't start trying to adjust the clock until it
> has captured enough samples and figured out which servers are best.
> Then the behaviour of the offset is the thing to watch.  If the
> initial offset is large enough, ntpd will step the clock once to get
> it close to zero, otherwise it will just use adjtime to slowly push
> the offset towards zero.  I think though that you will see the offset
> start gyrating madly.
>
> You might want to set /var/db/ntpd.drift to zero beforehand if there
> is an insane value in there.
> If the initial drift value is bogus, ntpd will try to use it, which
> will push the time offset away from zero so fast that it will decide
> to keep stepping the clock back to zero before it can capture enough
> samples from the external servers to determine the true local clock
> drift rate.

Do not set ntpd.drift contents to zero.  Delete the file.  There's a
huge difference between a file that says the clock is perfect and a
missing file, which triggers ntpd to do a 15-minute frequency
measurement to come up with the initial drift correction.

-- Ian
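
The drift-file advice above can be sketched in Python as a small check. This is only an illustration, not something from the thread: the function name, the return strings and the 500 PPM sanity threshold are assumptions (500 PPM is used here because it is the usual limit on ntpd's frequency correction). It never writes a zero into the file; a bogus file is either reported or deleted, so that ntpd sees a missing file and measures the frequency itself.

```python
import os

# Assumption: values at or beyond +/-500 PPM are treated as bogus.
MAX_SANE_PPM = 500.0


def check_drift_file(path, max_ppm=MAX_SANE_PPM, delete_if_bogus=False):
    """Classify an ntpd drift file (hypothetical helper).

    Returns one of:
      "missing" - no file; ntpd will do its own frequency measurement
      "sane"    - file holds a plausible PPM value; leave it alone
      "bogus"   - value is unparsable or out of range (file left in place)
      "deleted" - value was bogus and the file has been removed
    """
    try:
        with open(path) as f:
            text = f.read()
    except FileNotFoundError:
        return "missing"

    try:
        # The drift file holds the frequency offset in PPM as its first token.
        ppm = float(text.split()[0])
        sane = abs(ppm) < max_ppm  # NaN and inf fail this comparison
    except (IndexError, ValueError):
        sane = False

    if sane:
        return "sane"
    if delete_if_bogus:
        # Delete rather than zero: a zero claims the clock is perfect.
        os.remove(path)
        return "deleted"
    return "bogus"
```

With ntpd stopped, a call such as check_drift_file("/var/db/ntpd.drift", delete_if_bogus=True) would remove an implausible file before the daemon is restarted; a "sane" or "missing" result means there is nothing to do.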