Date:      Sun, 12 Jul 2015 12:48:49 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Will 10.2 also ship with a very stale NTP?
Message-ID:  <1436726929.1334.202.camel@freebsd.org>
In-Reply-To: <20150712183140.GB22240@server.rulingia.com>
References:  <20150710235810.GA76134@rwpc16.gfn.riverwillow.net.au> <20150712032256.GB19305@satori.lan> <20150712050443.GA22240@server.rulingia.com> <20150712154416.b9f3713893fe28bfab1dd4d7@dec.sakura.ne.jp> <CAGMYy3vKEUCD=Ssxt%2B2Vny4eQ7CNQHTxNKncyQnRk5dPQU6ZtA@mail.gmail.com> <20150712184910.2d8d5f085ae659d5b9a2aba0@dec.sakura.ne.jp> <1436715703.1334.193.camel@freebsd.org> <20150712183140.GB22240@server.rulingia.com>

On Mon, 2015-07-13 at 04:31 +1000, Peter Jeremy wrote:
> On 2015-Jul-12 09:41:43 -0600, Ian Lepore <ian@freebsd.org> wrote:
> >And let's all just hope that a week or two of testing is enough when
> >jumping a major piece of software forward several years in its
> >independent evolution.
> 
> Whilst I support John's desire for NTP to be updated, I also do not
> think this is the appropriate time to do so.  That said, the final
> decision is up to re@.
> 
> >The import of 4.2.8p2 several months ago resulted in complete failure of
> >timekeeping on all my arm systems.  Just last week I tracked it down to
> >a kernel bug (which I haven't committed the fix for yet).  While the bug
> >has been in the kernel for years, it took a small change in ntpd
> >behavior to trigger it.
> >
> >Granted it's an odd corner-case problem that won't affect most users
> >because they just use the stock ntp.conf file (and it only affects
> >systems that have a large time step due to no battery-backed clock).
> >But it took me weeks to find enough time to track down the cause of the
> >problem.
> 
> I'm not using the stock ntp.conf on my RPis and didn't notice any NTP
> issues.  Are you able to provide more details of either the ntp.conf
> options that trigger the bug or the kernel bug itself?  A quick search
> failed to find anything.
> 

I just committed the kernel fix as r285424; the commit message has some
info on why the new ntpd made the problem visible.

I should have said "stock rc.conf and ntp.conf"... To trigger the problem
you have to set ntpd_sync_on_start=NO in rc.conf and allow ntpd to make
a large step (-g without -q, or tinker panic 0).  I don't remember why I
had sync on start disabled on most of my arm systems (probably a
one-time experiment that I forgot to undo and that got copied around),
but I suspect most people who don't have battery-backed clocks will have
it set to YES, and that's why nobody else saw this problem.
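
For illustration only, a minimal sketch of the kind of configuration
described above.  The ntpd_enable line and the file locations are
assumptions added for completeness; only ntpd_sync_on_start=NO and
"tinker panic 0" come from the description, and this is not a copy of
the actual files from the affected systems.

```sh
# /etc/rc.conf (fragment)
ntpd_enable="YES"          # assumption: ntpd is started at boot by rc
ntpd_sync_on_start="NO"    # the setting named above

# /etc/ntp.conf (fragment), shown as a comment because it is a separate file:
#   tinker panic 0         # disables the panic threshold, so ntpd will
#                          # accept and step out an arbitrarily large offset
```

On a board with no battery-backed clock, a setup along these lines
leaves ntpd itself to make the large initial step after boot, which is
the situation in which the problem showed up.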

To me, the problem was mainly illustrative of how a tiny innocuous
change (ntpd making a series of ntp_adjtime() calls in a different, but
still correct, order than it used to) can expose a completely unexpected
longstanding bug in our code.  Gotta wonder if any more of those are
lurking. :/

-- Ian






