Date:      Wed, 04 Nov 2015 17:15:01 +0100
From:      Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To:        freebsd-stable@freebsd.org, current@freebsd.org
Subject:   Re: Segmentation fault running ntpd
Message-ID:  <9145a2d0228d9d025b2b0b6b5612726c@mailbox.ijs.si>
In-Reply-To: <20151101093116.GA5457@voyager>
References:  <20150718120956.GC1155@albert.catwhisker.org> <86pozwbvds.fsf@desk.des.no> <20151030113449.GF13438@albert.catwhisker.org> <e7dd89564e34d1cc0b8e61d64f8e1d2b@mailbox.ijs.si> <20151101093116.GA5457@voyager>

Upgrading from 10.2-RELEASE-p6 to 10.2-RELEASE-p7 has now resolved the ntpd
crashes (apparently fixed by FreeBSD Errata Notice FreeBSD-EN-15:20.vm).

Thanks!!!

   Mark


On 2015-11-01 10:31, Andre Albsmeier wrote:
> On Fri, 30-Oct-2015 at 19:47:59 +0100, Mark Martinec wrote:
>> Not sure if it's the same issue, but it sure looks like it is.
>> 
>> I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
>> to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
>> replaced the /usr/sbin/ntpd with a new one; then I restarted
>> the ntpd.
>> 
>> On all hosts but one this was successful: the new ntpd starts
>> fine and works normally. But on one of these machines the
>> ntpd process immediately crashes with SIGSEGV. That machine
>> has an Intel Xeon cpu. It is not apparent to me in what way
>> this machine differs from others,
> 
> I'll add my observations here:
> 
> I am using an ntp.conf with a single server entry:
> 
> 	server ntp.some.domain.org
> 
> ntp.some.domain.org is a CNAME pointing to gate.some.domain.org
> and the latter contains an A record pointing to 192.168.128.1.
> 
> After updating 9.3-STABLE to the latest version (one which includes ntp
> 4.2.8p4), ntpd crashes:
> 
> Nov 1 09:38:38 voyager kernel: pid 4443 (ntpd), uid 0: exited on signal 11
> 
> This happens in line 871 of ntpd.c where mlockall() is called:
> 
> 	&& 0 != mlockall(MCL_CURRENT|MCL_FUTURE))
> 
> It does NOT crash with MCL_FUTURE only.
> It does crash with MCL_CURRENT only.
> 
> When adding
> 
> 	rlimit memlock -1
> 
> to ntp.conf it does NOT crash (as mlockall() won't be called anymore).
> 
> When specifying the IP address (192.168.128.1) as the server it
> does NOT crash.
> 
> When specifying gate.some.domain.org as the server it also does
> NOT crash. tcpdump shows in this case:
> 
> 09:49:59.542310 IP 192.168.128.2.21102 > 192.168.128.1.53: 7639+ A? gate.some.domain.org. (41)
> 09:49:59.542578 IP 192.168.128.1.53 > 192.168.128.2.21102: 7639* 1/1/0 A 192.168.128.1 (71)
> 09:49:59.542612 IP 192.168.128.2.52455 > 192.168.128.1.53: 42047+ AAAA? gate.some.domain.org. (41)
> 09:49:59.542792 IP 192.168.128.1.53 > 192.168.128.2.52455: 42047* 0/1/0 (88)
> 
> When reverting the server entry back to ntp.some.domain.org
> it crashes and tcpdump shows:
> 
> 09:36:05.172552 IP 192.168.128.2.17836 > 192.168.128.1.53: 49768+ A? ntp.some.domain.org. (40)
> 09:36:05.173320 IP 192.168.128.1.53 > 192.168.128.2.17836: 49768* 2/1/0 CNAME gate.some.domain.org., A 192.168.128.1 (89)
> 09:36:05.173361 IP 192.168.128.2.22611 > 192.168.128.1.53: 63808+ AAAA? ntp.some.domain.org. (40)
> 09:36:05.173595 IP 192.168.128.1.53 > 192.168.128.2.22611: 63808* 1/1/0 CNAME gate.some.domain.org. (106)
> 
> The probability of crashing increases with the speed and the
> number of cores of the machine: on my old single-core Pentiums
> it never crashes; on my quad-core i7-3770K it always crashes.
> 
> The (asynchronous) name resolution starts in line 3876 of
> ntp_config.c:
> 
> 	getaddrinfo_sometime(curr_peer->addr->address,
> 
> If we put the mlockall() call directly before this line, the
> crash is gone.
> 
> Maybe you want to play around with rlimit, CNAMES, IPs and
> so on...
> 
> 	-Andre
> 
> Anyone else seeing this?
>> On 2015-10-30 12:34, David Wolfskill wrote:
>> > On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
>> >> David Wolfskill <david@catwhisker.org> writes:
>> >> > ...
>> >> > bound to 172.17.1.245 -- renewal in 43200 seconds.
>> >> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>> >> > Starting Network: lo0 em0 iwn0 lagg0.
>> >> > ...
>> >>
>> >> Did you find a solution?  I'm wondering if the ntpd problems people are
>> >> reporting on freebsd-security@ are related.  I vaguely recall hearing
>> >> that this had been traced to a pthread bug, but can't find anything
>> >> about it in commit logs or mailing list archives.
>> >> ....
>> >
>> > I don't recall finding "a solution" per se; that said, I also don't
>> > recall seeing an occurrence of the above for enough time that I'm not
>> > sure when I sent that message. :-}
>> >
>> > As a reality check:
>> >
>> > g1-252(11.0-C)[1] ls -lT /*.core
>> > -rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
>> > g1-252(11.0-C)[2]
>> >
>> > So -- among other points -- my last sighting of whatever was causing
>> > that was the day I built:
>> >
>> > FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18 04:45:25 PDT 2015
>> > root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>> >
>> > Note that the machines where I run head get updated daily (unless
>> > there's enough of a problem with head that I can't build it or can't
>> > boot it (and I'm unable to circumvent the issue within a reasonable
>> > time)) -- and while I do attempt to run ntpd on the machines, the above
>> > failure is more "annoying" than "crippling" in my particular case.
>> >
>> > And I'm presently running:
>> >
>> > FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29 05:12:58 PDT 2015
>> > root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>> >
>> > and building head @r290190 as I type.
>> >
>> > And FWIW, I *suspect* that one of the issues involved (in my case)
>> > was a ... lack of determinism ... in events involving getting the
>> > (wireless) network connectivity into a usable state as part of the
>> > initial transition to multi-user mode.  (I only have evidence at
>> > the moment of the issue on my laptop; my build machine, which only
>> > uses a wired NIC, has no /ntpd.core file.  It and my laptop are updated
>> > pretty much in lock-step; it runs a completely GENERIC kernel, while
>> > the laptop runs a modestly customized one based on GENERIC.)
>> >
>> > Peace,
>> > david

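For anyone who wants to probe the mlockall() flag combinations from Andre's
message in isolation, here is a minimal C sketch. It is not taken from ntpd:
the helper names mcl_flags_name(), try_mlockall() and format_result() are
made up for illustration, and the only system calls used are the standard
mlockall() and munlockall(). On its own it is not expected to reproduce the
crash, since the quoted report suggests the fault depended on the asynchronous
name resolution having been started before mlockall() was called.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Human-readable name for the flag combinations tried in the thread. */
const char *mcl_flags_name(int flags)
{
    if (flags == (MCL_CURRENT | MCL_FUTURE))
        return "MCL_CURRENT|MCL_FUTURE";
    if (flags == MCL_CURRENT)
        return "MCL_CURRENT";
    if (flags == MCL_FUTURE)
        return "MCL_FUTURE";
    return "unknown";
}

/*
 * Hypothetical helper: call mlockall() with the given flags and, if it
 * succeeded, release the lock again with munlockall().
 * Returns 0 on success, otherwise the errno value set by mlockall().
 */
int try_mlockall(int flags)
{
    if (mlockall(flags) != 0)
        return errno;
    munlockall();
    return 0;
}

/* Format one result line, e.g. "mlockall(MCL_FUTURE): ok", into buf. */
int format_result(int flags, int err, char *buf, size_t len)
{
    return snprintf(buf, len, "mlockall(%s): %s",
                    mcl_flags_name(flags),
                    err == 0 ? "ok" : strerror(err));
}
```

Calling try_mlockall() once for each of MCL_CURRENT, MCL_FUTURE and
MCL_CURRENT|MCL_FUTURE from a small main() and printing the format_result()
line shows whether the call itself succeeds under the current memlock limit;
a non-zero result such as ENOMEM or EPERM only means the limit or privileges
prevented the lock.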

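Since the crash in the quoted report only showed up when the server name was
a CNAME, it can help to check what the resolver reports for a given name.
The sketch below is an assumption-laden illustration, not ntpd code:
canonical_name() is a hypothetical helper that uses the standard
getaddrinfo() call with AI_CANONNAME and copies the canonical name out.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolve `host` and copy the canonical name reported
 * by the resolver into `out` (at most `outlen` bytes, NUL-terminated).
 * Returns 0 on success or a getaddrinfo() error code (see gai_strerror()).
 */
int canonical_name(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    if (outlen == 0)
        return EAI_FAIL;
    out[0] = '\0';

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* allow both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_DGRAM;  /* NTP runs over UDP */
    hints.ai_flags = AI_CANONNAME;   /* ask for the canonical host name */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* Only the first result carries ai_canonname. */
    if (res != NULL && res->ai_canonname != NULL) {
        strncpy(out, res->ai_canonname, outlen - 1);
        out[outlen - 1] = '\0';
    }
    freeaddrinfo(res);
    return 0;
}
```

With the names from the quoted message, one would expect a call for
ntp.some.domain.org to report gate.some.domain.org as the canonical name,
but that depends on the local resolver configuration and is not something
this sketch can guarantee.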
