From: Mark Martinec
To: freebsd-stable@freebsd.org, current@freebsd.org
Date: Wed, 04 Nov 2015 17:15:01 +0100
Subject: Re: Segmentation fault running ntpd
Organization: Jozef Stefan Institute
Message-ID: <9145a2d0228d9d025b2b0b6b5612726c@mailbox.ijs.si>
In-Reply-To: <20151101093116.GA5457@voyager>
References: <20150718120956.GC1155@albert.catwhisker.org> <86pozwbvds.fsf@desk.des.no> <20151030113449.GF13438@albert.catwhisker.org> <20151101093116.GA5457@voyager>

Upgrading from 10.2-RELEASE-p6 to 10.2-RELEASE-p7 has now solved the ntpd
crashes (apparently fixed by FreeBSD Errata Notice FreeBSD-EN-15:20.vm).

Thanks!!!

Mark

On 2015-11-01 10:31, Andre Albsmeier wrote:
> On Fri, 30-Oct-2015 at 19:47:59 +0100, Mark Martinec wrote:
>> Not sure if it's the same issue, but it sure looks like it is.
>>
>> I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
>> to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
>> replaced /usr/sbin/ntpd with a new one; then I restarted
>> ntpd.
>>
>> On all hosts but one this was successful: the new ntpd starts
>> fine and works normally. But on one of these machines the
>> ntpd process immediately crashes with SIGSEGV. That machine
>> has an Intel Xeon CPU.
>> It is not apparent to me in what way
>> this machine differs from the others.
>
> I'll add my observations here:
>
> I am using an ntp.conf with a single server entry:
>
> server ntp.some.domain.org
>
> ntp.some.domain.org is a CNAME pointing to gate.some.domain.org,
> and the latter has an A record pointing to 192.168.128.1.
>
> After updating 9.3-STABLE to the latest version (one which includes
> ntp 4.2.8p4), ntpd crashes:
>
> Nov 1 09:38:38 voyager kernel: pid 4443 (ntpd), uid 0: exited on signal 11
>
> This happens at line 871 of ntpd.c, where mlockall() is called:
>
> && 0 != mlockall(MCL_CURRENT|MCL_FUTURE))
>
> It does NOT crash with MCL_FUTURE only.
> It does crash with MCL_CURRENT only.
>
> When adding
>
> rlimit memlock -1
>
> to ntp.conf it does NOT crash (as mlockall() is then no longer called).
>
> When specifying the IP address (192.168.128.1) as the server, it
> does NOT crash.
>
> When specifying gate.some.domain.org as the server, it also does
> NOT crash. In this case tcpdump shows:
>
> 09:49:59.542310 IP 192.168.128.2.21102 > 192.168.128.1.53: 7639+ A? gate.some.domain.org. (41)
> 09:49:59.542578 IP 192.168.128.1.53 > 192.168.128.2.21102: 7639* 1/1/0 A 192.168.128.1 (71)
> 09:49:59.542612 IP 192.168.128.2.52455 > 192.168.128.1.53: 42047+ AAAA? gate.some.domain.org. (41)
> 09:49:59.542792 IP 192.168.128.1.53 > 192.168.128.2.52455: 42047* 0/1/0 (88)
>
> When reverting the server entry back to ntp.some.domain.org,
> it crashes and tcpdump shows:
>
> 09:36:05.172552 IP 192.168.128.2.17836 > 192.168.128.1.53: 49768+ A? ntp.some.domain.org. (40)
> 09:36:05.173320 IP 192.168.128.1.53 > 192.168.128.2.17836: 49768* 2/1/0 CNAME gate.some.domain.org., A 192.168.128.1 (89)
> 09:36:05.173361 IP 192.168.128.2.22611 > 192.168.128.1.53: 63808+ AAAA? ntp.some.domain.org. (40)
> 09:36:05.173595 IP 192.168.128.1.53 > 192.168.128.2.22611: 63808* 1/1/0 CNAME gate.some.domain.org. (106)
>
> The probability of crashing increases with the speed and the
> number of cores of the machine: on my old single-core Pentiums
> it never crashes; on my quad-core i7-3770K it always crashes.
>
> The (asynchronous) name resolution starts at line 3876 of
> ntp_config.c:
>
> getaddrinfo_sometime(curr_peer->addr->address,
>
> If we put the mlockall() call directly before this line, the
> crash is gone.
>
> Maybe you want to play around with rlimit, CNAMEs, IPs and
> so on...
>
> -Andre
>
> Anyone else seeing this?
>> 2015-10-30 12:34, David Wolfskill wrote:
>> > On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
>> >> David Wolfskill writes:
>> >> > ...
>> >> > bound to 172.17.1.245 -- renewal in 43200 seconds.
>> >> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>> >> > Starting Network: lo0 em0 iwn0 lagg0.
>> >> > ...
>> >>
>> >> Did you find a solution? I'm wondering if the ntpd problems people
>> >> are reporting on freebsd-security@ are related. I vaguely recall
>> >> hearing that this had been traced to a pthread bug, but can't find
>> >> anything about it in commit logs or mailing list archives.
>> >> ....
>> >
>> > I don't recall finding "a solution" per se; that said, I also haven't
>> > seen an occurrence of the above for so long that I'm not sure when I
>> > sent that message. :-}
>> >
>> > As a reality check:
>> >
>> > g1-252(11.0-C)[1] ls -lT /*.core
>> > -rw-r--r-- 1 root wheel 13783040 Aug 18 04:19:03 2015 /ntpd.core
>> > g1-252(11.0-C)[2]
>> >
>> > So -- among other points -- my last sighting of whatever was causing
>> > that was the day I built:
>> >
>> > FreeBSD 11.0-CURRENT #157 r286880M/286880:1100079: Tue Aug 18 04:45:25 PDT 2015
>> >     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
>> >
>> > Note that the machines where I run head get updated daily (unless
>> > there's enough of a problem with head that I can't build it or can't
>> > boot it (and I'm unable to circumvent the issue within a reasonable
>> > time)) -- and while I do attempt to run ntpd on the machines, the above
>> > failure is more "annoying" than "crippling" in my particular case.
>> >
>> > And I'm presently running:
>> >
>> > FreeBSD 11.0-CURRENT #227 r290138M/290138:1100084: Thu Oct 29 05:12:58 PDT 2015
>> >     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
>> >
>> > and building head @r290190 as I type.
>> >
>> > And FWIW, I *suspect* that one of the issues involved (in my case)
>> > was a ... lack of determinism ... in events involving getting the
>> > (wireless) network connectivity into a usable state as part of the
>> > initial transition to multi-user mode. (At the moment I only have
>> > evidence of the issue on my laptop; my build machine, which only
>> > uses a wired NIC, has no /ntpd.core file. It and my laptop are updated
>> > pretty much in lock-step; it runs a completely GENERIC kernel, while
>> > the laptop runs a modestly customized one based on GENERIC.)
>> >
>> > Peace,
>> > david
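For anyone who wants to experiment with the "rlimit memlock" workaround Andre describes, the decision it controls can be sketched in C as below. This is a hypothetical illustration, not ntpd's actual source: the function name maybe_lock_memory() is made up, and treating a positive value as a soft RLIMIT_MEMLOCK in megabytes is an assumption. The only behaviour taken from the thread is that -1 skips the mlockall(MCL_CURRENT|MCL_FUTURE) call, which is the call Andre traced the crash to.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Hypothetical sketch, NOT ntpd's code.
 * memlock_mb stands in for the ntp.conf "rlimit memlock" value:
 *   -1  -> do not call mlockall() at all (the workaround from the thread)
 *   > 0 -> ASSUMPTION: raise/lower the soft RLIMIT_MEMLOCK to that many MB first
 *    0  -> leave the limit alone and just try to lock
 * Returns 0 if locking was skipped, 1 if mlockall() succeeded,
 * -1 if mlockall() failed (the reason is printed to stderr).
 */
static int maybe_lock_memory(long memlock_mb)
{
    if (memlock_mb == -1)
        return 0;

    if (memlock_mb > 0) {
        struct rlimit rl;
        /* Keep the existing hard limit; only adjust the soft limit. */
        if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
            rl.rlim_cur = (rlim_t)memlock_mb * 1024 * 1024;
            if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
                fprintf(stderr, "setrlimit(RLIMIT_MEMLOCK): %s\n",
                        strerror(errno));
        }
    }

    /* Lock pages mapped now (MCL_CURRENT) and those mapped later (MCL_FUTURE). */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall(): %s\n", strerror(errno));
        return -1;
    }
    return 1;
}
```

Calling maybe_lock_memory(-1) corresponds to the "rlimit memlock -1" case that did not crash; any other value reaches mlockall(), which may also simply fail (for example with EPERM or ENOMEM) when the process lacks the privilege or the memlock limit is too small.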