Date: Sun, 1 Nov 2015 10:31:16 +0100
From: Andre Albsmeier
To: Mark Martinec
Cc: freebsd-stable@freebsd.org, current@freebsd.org, andre@fbsd.ata.myota.org
Subject: Re: Segmentation fault running ntpd
Message-ID: <20151101093116.GA5457@voyager>
References: <20150718120956.GC1155@albert.catwhisker.org> <86pozwbvds.fsf@desk.des.no> <20151030113449.GF13438@albert.catwhisker.org>
List-Id: Discussions about the use of FreeBSD-current

On Fri, 30-Oct-2015 at 19:47:59 +0100, Mark Martinec wrote:
> Not sure if it's the same issue, but it sure looks like it is.
>
> I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
> to 10.2-RELEASE-p6, i.e. the freebsd-upgrade essentially just
> replaced the /usr/sbin/ntpd with a new one; then I restarted
> the ntpd.
>
> On all host but one this was successful: the new ntpd starts
> fine and works normally. But on one of these machines the
> ntpd process immediately crashes with SIGSEGV. That machine
> has an Intel Xeon cpu.
> It is not apparent to me in what way
> this machine differs from others,

I'll add my observations here:

I am using an ntp.conf with a single server entry:

    server ntp.some.domain.org

ntp.some.domain.org is a CNAME pointing to gate.some.domain.org,
and the latter has an A record pointing to 192.168.128.1.

After updating 9.3-STABLE to the latest version (one which
includes ntp 4.2.8p4), ntpd crashes:

    Nov 1 09:38:38 voyager kernel: pid 4443 (ntpd), uid 0: exited on signal 11

This happens in line 871 of ntpd.c where mlockall() is called:

    && 0 != mlockall(MCL_CURRENT|MCL_FUTURE))

It does NOT crash with MCL_FUTURE only.
It does crash with MCL_CURRENT only.

When adding

    rlimit memlock -1

to ntp.conf it does NOT crash (as mlockall() won't be called
anymore).

When specifying the IP address (192.168.128.1) as the server, it
does NOT crash.

When specifying gate.some.domain.org as the server, it also does
NOT crash. tcpdump shows in this case:

    09:49:59.542310 IP 192.168.128.2.21102 > 192.168.128.1.53:
        7639+ A? gate.some.domain.org. (41)
    09:49:59.542578 IP 192.168.128.1.53 > 192.168.128.2.21102:
        7639* 1/1/0 A 192.168.128.1 (71)
    09:49:59.542612 IP 192.168.128.2.52455 > 192.168.128.1.53:
        42047+ AAAA? gate.some.domain.org. (41)
    09:49:59.542792 IP 192.168.128.1.53 > 192.168.128.2.52455:
        42047* 0/1/0 (88)

When reverting the server entry back to ntp.some.domain.org, it
crashes and tcpdump shows:

    09:36:05.172552 IP 192.168.128.2.17836 > 192.168.128.1.53:
        49768+ A? ntp.some.domain.org. (40)
    09:36:05.173320 IP 192.168.128.1.53 > 192.168.128.2.17836:
        49768* 2/1/0 CNAME gate.some.domain.org., A 192.168.128.1 (89)
    09:36:05.173361 IP 192.168.128.2.22611 > 192.168.128.1.53:
        63808+ AAAA? ntp.some.domain.org. (40)
    09:36:05.173595 IP 192.168.128.1.53 > 192.168.128.2.22611:
        63808* 1/1/0 CNAME gate.some.domain.org.
        (106)

The probability of crashing increases with the speed and the
number of cores of the machine: on my old single-core Pentiums
it never crashes, on my quad-core i7-3770K it always crashes.

The (asynchronous) resolving of the names starts in line 3876 of
ntp_config.c:

    getaddrinfo_sometime(curr_peer->addr->address,

If we put the mlockall() call directly before this line, the
crash is gone.

Maybe you want to play around with rlimit, CNAMEs, IPs and so on...

	-Andre

Anyone else seeing this?

> 2015-10-30 12:34, David Wolfskill wrote:
> > On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
> >> David Wolfskill writes:
> >> > ...
> >> > bound to 172.17.1.245 -- renewal in 43200 seconds.
> >> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> >> > Starting Network: lo0 em0 iwn0 lagg0.
> >> > ...
> >>
> >> Did you find a solution?  I'm wondering if the ntpd problems people are
> >> reporting on freebsd-security@ are related.  I vaguely recall hearing
> >> that this had been traced to a pthread bug, but can't find anything
> >> about it in commit logs or mailing list archives.
> >> ....
> >
> > I don't recall finding "a solution" per se; that said, I also don't
> > recall seeing an occurrence of the above for enough time that I'm not
> > sure when I sent that message.
> > :-}
> >
> > As a reality check:
> >
> > g1-252(11.0-C)[1] ls -lT /*.core
> > -rw-r--r-- 1 root wheel 13783040 Aug 18 04:19:03 2015 /ntpd.core
> > g1-252(11.0-C)[2]
> >
> > So -- among other points -- my last sighting of whatever was causing
> > that was the day I built:
> >
> > FreeBSD 11.0-CURRENT #157 r286880M/286880:1100079: Tue Aug 18
> > 04:45:25 PDT 2015
> > root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
> >
> > Note that the machines where I run head get updated daily (unless
> > there's enough of a problem with head that I can't build it or can't
> > boot it (and I'm unable to circumvent the issue within a reasonable
> > time)) -- and while I do attempt to run ntpd on the machines, the above
> > failure is more "annoying" than "crippling" in my particular case.
> >
> > And I'm presently running:
> >
> > FreeBSD 11.0-CURRENT #227 r290138M/290138:1100084: Thu Oct 29
> > 05:12:58 PDT 2015
> > root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
> >
> > and building head @r290190 as I type.
> >
> > And FWIW, I *suspect* that one of the issues involved (in my case)
> > was a ... lack of determinism ... in events involving getting the
> > (wireless) network connectivity into a usable state as part of the
> > initial transition to multi-user mode.  (I only have evidence at
> > the moment of the issue on my laptop; my build machine, which only
> > uses a wired NIC, has no /ntpd.core file.  It and my laptop are updated
> > pretty much in lock-step; it runs a completely GENERIC kernel, while
> > the laptop runs a modestly customized one based on GENERIC.)
> >
> > Peace,
> > david

--
Micro$oft: Which virus will you get today?
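
For anyone who wants to experiment outside of ntpd: the ordering
observation above (the crash goes away when mlockall() is called
before the asynchronous lookup starts) can be sketched roughly as
follows. This is only a minimal illustration, not ntpd code. It
assumes the lookup runs in a worker thread, and struct lookup,
resolve_worker() and lock_then_resolve() are hypothetical names made
up for this example.

```c
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper type: one host name plus the getaddrinfo() result. */
struct lookup {
    const char *host;
    int gai_rc;
};

/*
 * Worker thread body (assumption: the asynchronous resolver is a
 * thread).  AF_UNSPEC makes the resolver ask for both A and AAAA
 * records, as seen in the tcpdump output.
 */
static void *resolve_worker(void *arg)
{
    struct lookup *lk = arg;
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    lk->gai_rc = getaddrinfo(lk->host, NULL, &hints, &res);
    if (lk->gai_rc == 0)
        freeaddrinfo(res);
    return NULL;
}

/*
 * Lock memory FIRST, then start the lookup thread and wait for it.
 * Returns 0 if the thread ran to completion, -1 if it could not be
 * started or joined.  *mlock_rc receives the mlockall() return value;
 * *gai_rc receives the getaddrinfo() result and is only written when
 * the function returns 0.
 */
int lock_then_resolve(const char *host, int *mlock_rc, int *gai_rc)
{
    struct lookup lk = { host, 0 };
    pthread_t tid;

    assert(host != NULL && mlock_rc != NULL && gai_rc != NULL);

    /* A failure here is reported to the caller, not treated as fatal. */
    *mlock_rc = mlockall(MCL_CURRENT | MCL_FUTURE);

    if (pthread_create(&tid, NULL, resolve_worker, &lk) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    *gai_rc = lk.gai_rc;
    return 0;
}
```

Calling lock_then_resolve("ntp.some.domain.org", &m, &g) from a small
main() exercises this order of operations. mlockall() may fail for
unprivileged users (for example with ENOMEM or EPERM); the sketch
passes that back through *mlock_rc. Swapping the mlockall() call to
after pthread_create() would give the racy order for comparison,
though whether that reproduces the crash is untested here.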