From: Mark Martinec
To: freebsd-stable@freebsd.org, current@freebsd.org
Subject: Re: Segmentation fault running ntpd
Date: Fri, 30 Oct 2015 19:47:59 +0100
Organization: Jozef Stefan Institute

Not sure if it's the same issue, but it sure looks like it is.

I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
replaced /usr/sbin/ntpd with a new one; then I restarted ntpd.

On all hosts but one this was successful: the new ntpd starts
fine and works normally. But on one of these machines the ntpd
process immediately crashes with SIGSEGV. That machine has an
Intel Xeon CPU. It is not apparent to me in what way this machine
differs from the others.

I played with some variations of ntpd on that host; here are
some findings:

- the new ntpd (that came with 10.2-RELEASE-p6) runs fine
  if it does *not* daemonize, i.e. ntpd with the option -n or -d
  stays attached to a terminal and works fine;
  the same happens when run under  ktrace -d -i ntpd ...
  it works fine, even when it daemonizes;

- the ntpd built from a fresh net/ntp-devel behaves exactly the same:
  it crashes on that machine when it daemonizes;

- the previous ntpd (from 10.2-RELEASE-p5) works fine, so I ended up
  downgrading ntpd to that previous version on that machine.
  Also, an ntpd from a recent 10-STABLE, when copied to that host,
  runs fine there!

I haven't yet tried to build it with debugging, or to capture a core dump.

Puzzling...
  Mark


On 2015-10-30 12:34, David Wolfskill wrote:
> On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
>> David Wolfskill writes:
>> > ...
>> > bound to 172.17.1.245 -- renewal in 43200 seconds.
>> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>> > Starting Network: lo0 em0 iwn0 lagg0.
>> > ...
>>
>> Did you find a solution?  I'm wondering if the ntpd problems people are
>> reporting on freebsd-security@ are related.  I vaguely recall hearing
>> that this had been traced to a pthread bug, but can't find anything
>> about it in commit logs or mailing list archives.
>> ....
>
> I don't recall finding "a solution" per se; that said, I also don't
> recall seeing an occurrence of the above for enough time that I'm not
> sure when I sent that message. :-}
>
> As a reality check:
>
> g1-252(11.0-C)[1] ls -lT /*.core
> -rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
> g1-252(11.0-C)[2]
>
> So -- among other points -- my last sighting of whatever was causing
> that was the day I built:
>
> FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18 04:45:25 PDT 2015
>     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>
> Note that the machines where I run head get updated daily (unless
> there's enough of a problem with head that I can't build it or can't
> boot it (and I'm unable to circumvent the issue within a reasonable
> time)) -- and while I do attempt to run ntpd on the machines, the above
> failure is more "annoying" than "crippling" in my particular case.
>
> And I'm presently running:
>
> FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29 05:12:58 PDT 2015
>     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>
> and building head @r290190 as I type.
>
> And FWIW, I *suspect* that one of the issues involved (in my case)
> was a ... lack of determinism ... in events involving getting the
> (wireless) network connectivity into a usable state as part of the
> initial transition to multi-user mode.  (I only have evidence at
> the moment of the issue on my laptop; my build machine, which only
> uses a wired NIC, has no /ntpd.core file.  It and my laptop are updated
> pretty much in lock-step; it runs a completely GENERIC kernel, while
> the laptop runs a modestly customized one based on GENERIC.)
>
> Peace,
> david
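
The "reality check" quoted above (ls -lT /*.core) can be repeated on other
hosts with a small helper. What follows is only a minimal sketch in portable
sh: the function name list_cores is hypothetical, and the default directory
"/" is an assumption taken from the /ntpd.core path in the thread. Where core
files actually land depends on the kern.corefile sysctl and on the crashing
process's working directory, so adjust the directory to suit.

```shell
#!/bin/sh
# list_cores DIR: print every "*.core" file directly inside DIR
# (default "/"), one path per line, so leftover crash dumps such as
# /ntpd.core are easy to spot.
# Hypothetical helper for illustration; not part of FreeBSD or ntpd.
list_cores() {
    dir="${1:-/}"
    # "${dir%/}" strips one trailing slash, so "/" expands to "/*.core".
    for f in "${dir%/}"/*.core; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Calling list_cores with no argument checks the root directory; passing each
reported path to ls -lT (as in the quoted message) then shows the full
timestamp of the last crash.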