Date:      Sun, 19 Jul 2015 10:24:11 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        David Wolfskill <david@catwhisker.org>
Cc:        current@freebsd.org
Subject:   Re: Segmentation fault running ntpd
Message-ID:  <1437323051.1334.383.camel@freebsd.org>
In-Reply-To: <20150718120956.GC1155@albert.catwhisker.org>
References:  <20150718120956.GC1155@albert.catwhisker.org>

On Sat, 2015-07-18 at 05:09 -0700, David Wolfskill wrote:
> Lousy timing (no pun intended -- it's early in the day for me),
> given the recent MFC, but as I was booting my laptop to yesterday's
> head:
> 
> FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127  r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015     root@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY  amd64
> 
> to build today's head (@r285670; still in progress as I type), I
> happened to note [Oh, great -- we can no longer copy/paste from
> console now??!?  Fine, I'll transcribe by hand.... :-(]:
> 
> ...
> bound to 172.17.1.245 -- renewal in 43200 seconds.
> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> Starting Network: lo0 em0 iwn0 lagg0.
> ...
> 
> Trying to examine the /ntpd.core, I see:
> root@g1-245:/ # gdb `which ntpd` ntpd.core 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
> Core was generated by `ntpd'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.5
> Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
> Loaded symbols for /lib/libcrypto.so.7
> Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
> Loaded symbols for /lib/libthr.so.3
> Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.7
> Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
> [New Thread 801c07400 (LWP 100122/<unknown>)]
> [New Thread 801c06400 (LWP 100120/<unknown>)]
> (gdb) bt
> #0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
> #1  0x00000008ccbd4f34 in ?? ()
> #2  0x0000000000000005 in ?? ()
> #3  0x0000000801800448 in ?? ()
> #4  0x00000008011ca888 in sbrk () from /lib/libc.so.7
> #5  0x00000008018000c8 in ?? ()
> #6  0x00000008018000c0 in ?? ()
> #7  0x0000000000000208 in ?? ()
> #8  0x0000000801c32fb0 in ?? ()
> #9  0x0000000000000001 in ?? ()
> #10 0x0000000801cc20c8 in ?? ()
> #11 0x0000000000000030 in ?? ()
> #12 0x0000000801cc20c8 in ?? ()
> #13 0x00007fffffffe480 in ?? ()
> #14 0x00000008011cd240 in sbrk () from /lib/libc.so.7
> #15 0x0000000000000280 in ?? ()
> #16 0x00000008014bbc70 in malloc_message () from /lib/libc.so.7
> #17 0x00000008018000c0 in ?? ()
> #18 0x0000000801800448 in ?? ()
> #19 0x0000000000000032 in ?? ()
> #20 0x0000000801800458 in ?? ()
> #21 0x00000008014bbc68 in malloc_message () from /lib/libc.so.7
> #22 0x0000000801cc2000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #23 0x00000008014bba60 in malloc_message () from /lib/libc.so.7
> #24 0x0000000801cc20d8 in ?? ()
> #25 0x00000000000000a0 in ?? ()
> #26 0x0000000000000208 in ?? ()
> #27 0x00007fffffffe4d0 in ?? ()
> #28 0x00000008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
> Previous frame inner to this frame (corrupt stack?)
> (gdb) 
> 
> which seems... well, not especially useful, as far as I can tell.
> 
> 
> This is (as mentioned above) on my laptop; as such, it is expected to
> "wander" from one network to another.  Accordingly:
> 
> * Since it could be connected to a network I do not control, I use a
>   packet filter (IPFW, in my case) to reduce my exposure from a
>   possibly-hostile network.
> 
> * Rather than enabling ntpd in /etc/rc.conf, I use
>   /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
>   lease.  (For networks I control, I also set up the DHCP server to
>   advertise what NTP server the DHCP clients should use, but the code in
>   dhclient-exit-hooks merely prefers that, rather than requiring it.)
> 
> * In my world-view -- at least for networks I control -- DNS zone files
>   are the Source of Truth with respect to hostname <-> IP address
>   correspondence, and Dynamic DNS is Evil.  I populate my zone files
>   with appropriate A & PTR records so that every assignable DHCP
>   address has a PTR record, and the hostname to which it points has
>   an A record that points back to that IP address.  Accordingly, I
>   also use /etc/dhclient-exit-hooks so the laptop can find out what
>   its hostname is, and set it accordingly.
> 
> Mind, I've been doing the above for well over a decade, so that doesn't
> qualify as "new."
> 
> And most of the time, it Just Works (which is a significant reason I
> keep doing it).
> 
> A couple of other things that are more recent, and possibly of
> relevance:
> 
> * As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up using
>   Link Aggregation in "failover" mode.  In practice, I rarely use
>   the em0 (wired) NIC -- I had originally done that based on a
>   misperception of how I thought things were set up at work, and
>   then just left the configuration alone and relied on the wireless
>   NIC.  (At home, I have things set up so that the failover would
>   work, but doing so would be a little awkward for reasons that
>   aren't relevant here.)
> 
> * I have the laptop configured to run xdm(1)... after the DHCP lease is
>   acquired and the hostname is set.  My ~/.xsession script is set
>   up so it fires up ssh-agent, requests a passphrase, and then
>   (among other things) establishes an SSH session to the "mail hub"
>   at home and re-establishes a tmux session where I'm running mutt
>   to handle my email.  I've noticed that in head, these connections
>   sometimes fail to get initialized, and sometimes will time out,
>   while sessions started a few minutes later will have no problem.
>   That seems peculiar, but was sufficiently ... well, "nebulous" that
>   I didn't think it warranted a whine of its own here.  But on the
>   chance that it's related to ntpd giving up the ghost prematurely,
>   it seemed but a reasonable exercise of "Full Disclosure" to mention
>   it in this context -- even though it's also something I've been doing
>   since the (late) 1990s.
> 
> So: Any suggestions for either diagnosing what the root cause is or
> changing the configuration so that the failure no longer occurs?
> 
> Thanks!
> 
> Peace,
> david

Was there anything (at all) in /var/log/messages about ntpd?  Even the
routine messages (such as what interfaces it binds to) might give a bit
of a clue about how far it got in its init before it died. 
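
One way to pull those lines out, as a minimal sketch: the helper name
ntpd_log_lines is hypothetical, and the pattern assumes the usual syslog
"ntpd[pid]:" tag plus the kernel's "(ntpd)" crash notice quoted above.
If the log has already been rotated, pass the older file as the argument.

```shell
#!/bin/sh
# Hypothetical helper: print every ntpd-related line from a syslog file.
# Defaults to /var/log/messages; pass another path to override.
ntpd_log_lines() {
    logfile="${1:-/var/log/messages}"
    # Match ntpd's own "ntpd[pid]" tag and the kernel's "(ntpd)" exit notice.
    grep -E 'ntpd\[[0-9]+\]|\(ntpd\)' "$logfile"
}
```

Running "ntpd_log_lines" with no argument reads /var/log/messages; the
last ntpd line before the "exited on signal 11" notice shows roughly
where init stopped.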

-- Ian