Date: Sun, 19 Jul 2015 10:24:11 -0600
From: Ian Lepore <ian@freebsd.org>
To: David Wolfskill <david@catwhisker.org>
Cc: current@freebsd.org
Subject: Re: Segmentation fault running ntpd
Message-ID: <1437323051.1334.383.camel@freebsd.org>
In-Reply-To: <20150718120956.GC1155@albert.catwhisker.org>
References: <20150718120956.GC1155@albert.catwhisker.org>

On Sat, 2015-07-18 at 05:09 -0700, David Wolfskill wrote:
> Lousy timing (no pun intended -- it's early in the day for me),
> given the recent MFC, but as I was booting my laptop to yesterday's
> head:
>
> FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127 r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015 root@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY amd64
>
> to build today's head (@r285670; still in progress as I type), I
> happened to note [Oh, great -- we can no longer copy/paste from
> console now??!? Fine, I'll transcribe by hand.... :-(]:
>
> ...
> bound to 172.17.1.245 -- renewal in 43200 seconds.
> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> Starting Network: lo0 em0 iwn0 lagg0.
> ...
>
> Trying to examine the /ntpd.core, I see:
> root@g1-245:/ # gdb `which ntpd` ntpd.core
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
> Core was generated by `ntpd'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.5
> Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
> Loaded symbols for /lib/libcrypto.so.7
> Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
> Loaded symbols for /lib/libthr.so.3
> Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.7
> Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0 0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
> [New Thread 801c07400 (LWP 100122/<unknown>)]
> [New Thread 801c06400 (LWP 100120/<unknown>)]
> (gdb) bt
> #0 0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
> #1 0x00000008ccbd4f34 in ?? ()
> #2 0x0000000000000005 in ?? ()
> #3 0x0000000801800448 in ?? ()
> #4 0x00000008011ca888 in sbrk () from /lib/libc.so.7
> #5 0x00000008018000c8 in ?? ()
> #6 0x00000008018000c0 in ?? ()
> #7 0x0000000000000208 in ?? ()
> #8 0x0000000801c32fb0 in ?? ()
> #9 0x0000000000000001 in ?? ()
> #10 0x0000000801cc20c8 in ?? ()
> #11 0x0000000000000030 in ?? ()
> #12 0x0000000801cc20c8 in ?? ()
> #13 0x00007fffffffe480 in ?? ()
> #14 0x00000008011cd240 in sbrk () from /lib/libc.so.7
> #15 0x0000000000000280 in ?? ()
> #16 0x00000008014bbc70 in malloc_message () from /lib/libc.so.7
> #17 0x00000008018000c0 in ?? ()
> #18 0x0000000801800448 in ?? ()
> #19 0x0000000000000032 in ?? ()
> #20 0x0000000801800458 in ?? ()
> #21 0x00000008014bbc68 in malloc_message () from /lib/libc.so.7
> #22 0x0000000801cc2000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #23 0x00000008014bba60 in malloc_message () from /lib/libc.so.7
> #24 0x0000000801cc20d8 in ?? ()
> #25 0x00000000000000a0 in ?? ()
> #26 0x0000000000000208 in ?? ()
> #27 0x00007fffffffe4d0 in ?? ()
> #28 0x00000008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
> Previous frame inner to this frame (corrupt stack?)
> (gdb)
>
> which seems... well, not especially useful, as far as I can tell.
>
>
> This is (as mentioned above) on my laptop; as such, it is expected to
> "wander" from one network to another. Accordingly:
>
> * Since it could be connected to a network I do not control, I use a
>   packet filter (IPFW, in my case) to reduce my exposure from a
>   possibly-hostile network.
>
> * Rather than enabling ntpd in /etc/rc.conf, I use
>   /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
>   lease. (For networks I control, I also set up the DHCP server to
>   advertise what NTP server the DHCP clients should use, but the code in
>   dhclient-exit-hooks merely prefers that, rather than requiring it.)
>
> * In my world-view -- at least for networks I control -- DNS zone files
>   are the Source of Truth with respect to hostname <-> IP address
>   correspondence, and Dynamic DNS is Evil. I populate my zone files
>   with appropriate A & PTR records so that every assignable DHCP
>   address has a PTR record, and the hostname to which it points has
>   an A record that points back to that IP address. Accordingly, I
>   also use /etc/dhclient-exit-hooks so the laptop can find out what
>   its hostname is, and set it accordingly.
>
> Mind, I've been doing the above for well over a decade, so that doesn't
> qualify as "new."
>
> And most of the time, it Just Works (which is a significant reason I
> keep doing it).
>
> A couple of other things that are more recent, and possibly of
> relevance:
>
> * As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up using
>   Link Aggregation in "failover" mode. In practice, I rarely use
>   the em0 (wired) NIC -- I had originally done that based on a
>   misperception of how I thought things were set up at work, and
>   then just left the configuration alone and relied on the wireless
>   NIC. (At home, I have things set up so that the failover would
>   work, but doing so would be a little awkward for reasons that
>   aren't relevant here.)
>
> * I have the laptop configured to run xdm(1)... after the DHCP lease is
>   acquired and the hostname is set. My ~/.xsession script is set
>   up so it fires up ssh-agent, requests a passphrase, and then
>   (among other things) establishes an SSH session to the "mail hub"
>   at home and re-establishes a tmux session where I'm running mutt
>   to handle my email. I've noticed that in head, these connections
>   sometimes fail to get initialized, and sometimes will time out,
>   while sessions started a few minutes later will have no problem.
>   That seems peculiar, but was sufficiently ... well, "nebulous" that
>   I didn't think it warranted a whine of its own here. But on the
>   chance that it's related to ntpd giving up the ghost prematurely,
>   it seemed but a reasonable exercise of "Full Disclosure" to mention
>   it in this context -- even though it's also something I've been doing
>   since the (late) 1990s.
>
> So: Any suggestions for either diagnosing what the root cause is or
> changing the configuration so that the failure no longer occurs?
>
> Thanks!
>
> Peace,
> david

Was there anything (at all) in /var/log/messages about ntpd? Even the
routine messages (such as what interfaces it binds to) might give a bit
of a clue about how far it got in its init before it died.

-- Ian
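
[Editorial sketch, not part of the original message.] One way to do the log check asked about above is to pull out every line in /var/log/messages that mentions ntpd, so the last routine message before the crash is easy to spot. The helper names ntpd_lines and print_ntpd_lines are hypothetical, introduced only for illustration, and the sketch assumes the log is plain text and readable by the user running it:

```python
import re

# Assumption: any line containing the whole word "ntpd" is of interest.
# That covers both ntpd's own syslog entries and kernel notices that name it.
NTPD_PATTERN = re.compile(r"\bntpd\b")


def ntpd_lines(lines):
    """Return the lines that mention ntpd, in their original order."""
    return [line.rstrip("\n") for line in lines if NTPD_PATTERN.search(line)]


def print_ntpd_lines(path="/var/log/messages"):
    """Print every ntpd-related line from the given log file."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(path, errors="replace") as log:
        for line in ntpd_lines(log):
            print(line)
```

Calling print_ntpd_lines() with no argument reads the path named in the reply. Whether ntpd logged anything before it died depends on how far it got in its initialization, which is the question being asked.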