Date: Sat, 18 Jul 2015 05:09:56 -0700
From: David Wolfskill <david@catwhisker.org>
To: current@freebsd.org
Subject: Segmentation fault running ntpd
Message-ID: <20150718120956.GC1155@albert.catwhisker.org>
Lousy timing (no pun intended -- it's early in the day for me), given
the recent MFC, but as I was booting my laptop to yesterday's head:

FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127 r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015 root@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY amd64

to build today's head (@r285670; still in progress as I type), I
happened to note [Oh, great -- we can no longer copy/paste from console
now??!? Fine, I'll transcribe by hand.... :-(]:

...
bound to 172.17.1.245 -- renewal in 43200 seconds.
pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
Starting Network: lo0 em0 iwn0 lagg0.
...

Trying to examine the /ntpd.core, I see:

root@g1-245:/ # gdb `which ntpd` ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.7
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
[New Thread 801c07400 (LWP 100122/<unknown>)]
[New Thread 801c06400 (LWP 100120/<unknown>)]
(gdb) bt
#0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
#1  0x00000008ccbd4f34 in ?? ()
#2  0x0000000000000005 in ?? ()
#3  0x0000000801800448 in ?? ()
#4  0x00000008011ca888 in sbrk () from /lib/libc.so.7
#5  0x00000008018000c8 in ?? ()
#6  0x00000008018000c0 in ?? ()
#7  0x0000000000000208 in ?? ()
#8  0x0000000801c32fb0 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000801cc20c8 in ?? ()
#11 0x0000000000000030 in ?? ()
#12 0x0000000801cc20c8 in ?? ()
#13 0x00007fffffffe480 in ?? ()
#14 0x00000008011cd240 in sbrk () from /lib/libc.so.7
#15 0x0000000000000280 in ?? ()
#16 0x00000008014bbc70 in malloc_message () from /lib/libc.so.7
#17 0x00000008018000c0 in ?? ()
#18 0x0000000801800448 in ?? ()
#19 0x0000000000000032 in ?? ()
#20 0x0000000801800458 in ?? ()
#21 0x00000008014bbc68 in malloc_message () from /lib/libc.so.7
#22 0x0000000801cc2000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#23 0x00000008014bba60 in malloc_message () from /lib/libc.so.7
#24 0x0000000801cc20d8 in ?? ()
#25 0x00000000000000a0 in ?? ()
#26 0x0000000000000208 in ?? ()
#27 0x00007fffffffe4d0 in ?? ()
#28 0x00000008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
Previous frame inner to this frame (corrupt stack?)
(gdb)

which seems... well, not especially useful, as far as I can tell.

This is (as mentioned above) on my laptop; as such, it is expected to
"wander" from one network to another. Accordingly:

* Since it could be connected to a network I do not control, I use a
  packet filter (IPFW, in my case) to reduce my exposure from a
  possibly-hostile network.

* Rather than enabling ntpd in /etc/rc.conf, I use
  /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
  lease.
  (For networks I control, I also set up the DHCP server to advertise
  what NTP server the DHCP clients should use, but the code in
  dhclient-exit-hooks merely prefers that, rather than requiring it.)

* In my world-view -- at least for networks I control -- DNS zone files
  are the Source of Truth with respect to hostname <-> IP address
  correspondence, and Dynamic DNS is Evil. I populate my zone files
  with appropriate A & PTR records so that every assignable DHCP
  address has a PTR record, and the hostname to which it points has an
  A record that points back to that IP address.

  Accordingly, I also use /etc/dhclient-exit-hooks so the laptop can
  find out what its hostname is, and set it accordingly.

Mind, I've been doing the above for well over a decade, so that doesn't
qualify as "new." And most of the time, it Just Works (which is a
significant reason I keep doing it).

A couple of other things that are more recent, and possibly of
relevance:

* As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up
  using Link Aggregation in "failover" mode. In practice, I rarely use
  the em0 (wired) NIC -- I had originally done that based on a
  misperception of how I thought things were set up at work, and then
  just left the configuration alone and relied on the wireless NIC.
  (At home, I have things set up so that the failover would work, but
  doing so would be a little awkward for reasons that aren't relevant
  here.)

* I have the laptop configured to run xdm(1)... after the DHCP lease is
  acquired and the hostname is set. My ~/.xsession script is set up so
  it fires up ssh-agent, requests a passphrase, and then (among other
  things) establishes an SSH session to the "mail hub" at home and
  re-establishes a tmux session where I'm running mutt to handle my
  email.

  I've noticed that in head, these connections sometimes fail to get
  initialized, and sometimes will time out, while sessions started a
  few minutes later will have no problem. That seems peculiar, but was
  sufficiently ...
  well, "nebulous" that I didn't think it warranted a whine of its own
  here. But on the chance that it's related to ntpd giving up the
  ghost prematurely, it seemed but a reasonable exercise of "Full
  Disclosure" to mention it in this context -- even though it's also
  something I've been doing since the (late) 1990s.

So: Any suggestions for either diagnosing what the root cause is or
changing the configuration so that the failure no longer occurs?

Thanks!

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
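
For readers unfamiliar with the arrangement described in the message, a
minimal sketch of an /etc/dhclient-exit-hooks fragment along those lines
might look like the following. It is not the script from the message, which
is not shown. It assumes that dhclient-script exports $reason,
$new_ip_address and (when the option is requested and offered)
$new_ntp_servers, and that host(1) prints a "domain name pointer" line for
PTR lookups. FALLBACK_NTP, NTP_CONF, pick_ntp_server and ptr_to_hostname are
hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch of a dhclient-exit-hooks fragment (sourced by dhclient-script).
# Hypothetical defaults, not taken from the original message:
FALLBACK_NTP="pool.ntp.org"
NTP_CONF="/var/run/ntpd.dhcp.conf"

# Prefer the first DHCP-advertised NTP server; otherwise use the fallback.
pick_ntp_server() {
    advertised=$1
    fallback=$2
    set -- $advertised      # split the space-separated server list
    if [ $# -gt 0 ]; then
        echo "$1"
    else
        echo "$fallback"
    fi
}

# Read `host <ip>` output on stdin and print the PTR target
# without its trailing dot.
ptr_to_hostname() {
    awk '/domain name pointer/ { sub(/\.$/, "", $NF); print $NF; exit }'
}

# Act only when a lease has just been obtained; $reason is assumed to be
# set by dhclient-script and is empty when this file is run by hand.
case "${reason:-}" in
BOUND|REBOOT)
    # Set the hostname from the PTR record of the leased address, if any.
    name=$(host "$new_ip_address" 2>/dev/null | ptr_to_hostname)
    if [ -n "$name" ]; then
        hostname "$name"
    fi

    # Start ntpd against the preferred server.
    server=$(pick_ntp_server "${new_ntp_servers:-}" "$FALLBACK_NTP")
    echo "server $server iburst" > "$NTP_CONF"
    /usr/sbin/ntpd -g -c "$NTP_CONF"
    ;;
esac
```

Keeping the server choice and the PTR parsing in small functions means both
can be exercised by hand without a live lease, for example
`pick_ntp_server "" pool.ntp.org`.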