From owner-freebsd-stable@freebsd.org Thu Mar 5 21:31:37 2020 Return-Path: Delivered-To: freebsd-stable@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 1E6FF24D25C for ; Thu, 5 Mar 2020 21:31:37 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from outbound3d.ore.mailhop.org (outbound3d.ore.mailhop.org [54.186.57.195]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 48YP88220lz42MY for ; Thu, 5 Mar 2020 21:31:36 +0000 (UTC) (envelope-from ian@freebsd.org) ARC-Seal: i=1; a=rsa-sha256; t=1583443894; cv=none; d=outbound.mailhop.org; s=arc-outbound20181012; b=HnHbLehXV3Pyoz5e0ptP1Wcoqplh6i40rXZDXFZl3QD6nd7hVtHg8wBu5NgX8YEbIOM5E3ylYJnsW 2EExngXIQXla94P4mv/jWoSmjHSnyiUH7DnCw6Zc8zagZPhKceK9htjoFpbkZ1cD7D6NTfvjPXGyN5 vbak3dpOqoJ0qM552PiR9XUCqQd86TZ8+cY5kOvnemwzquCD2WPMLXna+CSrTx+WwRHttIr0HUTFBq YABkJqSBs5j8jaTVYJTyg1OLLrrlutKA1Rnqpq8XzNTzTIKhazelMhL1xM6FsrxyH/+i+CTtxkHY7S YtAUY5P5DVxz/RLzsiYcciztMWNmOsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=outbound.mailhop.org; s=arc-outbound20181012; h=content-transfer-encoding:mime-version:content-type:references:in-reply-to: date:cc:to:from:subject:message-id:dkim-signature:from; bh=4ZBgmbJwneAivXur90B8sgicJ7ZoC8SryhBuVLUzSRs=; b=nJCTQECJZ66pHSD6UiN23QQ1YMZoEIEAK36IFdSwtO5wGY0SoaRKIy4HIpTVFciwfH+50PhLcG20W misQmBzMII9bSK+yz9m4iUhtgDn8m3i53Zs/Weu0gO4O4Uv9ukYjBixGaQle/JcrrxKPST5bs/Rtoi mt0JDsYPumqWNrV9MP4xxegirzvvWP2eC/rfsDaBJqU0OY47e6/xEbkFyxisgl4jNeWj4u4E63fDMK T7F1YYhcTsE+gXfq1zsoDywRVIjQxoPdpLWL6H+1JqgnZEGZPnR4kBCzP+6HJlWxeJB2OG/xYshyqT gcCCeS/QIMSxsVixsnLea5H+Z3SalYg== ARC-Authentication-Results: i=1; outbound3.ore.mailhop.org; spf=softfail smtp.mailfrom=freebsd.org smtp.remote-ip=67.177.211.60; dmarc=none header.from=freebsd.org; arc=none header.oldest-pass=0; DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=outbound.mailhop.org; s=dkim-high; h=content-transfer-encoding:mime-version:content-type:references:in-reply-to: date:cc:to:from:subject:message-id:from; bh=4ZBgmbJwneAivXur90B8sgicJ7ZoC8SryhBuVLUzSRs=; b=Ed8kpTp4PzwqA9mHkCkrqMEIVADpZKW9t9ab+Po0VQQVK67oIAvM/JHZKPoWIeY5URVU9K3kmawq1 9/93d7CAwH/O4gDCTqxRK0U6WIxxpvPiWMpFjvCNUyU2wDBuBVLSJa2FKNLKIQwpYE8KTpdq2Ag93I L9uTe3XBdznk7t0Onj9rPCKQRGhMf3+fVOWJNkJuOIrhi/711AXpGUrNBRQX0fYRSUcyWeCaN+GjWu clDO8XhuPNx4cwVuPweo7ov02lRBj4ORqE68bznpdpb7R5/KL9bbO7lsUF/xhhDBchHdzccrPMyywi OUnUffG/hja8Qm+8LUryK1KB03p7aRg== X-MHO-RoutePath: aGlwcGll X-MHO-User: aee44f6b-5f28-11ea-b80e-052b4a66b6b2 X-Report-Abuse-To: https://support.duocircle.com/support/solutions/articles/5000540958-duocircle-standard-smtp-abuse-information X-Originating-IP: 67.177.211.60 X-Mail-Handler: DuoCircle Outbound SMTP Received: from ilsoft.org (unknown [67.177.211.60]) by outbound3.ore.mailhop.org (Halon) with ESMTPSA id aee44f6b-5f28-11ea-b80e-052b4a66b6b2; Thu, 05 Mar 2020 21:31:33 +0000 (UTC) Received: from rev (rev [172.22.42.240]) by ilsoft.org (8.15.2/8.15.2) with ESMTP id 025LVW9A025493; Thu, 5 Mar 2020 14:31:32 -0700 (MST) (envelope-from ian@freebsd.org) Message-ID: <21f48a0c5f14c93b72b78c0d9d0ecfb4956fb062.camel@freebsd.org> Subject: Re: ntp problems stratum 2 to 14? From: Ian Lepore To: Dewayne Geraghty Cc: freebsd-stable stable Date: Thu, 05 Mar 2020 14:31:32 -0700 In-Reply-To: References: Content-Type: text/plain; charset="ASCII" X-Mailer: Evolution 3.28.5 FreeBSD GNOME Team Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 48YP88220lz42MY X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-0.23 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_SPAM_MEDIUM(0.77)[0.767,0]; ASN(0.00)[asn:16509, ipnet:54.186.0.0/15, country:US]; NEURAL_HAM_LONG(-1.00)[-0.998,0] X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Mar 2020 21:31:37 -0000 On Fri, 2020-03-06 at 07:49 +1100, Dewayne Geraghty wrote: > Ian, Good points. I did remove the fudge and 127.127.1.1 lines from the > config with the same result as below. Interestingly the clock at 10.0.7.6 > isn't really unreliable, as its been my time source since 2005, and serves > clients, so it is pretty ok. Without a local clock, named fails (its linked > with kerberos). Yes I'd never seen my clock server become st 14, which > prompted me to seek help. I haven't repeated that scenario, but I > continuously get "interrupted system call" and I haven't been able to sync > while running debug or ktrace - so I dont have "what good looks like". > > Thanks Peter. I've rebuilt net/ntpd in various ways including all > defaults. Rebuilt the kernel (& world), to the latest 12 stable. > Reset almost all sysctl's (I change 91 of them). I keep getting > interrupted system call at 1 sec intervals, which I suspect is a problem > > For the reader: a stratum 2 clocks 10.0.7.6 syncs with some atomic clocks > within city; a server 10.0.7.91 running ntpd 4.2.8p14 on FreeBSD 12.1 > r358565M irregularly and usually wont sync, and experiences "interrupted > system calls". > > The jump to stratum 14 was a surprise, but not repeatable. Sometimes the > ntpd port starts and uses the next hop time server, but usually within 20 > minutes returns to LOCL though more often goes straight to LOCL. During > ktrace's I've observed: > > 66894 ntpd CALL write(0x1,0x80078e000,0x2e) > 66894 ntpd GIO fd 1 wrote 46 bytes > "select() returned -1: Interrupted system call > ... > "poll_update: at 1 10.0.7.6 poll 4 burst 0 retry 2 head 14 early 2 > next 16 > " > 66894 ntpd RET write 74/0x4a > 66894 ntpd CALL select(0x19,0x7fffffffde50,0,0,0) > 66894 ntpd RET select -1 errno 4 Interrupted system call > 66894 ntpd PSIG SIGALRM caught handler=0x80072f600 mask=0x0 > code=SI_KERNEL > 66894 ntpd CALL sigprocmask(SIG_SETMASK,0x7fffffffd7a4,0) > 66894 ntpd RET sigprocmask 0 > 66894 ntpd CALL sigreturn(0x7fffffffd3d0) > 66894 ntpd RET sigreturn JUSTRETURN > 66894 ntpd CALL write(0x1,0x80078e000,0x2e) > 66894 ntpd GIO fd 1 wrote 46 bytes > "select() returned -1: Interrupted system call > ... > "select() returned -1: Interrupted system call > " > 66894 ntpd RET write 46/0x2e > 66894 ntpd CALL select(0x19,0x7fffffffde50,0,0,0) > 66894 ntpd RET select -1 errno 4 Interrupted system call > 66894 ntpd PSIG SIGALRM caught handler=0x80072f600 mask=0x0 > code=SI_KERNEL > 66894 ntpd CALL sigprocmask(SIG_SETMASK,0x7fffffffd7a4,0) > 66894 ntpd RET sigprocmask 0 > 66894 ntpd CALL sigreturn(0x7fffffffd3d0) > 66894 ntpd RET sigreturn JUSTRETURN > 66894 ntpd CALL write(0x1,0x80078e000,0x2e) > 66894 ntpd GIO fd 1 wrote 46 bytes > "select() returned -1: Interrupted system call > but I have no idea whether or not these interrupted system calls are normal > or not. > > and with debug (-D5) this is what I see > > 5 Mar 18:26:50 ntpd[86274]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > 5 Mar 18:26:51 ntpd[86274]: select(): nfound=-1, error: Interrupted system > call > poll_update: at 17 10.0.7.6 poll 4 burst 0 retry 0 head 0 early 2 next 16 > sendpkt(21, dst=10.0.7.6, src=10.0.7.91, ttl=0, len=72) > peer_xmit: at 17 10.0.7.91->10.0.7.6 mode 3 keyid 0000232f len 72 index 0 > read_network_packet: fd=21 length 72 from 10.0.7.6 > fetch_timestamp: system bintime network time stamp: 1583393211.408612992 > restrictions: looking up 10.0.7.6 > match_restrict4_addr: Checking 127.0.0.1, port 123 ... doesn't match: > ippeerlimit -4 > match_restrict4_addr: Checking 127.0.0.1, port 123 ... doesn't match: > ippeerlimit -1 > match_restrict4_addr: Checking 10.0.7.91, port 123 ... doesn't match: > ippeerlimit -4 > match_restrict4_addr: Checking 10.0.7.6, port 123 ... MATCH: ippeerlimit -1 > receive: at 17 10.0.7.91<-10.0.7.6 ippeerlimit -1 mode 4 iflags > up,broadcast,multicast restrict nomodify,nopeer,noquery,notrap org > 0xe20b283b.687ad7f6 xmt 0xe20b2838.2001aa5b > MRU: interval 16 headway 8 limit 64 > receive: at 17 10.0.7.91<-10.0.7.6 mode 4/server:AM_PROCPKT keyid 0000232f > len 72 auth 1 org 0xe20b283b.687ad7f6 xmt 0xe20b2838.2001aa5b MAC > receive: MATCH_ASSOC dispatch: mode 4/server:AM_PROCPKT > poll_update: at 17 10.0.7.6 poll 4 burst 0 retry 0 head 14 early 2 next 16 > clock_filter: n 2 off -3.283397 del 0.000382 dsp 3.937561 jit 0.000551 > alarming: normal > 5 Mar 18:26:52 ntpd[86274]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > 5 Mar 18:26:53 ntpd[86274]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > > I have rebuild with ntp-4.2.8p14 and with no additional compile rules and > the port's defaults. I keep getting > > 6 Mar 07:05:16 ntpd[98682]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > 6 Mar 07:05:17 ntpd[98682]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > 6 Mar 07:05:18 ntpd[98682]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > 6 Mar 07:05:19 ntpd[98682]: select(): nfound=-1, error: Interrupted system > call > alarming: normal > 6 Mar 07:05:20 ntpd[98682]: select(): nfound=-1, error: Interrupted system > call > > So with compile, minimum settings and all aslr variables set to 0. We get > the above result running as either ntpd or root. I'm starting to wonder > about the network stack? > > My build comprises: > # make __MAKE_CONF=/etc/make_test.conf -C /usr/ports/net/ntp -VCFLAGS > -VLDFLAGS > -O2 -pipe -fstack-protector-strong -fno-strict-aliasing > -fstack-protector > # make __MAKE_CONF=/etc/make_test.conf -C /usr/ports/net/ntp showconfig > > grep =on > > DEBUG=on: Build with debugging support > DOCS=on: Build and/or install documentation > EXAMPLES=on: Build and/or install examples > IPV6=on: IPv6 protocol support > LOCAL_CLOCK=on: Enable local clock reference > NLS=on: Native Language Support > PERL_UTILS=on: Install ntp-wait and ntptrace utilities > SSL=on: SSL protocol support > THREADS=on: Threading support > > Also the base systems ntp 4.2.8p13 also experiences "interrupted system > calls" > 80370 ntpd CALL select(0x19,0x7fffffffde40,0,0,0) > 80370 ntpd RET select -1 errno 4 Interrupted system call > 80370 ntpd PSIG SIGALRM caught handler=0x801473610 mask=0x0 > code=SI_KERNEL > 80370 ntpd CALL sigprocmask(SIG_SETMASK,0x7fffffffd7a4,0) > 80370 ntpd RET sigprocmask 0 > 80370 ntpd CALL sigreturn(0x7fffffffd3d0) > 80370 ntpd RET sigreturn JUSTRETURN > 80370 ntpd CALL select(0x19,0x7fffffffde40,0,0,0) > 80370 ntpd RET select -1 errno 4 Interrupted system call > ; perhaps just having aslr built into the kernel?? Its straw clutching > time... > > I appreciate your consideration. All suggestions welcome. :) > PS Ian the main time server 10.0.7.6 sync's off 2x atomic clocks, with 2x > GPS as failover ;) The internal server 10.0.7.91 serves the internal > clients only. > The interrupted system calls (EINTR returned from select()) are normal. Notice that each time it happens, it's associated with a SIGALARM being delivered to ntpd. Ntpd uses SIGALARM at 1hz to periodically get control and decide whether it's time to poll peers and do other periodic work. You say 10.0.7.6 syncs with some atomic clocks, but in your initial posting it was sync'd to its own LOCL clock at stratum 14, which is why the ntpd you were asking about refused to sync to it and also fell back to its own LOCL clock. Eventually 10.0.7.6 sync'd to 203.35.83.242, then the system you were asking about sync'd to 10.0.7.6. -- Ian