Date: Wed, 04 Feb 2015 15:15:57 -0800
From: Peter Wemm <peter@wemm.org>
To: freebsd-current@freebsd.org
Cc: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: PSA: If you run -current, beware!
Message-ID: <2509923.ondFvsFdql@overcee.wemm.org>
In-Reply-To: <20150204142941.GE42409@kib.kiev.ua>
References: <8089702.oYScRm8BTN@overcee.wemm.org> <20150204142941.GE42409@kib.kiev.ua>
On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote:
> On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote:
> > Sometime in the Dec 10th through Jan 7th timeframe a timing bug has been
> > introduced to 11.x/head/-current. With HZ=1000 (the default for bare
> > metal, not for a vm), the clocks stop just after 24 days of uptime. This
> > means things like cron, sleep, timeouts etc stop working. TCP/IP won't
> > time out or retransmit, etc etc. It can get ugly.
> >
> > The problem is NOT in 10.x/-stable.
> >
> > We hit this in the freebsd.org cluster, the builds that we used are:
> > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken
> >
> > If you are running -current in a situation where it'll accumulate uptime,
> > you may want to take precautions. A reboot prior to 24 days uptime (as
> > horrible a workaround as that is) will avoid it.
> >
> > Yes, this is being worked on.
>
> So the issue is reproducible in 3 minutes after boot with the following
> change in kern_clock.c:
> volatile int	ticks = INT_MAX - (/*hz*/1000 * 3 * 60);
>
> It is fixed (in the proper meaning of the word, not like worked around,
> covered by paper) by the patch at the end of the mail.
>
> We already have a story trying to enable the much less ambitious option
> -fno-strict-overflow, see r259045 and the revert in r259422. I do not
> see any other way than to try one more time. Too many places in the kernel
> depend on correctly wrapping two's-complement arithmetic, among others
> the callwheel and the scheduler.

Ugh.

I believe I have a smoking gun that suggests that the clock-stop problem is
caused by the clang-3.5 import on Dec 31st.
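The 24-day figure above falls out of the arithmetic: `ticks` is a signed 32-bit `int` bumped `hz` times per second, so at HZ=1000 it passes INT_MAX after 2^31 - 1 ms, roughly 24.86 days, and any code that compares tick values with plain signed subtraction then depends on overflow that C leaves undefined. What follows is a minimal sketch, not Konstantin's patch and not FreeBSD's kernel API: the helper names `ticks_overflow_days`, `ticks_after` and `tick_advance` are hypothetical, and it assumes a 32-bit `int` plus gcc/clang's two's-complement result when an out-of-range `unsigned int` is converted back to `int` (implementation-defined in ISO C). It computes the overflow horizon and compares tick values by subtracting in unsigned arithmetic, where wraparound is well defined.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helpers for illustration only; these are not FreeBSD
 * kernel functions. Assumes 32-bit int and gcc/clang semantics for
 * converting an out-of-range unsigned int back to int (wraps modulo 2^32).
 */

/* Days of uptime before a tick counter starting at 0 passes INT_MAX. */
static double
ticks_overflow_days(int hz)
{
	return (double)INT_MAX / hz / 86400.0;
}

/* Advance a tick value by n ticks, wrapping from INT_MAX to INT_MIN. */
static int
tick_advance(int t, unsigned int n)
{
	return (int)((unsigned int)t + n);
}

/*
 * Wrap-safe "a is later than b" test for a free-running tick counter.
 *
 * The fragile form is "a - b > 0" on plain ints: once a has wrapped past
 * INT_MAX and b has not, a - b overflows, which is undefined behaviour,
 * so an optimizer may fold it to "a > b" unless -fwrapv is in effect.
 * Subtracting as unsigned avoids the undefined overflow entirely.
 */
static bool
ticks_after(int a, int b)
{
	return (int)((unsigned int)a - (unsigned int)b) > 0;
}
```

Starting the counter at INT_MAX - (hz * 5 * 60), as in the early-wraparound change, means the rollover that `ticks_after()` has to survive happens five minutes after boot instead of after 24 days.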
Backstory:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
http://www.airs.com/blog/archives/120

I suspect that what has happened is that clang's optimizer got better at
seeing the direct or indirect effects of signed integer overflow, which the
C standard leaves undefined, and clang (and gcc) take advantage of that.

I have used a slightly different change for about 10 years:

--- kern/kern_clock.c	2014-12-01 15:42:21.707911656 -0800
+++ kern/kern_clock.c	2014-12-01 15:42:21.707911656 -0800
@@ -410,6 +415,11 @@
 #ifdef SW_WATCHDOG
 	EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
 #endif
+	/*
+	 * Arrange for ticks to go negative just 5 minutes after boot
+	 * to help catch sign problems sooner.
+	 */
+	ticks = INT_MAX - (hz * 5 * 60);
 }
 
 /*

This dates from when we had problems with integer overflow arithmetic in
the TCP stack.

In any case, I'm in the process of adding -fwrapv and the early wraparound to
the freebsd.org cluster builds to give it some wider exercise.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…