Date:      Wed, 04 Feb 2015 15:15:57 -0800
From:      Peter Wemm <peter@wemm.org>
To:        freebsd-current@freebsd.org
Cc:        Konstantin Belousov <kostikbel@gmail.com>
Subject:   Re: PSA: If you run -current, beware!
Message-ID:  <2509923.ondFvsFdql@overcee.wemm.org>
In-Reply-To: <20150204142941.GE42409@kib.kiev.ua>
References:  <8089702.oYScRm8BTN@overcee.wemm.org> <20150204142941.GE42409@kib.kiev.ua>

On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote:
> On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote:
> > Sometime in the Dec 10th through Jan 7th timeframe a timing bug was
> > introduced to 11.x/head/-current.  With HZ=1000 (the default for bare
> > metal, not for a VM), the clocks stop just after 24 days of uptime.  This
> > means things like cron, sleep, timeouts etc. stop working.  TCP/IP won't
> > time out or retransmit, etc.  It can get ugly.
> >
> > The problem is NOT in 10.x/-stable.
> >
> > We hit this in the freebsd.org cluster; the builds that we used are:
> > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan  7 18:47:09 UTC 2015 - broken
> >
> > If you are running -current in a situation where it'll accumulate uptime,
> > you may want to take precautions.  A reboot prior to 24 days of uptime (as
> > horrible a workaround as that is) will avoid it.
> >
> > Yes, this is being worked on.
>
> So the issue is reproducible within 3 minutes after boot with the following
> change in kern_clock.c:
> volatile int	ticks = INT_MAX - (/*hz*/1000 * 3 * 60);
>
> It is fixed (in the proper meaning of the word, not worked around or
> papered over) by the patch at the end of the mail.
>
> We already have a history of trying to enable the much less ambitious
> option -fno-strict-overflow; see r259045 and the revert in r259422.  I do
> not see any other way than to try one more time.  Too many places in the
> kernel depend on correctly wrapping two's-complement arithmetic, among
> others the callwheel and the scheduler.

Ugh.

I believe I have a smoking gun that suggests that the clock-stop problem is
caused by the clang-3.5 import on Dec 31st.

Backstory:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
http://www.airs.com/blog/archives/120

I suspect that what has happened is that clang's optimizer got better at
seeing the direct or indirect effects of signed integer overflow (undefined
behavior in C), and clang (and gcc) take advantage of that.

I have used a slightly different change for about 10 years:

--- kern/kern_clock.c	2014-12-01 15:42:21.707911656 -0800
+++ kern/kern_clock.c	2014-12-01 15:42:21.707911656 -0800
@@ -410,6 +415,11 @@
 #ifdef SW_WATCHDOG
 	EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
 #endif
+	/*
+	 * Arrange for ticks to go negative just 5 minutes after boot
+	 * to help catch sign problems sooner.
+	 */
+	ticks = INT_MAX - (hz * 5 * 60);
 }
 
 /*

This came about when we had problems with integer overflow arithmetic in
the TCP stack.

In any case, I'm in the process of adding -fwrapv and the early wraparound to
the freebsd.org cluster builds to give it some wider exercise.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…