From owner-freebsd-current@FreeBSD.ORG  Thu Feb  5 07:48:38 2015
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 11FFF9FA
 for <freebsd-current@freebsd.org>; Thu,  5 Feb 2015 07:48:38 +0000 (UTC)
Received: from mail-la0-x234.google.com (mail-la0-x234.google.com
 [IPv6:2a00:1450:4010:c03::234])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6FF7FF6A
 for <freebsd-current@freebsd.org>; Thu,  5 Feb 2015 07:48:37 +0000 (UTC)
Received: by mail-la0-f52.google.com with SMTP id gd6so4540962lab.11
 for <freebsd-current@freebsd.org>; Wed, 04 Feb 2015 23:48:33 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=ZXPTc5CrPKh+JvGSSLwOW/XZBZqMrkV5F5Hlub8CPko=;
 b=uviYlPXRRsCBH9eTiJcnj/Yu8/SFHCvVtB2QHxaVwWSsh8Tlz0/Gcjh1G7q4mfb+WS
 17UU/ClhCvGikRUIV0jePxFGL1mYc3wKJLiZ3CJjvtdbG9qaHFUKx7phYn7ivyBKxLxS
 HWB+GwieE657jZpvdsYCbC1DG90WElrCR6a6FjHoGDTi7SapBNRjjUA8ChbNccUQ9lgS
 RqyHG7dXhUed6ck0ic5xt+pPvKpQ08+wFb18bGVrARItqD5Zl5yAFISYuv52sEz1Gd2t
 eKokB/tK4Tmys6GxJ5S/LGOLlJCYKdwDAVnQR1T8CuKybyycm5qy3/0drCigvhlaa1h7
 9a0A==
MIME-Version: 1.0
X-Received: by 10.112.55.199 with SMTP id u7mr1948836lbp.74.1423122513837;
 Wed, 04 Feb 2015 23:48:33 -0800 (PST)
Sender: rizzo.unipi@gmail.com
Received: by 10.114.19.206 with HTTP; Wed, 4 Feb 2015 23:48:33 -0800 (PST)
In-Reply-To: <2509923.ondFvsFdql@overcee.wemm.org>
References: <8089702.oYScRm8BTN@overcee.wemm.org>
 <20150204142941.GE42409@kib.kiev.ua>
 <2509923.ondFvsFdql@overcee.wemm.org>
Date: Thu, 5 Feb 2015 08:48:33 +0100
X-Google-Sender-Auth: co1yx-97K2UJ5OTeWkq1wHgFWBo
Message-ID: <CA+hQ2+iVE53PJs0noc_SPHpwDZVLX-tHpgYmzO9tGzJzDXwXWg@mail.gmail.com>
Subject: Re: PSA: If you run -current, beware!
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Peter Wemm <peter@wemm.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: Konstantin Belousov <kostikbel@gmail.com>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 Feb 2015 07:48:38 -0000

On Thursday, February 5, 2015, Peter Wemm <peter@wemm.org> wrote:

> On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote:
> > On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote:
> > > > Sometime in the Dec 10th through Jan 7th timeframe a timing bug was
> > > > introduced to 11.x/head/-current.  With HZ=1000 (the default for bare
> > > > metal, not for a VM), the clocks stop just after 24 days of uptime.
> > > > This means things like cron, sleep, timeouts, etc. stop working.
> > > > TCP/IP won't time out or retransmit, etc.  It can get ugly.
> > >
> > > The problem is NOT in 10.x/-stable.
> > >
> > > We hit this in the freebsd.org cluster, the builds that we used are:
> > > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> > > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan  7 18:47:09 UTC 2015 - broken
> > >
> > > > If you are running -current in a situation where it'll accumulate
> > > > uptime, you may want to take precautions.  A reboot prior to 24 days
> > > > of uptime (as horrible a workaround as that is) will avoid it.
> > >
> > > Yes, this is being worked on.
> >
> > > So the issue is reproducible within 3 minutes of boot with the following
> > > change in kern_clock.c:
> > volatile int  ticks = INT_MAX - (/*hz*/1000 * 3 * 60);
> >
> > > It is fixed (in the proper meaning of the word, not worked around or
> > > papered over) by the patch at the end of the mail.
> > >
> > > We already have a history of trying to enable the much less ambitious
> > > option -fno-strict-overflow; see r259045 and the revert in r259422.  I
> > > do not see any other way than to try one more time.  Too many places in
> > > the kernel depend on correctly wrapping two's-complement arithmetic,
> > > among them the callwheel and the scheduler.
>
>
Rather than depending on a compiler option, wouldn't it be better/more
robust to change ticks to unsigned, which has specified wrapping behavior?
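
A minimal sketch of that idea, under stated assumptions: the type and
function names below (uticks_t, ticks_elapsed, deadline_passed,
days_until_int_wrap, ticks_selftest) are hypothetical and for illustration
only, not existing kernel identifiers, and deadlines are assumed to lie less
than half the counter range in the future. It also shows where the quoted
24-day figure comes from with a signed counter at hz=1000.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical unsigned tick type; the kernel's "ticks" is a plain int. */
typedef unsigned int uticks_t;

/*
 * Ticks elapsed from "then" to "now".  Unsigned subtraction is defined
 * modulo UINT_MAX + 1, so the result stays correct across a wrap.
 */
static uticks_t
ticks_elapsed(uticks_t now, uticks_t then)
{
	return (now - then);
}

/*
 * Nonzero once "deadline" has been reached.  Assumes the deadline was set
 * less than half the counter range ahead of the current time.
 */
static int
deadline_passed(uticks_t now, uticks_t deadline)
{
	return ((uticks_t)(now - deadline) <= UINT_MAX / 2);
}

/* Whole days of uptime before a signed int counter reaches INT_MAX. */
static int
days_until_int_wrap(int hz)
{
	return (INT_MAX / hz / (24 * 60 * 60));
}

/* Small self-check exercising the point where a signed counter overflows. */
static void
ticks_selftest(void)
{
	uticks_t start = (uticks_t)INT_MAX - 5;	/* just before INT_MAX */
	uticks_t deadline = start + 10;		/* lies beyond INT_MAX */

	assert(!deadline_passed(start + 3, deadline));
	assert(deadline_passed(start + 10, deadline));
	assert(ticks_elapsed(start + 10, start) == 10);
	assert(days_until_int_wrap(1000) == 24);
}
```

Calling ticks_selftest() from a small test program exercises the crossing of
INT_MAX without relying on -fwrapv or -fno-strict-overflow. Code that
compares tick values as signed quantities would still need auditing.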

Cheers
Luigi

> Ugh.
>
> I believe I have a smoking gun that suggests that the clock-stop problem is
> caused by the clang-3.5 import on Dec 31st.
>
> Backstory:
> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
> http://www.airs.com/blog/archives/120
>
> I suspect that what has happened is that clang's optimizer got better at
> seeing the direct or indirect effects of integer overflow, and clang (and
> gcc) take advantage of that.
>
> I have used a slightly different change for about 10 years:
>
> --- kern/kern_clock.c   2014-12-01 15:42:21.707911656 -0800
> +++ kern/kern_clock.c   2014-12-01 15:42:21.707911656 -0800
> @@ -410,6 +415,11 @@
>  #ifdef SW_WATCHDOG
>         EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
>  #endif
> +       /*
> +        * Arrange for ticks to go negative just 5 minutes after boot
> +        * to help catch sign problems sooner.
> +        */
> +       ticks = INT_MAX - (hz * 5 * 60);
>  }
>
>  /*
>
> This came about when we had problems with integer overflow arithmetic in
> the TCP stack.
>
> In any case, I'm in the process of adding -fwrapv and the early wraparound
> to the freebsd.org cluster builds to give it some wider exercise.
>
> --
> Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
> UTF-8: for when a ' or ... just won’t do…


-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------