From owner-svn-src-all@FreeBSD.ORG  Mon Jun 17 21:20:54 2013
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 193DF953;
 Mon, 17 Jun 2013 21:20:54 +0000 (UTC)
 (envelope-from edschouten@gmail.com)
Received: from mail-ve0-x231.google.com (mail-ve0-x231.google.com
 [IPv6:2607:f8b0:400c:c01::231])
 by mx1.freebsd.org (Postfix) with ESMTP id 9BE651085;
 Mon, 17 Jun 2013 21:20:53 +0000 (UTC)
Received: by mail-ve0-f177.google.com with SMTP id cz10so2539146veb.22
 for <multiple recipients>; Mon, 17 Jun 2013 14:20:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date
 :x-google-sender-auth:message-id:subject:from:to:cc:content-type;
 bh=t1Gi8OAeicqGBkHH8LReu/iTXu9Nmg2NBz8F53TljVs=;
 b=rhNhnZUJMD3+WQJ2wluPdCw0m6BEFvZAUQxXYzzv6e7SPWs5GTWpulWvVm8qygczgq
 M/ida5PvMBzTDU5c319VzL42Bo3hj2sQdsOmD1WfYhMhGoSIQ4YWkkFDMwoROD1pFRso
 dUEQLff+j5w4L0VOuSQKjjJsLK711fFfYKNO+zBDlnWyTmfqtHdsvW3RXkZal3eZVghn
 /aZTVa43F8INSB1vDWxLFdtzBI86xlts/4P9tpFGH2+GqAroaeBB5+KofsmsQdjeq8f2
 M297FTXicmPngEAl7pF/tyP+3OuAIV+JNq+O8oYUk/OSGfi+kUm4WbOe5z6uMxv7FTNf
 BYPw==
MIME-Version: 1.0
X-Received: by 10.58.215.200 with SMTP id ok8mr5118147vec.21.1371504053103;
 Mon, 17 Jun 2013 14:20:53 -0700 (PDT)
Sender: edschouten@gmail.com
Received: by 10.220.107.139 with HTTP; Mon, 17 Jun 2013 14:20:53 -0700 (PDT)
In-Reply-To: <51BDCEE0.8050000@freebsd.org>
References: <201306160930.r5G9UZfE059294@svn.freebsd.org>
 <51BDCEE0.8050000@freebsd.org>
Date: Mon, 17 Jun 2013 23:20:53 +0200
X-Google-Sender-Auth: 1liihxLqx5J__NZ9p4uATvgbq9g
Message-ID: <CAJOYFBD6JE1+n6uX+SfHU-WdCnAYDMW6gB_+DQ_DRVNfAQS11A@mail.gmail.com>
Subject: Re: svn commit: r251803 - head/sys/kern
From: Ed Schouten <ed@80386.nl>
To: Nathan Whitehorn <nwhitehorn@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Jun 2013 21:20:54 -0000

Hi Nathan,

2013/6/16 Nathan Whitehorn <nwhitehorn@freebsd.org>:
> I'm a little worried about these kinds of changes from a performance
> standpoint when using GCC 4.2. In particular, from the GCC manual: "In most
> cases, these builtins are considered a full barrier." This is much more
> synchronization than many of the atomic ops in machine/atomic.h actually
> require. I'm worried this could lead to serious performance regressions on
> e.g. PowerPC. gcc mostly seems to do the right thing, but I'm not completely
> sure and it probably needs extensive testing. One way to accomplish that
> could be to implement atomic(9) in terms of stdatomic. If nothing breaks or
> becomes slow, then we will know we are in the clear permanently.
> -Nathan

Agreed. I did indeed implement <machine/atomic.h> on top of
<stdatomic.h> as a test a couple of weeks ago. What is nice is that
on amd64/i386 the emitted machine code is almost identical, except
that in certain cases <stdatomic.h> generates more compact
instructions (e.g. "lock inc" instead of adding an immediate 1).
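
To make the idea of layering atomic(9) on <stdatomic.h> concrete, here
is a minimal sketch. The x_-prefixed names are hypothetical and chosen
only for illustration; this is not the content of the replacement
header linked below. It assumes a compiler with real C11 atomics, so
that the explicit memory orders are honoured rather than being widened
to full barriers.

```c
#include <stdatomic.h>

/*
 * Hypothetical wrappers modelled on atomic(9). The x_ prefix marks them
 * as illustrations, not as the real <machine/atomic.h> interface.
 */

/* Plain variant: no ordering is requested, so use relaxed. */
static inline void
x_atomic_add_int(atomic_uint *p, unsigned int v)
{
	atomic_fetch_add_explicit(p, v, memory_order_relaxed);
}

/* Load with acquire semantics. */
static inline unsigned int
x_atomic_load_acq_int(atomic_uint *p)
{
	return (atomic_load_explicit(p, memory_order_acquire));
}

/* Store with release semantics. */
static inline void
x_atomic_store_rel_int(atomic_uint *p, unsigned int v)
{
	atomic_store_explicit(p, v, memory_order_release);
}

/* Compare-and-set: returns nonzero if *p was 'expect' and is now 'desired'. */
static inline int
x_atomic_cmpset_int(atomic_uint *p, unsigned int expect, unsigned int desired)
{
	return (atomic_compare_exchange_strong_explicit(p, &expect, desired,
	    memory_order_relaxed, memory_order_relaxed));
}
```

The point of spelling out the memory order in every call is that the
plain operations do not ask for more synchronization than they need,
which is the concern raised above for PowerPC.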

On armv6 the trend is similar, with the exception that in some cases
Clang manages to emit slightly more intelligent code. It seems that
one of our pieces of inline assembly causes the compiler to zero out
certain registers before inserting the inline assembly, even though
these registers tend to be overwritten by the assembly anyway. Weird.

Replacement of <machine/atomic.h> used on amd64:

http://80386.nl/pub/machine-atomic-wrapped.txt

Still, you were actually interested in the difference in
performance when using GCC 4.2. I have to confess that I don't have
any numbers on this, but I suspect there will be a dip. But let
me be clear on this: I am not proposing that we migrate our existing
codebase to C11 atomics in the near future. This is something
that should be considered once most of the platforms use Clang
(or, less likely, GCC 4.6+).

The reason I made this change is that I want to have at least
some coverage of the C11 atomics both in kernelspace and userspace. My
goal is that C11 atomics work correctly on FreeBSD 10.0. My fear is
that this likely cannot be achieved if there are exactly 0 pieces of
code in our tree that use them. Without any users, breakage of
<stdatomic.h> could go unnoticed, perhaps as soon as someone makes
a tiny "harmless" modification to <sys/cdefs.h> or <sys/_types.h>.

Correct me if I'm wrong, but I think it's extremely unlikely that this
specific change will noticeably regress performance of the system as a
whole. If I wanted to cripple performance on these architectures, I
would have changed mtx(9) to use C11 atomics instead.

Unrelated to this, there is something about this specific piece of
code that is actually very interesting if you look at it in more
detail. Notice how I took the liberty of changing filt_timerattach()
to use a compare-and-exchange, instead of the two successive atomic
operations it used to do. Maybe a smart compiler could consider
rewriting this piece of code into something along these lines (on
armv6):

ldr r0, [kq_calloutmax]
ldrex r1, [kq_ncallouts]
cmp r0, r1
blt ...
add r2, r1, #1
strex r1, r2, [kq_ncallouts]

In other words, convert this to a "compare-less-than-and-increment",
which is not offered by <machine/atomic.h>. It'll be interesting to
see whether Clang will reach such a level of code quality.
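
For reference, the compare-and-exchange pattern described above can be
sketched in C11 roughly as follows. This is an illustration only, not
the committed filt_timerattach() code: the function name and the
parameters 'count' and 'limit' are hypothetical stand-ins for
kq_ncallouts and kq_calloutmax.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Atomically increment *count, but only while it is below 'limit'.
 * Returns true if a slot was reserved, false if the limit was reached.
 */
static bool
try_reserve_slot(atomic_int *count, int limit)
{
	int cur;

	cur = atomic_load_explicit(count, memory_order_relaxed);
	do {
		/* Give up without modifying the counter once it is full. */
		if (cur >= limit)
			return (false);
		/*
		 * On failure the compare-exchange stores the value it
		 * observed into 'cur', so the limit is rechecked against
		 * fresh data on every iteration.
		 */
	} while (!atomic_compare_exchange_weak_explicit(count, &cur, cur + 1,
	    memory_order_relaxed, memory_order_relaxed));
	return (true);
}
```

The weak compare-exchange is used because it sits in a retry loop
anyway; on LL/SC architectures such as ARM it can map onto a single
ldrex/strex pair.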

--
Ed Schouten <ed@80386.nl>