From: Ed Schouten
To: Nathan Whitehorn
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Date: Mon, 17 Jun 2013 23:20:53 +0200
Subject: Re: svn commit: r251803 - head/sys/kern

Hi Nathan,

2013/6/16 Nathan Whitehorn:
> I'm a little worried about these kinds of changes from a performance
> standpoint when using GCC 4.2. In particular, from the GCC manual: "In most
> cases, these builtins are considered a full barrier." This is much more
> synchronization than many of the atomic ops in machine/atomic.h actually
> require. I'm worried this could lead to serious performance regressions on
> e.g. PowerPC. gcc mostly seems to do the right thing, but I'm not completely
> sure and it probably needs extensive testing. One way to accomplish that
> could be to implement atomic(9) in terms of stdatomic. If nothing breaks or
> becomes slow, then we will know we are in the clear permanently.
> -Nathan

Agreed. I did indeed implement <machine/atomic.h> on top of <stdatomic.h>
as a test a couple of weeks ago. What is nice is that on amd64/i386 the
emitted machine code is almost identical, except that in certain cases
<stdatomic.h> generates more compact instructions (e.g. "lock inc" instead
of adding an immediate 1).

On armv6 the trend is similar, except that in some cases Clang manages to
emit slightly more intelligent code. It seems that one of our pieces of
inline assembly causes the compiler to zero out certain registers before
inserting the inline assembly, even though these registers tend to be
overwritten by the assembly anyway. Weird.

Replacement of <machine/atomic.h> used on amd64:

	http://80386.nl/pub/machine-atomic-wrapped.txt

Still, you were actually interested in the difference in performance when
using GCC 4.2. I have to confess that I don't have any numbers on this, but
I suspect there will be a dip, of course.

But let me be clear on this: I am not proposing that we migrate our
existing codebase to C11 atomics in the near future.
Such a migration is something that should be considered by the time most of
the platforms use Clang (or, less likely, GCC 4.6+).

The reason why I made this change is that I at least want to have some
coverage of the C11 atomics both in kernelspace and userspace. My goal is
that C11 atomics work correctly on FreeBSD 10.0. My fear is that this
likely cannot be achieved if there are exactly 0 pieces of code in our tree
that use them. Without such coverage, breakage of <stdatomic.h> could go
unnoticed, perhaps as soon as someone makes a tiny "harmless" modification
to one of the headers involved.

Correct me if I'm wrong, but I think it's extremely unlikely that this
specific change will noticeably regress performance of the system as a
whole. If I wanted to cripple performance on these architectures, I would
have changed mtx(9) to use C11 atomics instead.

Unrelated to this, there is something about this specific piece of code
that is actually very interesting if you look at it in more detail. Notice
how I took the liberty of changing filt_timerattach() to use a
compare-and-exchange instead of the two successive atomic operations it
used to do. Maybe a smart compiler could consider rewriting this piece of
code to something along the lines of this (on armv6):

	ldr r0, [kq_calloutmax]
	ldrex r1, [kq_ncallouts]
	cmp r0, r1
	blt ...
	add r2, r1, #1
	strex r1, r2, [kq_ncallouts]

In other words, convert this to a "compare-less-than-and-increment", which
is not offered by <stdatomic.h>. It'll be interesting to see whether Clang
will reach such a level of code quality.

--
Ed Schouten
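
For readers following along, the "compare-less-than-and-increment" idea
discussed above can be approximated in portable C11 with a
compare-and-exchange loop. What follows is only a minimal sketch and not
the code from r251803: reserve_slot() and its parameter names are
hypothetical, and the relaxed memory ordering is an assumption that is
reasonable only if the counter does not guard any other data.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: increment *counter only while it is below limit.
 * Returns true if the increment happened, false if the limit was reached.
 */
static bool
reserve_slot(atomic_int *counter, int limit)
{
	int cur = atomic_load_explicit(counter, memory_order_relaxed);

	do {
		if (cur >= limit)
			return (false);
		/* If the exchange fails, cur is refreshed with the stored value. */
	} while (!atomic_compare_exchange_weak_explicit(counter, &cur,
	    cur + 1, memory_order_relaxed, memory_order_relaxed));

	return (true);
}
```

The weak variant is used because the loop retries anyway, which on
LL/SC architectures such as ARM allows the compiler to emit a single
ldrex/strex pair per iteration. A caller would pair each successful
reserve_slot() with a later atomic decrement when the slot is released.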