From owner-svn-src-all@freebsd.org  Tue Jan 31 06:25:34 2017
Return-Path: <owner-svn-src-all@freebsd.org>
Delivered-To: svn-src-all@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E3A0CC90A0;
 Tue, 31 Jan 2017 06:25:34 +0000 (UTC)
 (envelope-from cse.cem@gmail.com)
Received: from mail-wm0-f43.google.com (mail-wm0-f43.google.com [74.125.82.43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id F3922164F;
 Tue, 31 Jan 2017 06:25:33 +0000 (UTC)
 (envelope-from cse.cem@gmail.com)
Received: by mail-wm0-f43.google.com with SMTP id b65so63717249wmf.0;
 Mon, 30 Jan 2017 22:25:33 -0800 (PST)
X-Received: by 10.223.148.35 with SMTP id 32mr25568575wrq.18.1485843446537;
 Mon, 30 Jan 2017 22:17:26 -0800 (PST)
Received: from mail-wm0-f48.google.com (mail-wm0-f48.google.com.
 [74.125.82.48])
 by smtp.gmail.com with ESMTPSA id m188sm20693446wma.0.2017.01.30.22.17.26
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 30 Jan 2017 22:17:26 -0800 (PST)
Received: by mail-wm0-f48.google.com with SMTP id b65so63494606wmf.0;
 Mon, 30 Jan 2017 22:17:26 -0800 (PST)
X-Received: by 10.223.173.183 with SMTP id w52mr26223441wrc.164.1485843446226; 
 Mon, 30 Jan 2017 22:17:26 -0800 (PST)
MIME-Version: 1.0
Reply-To: cem@freebsd.org
Received: by 10.194.22.42 with HTTP; Mon, 30 Jan 2017 22:17:25 -0800 (PST)
In-Reply-To: <20170131153411.G1061@besplex.bde.org>
References: <201701310326.v0V3QW30024375@repo.freebsd.org>
 <20170131153411.G1061@besplex.bde.org>
From: Conrad Meyer <cem@freebsd.org>
Date: Mon, 30 Jan 2017 22:17:25 -0800
X-Gmail-Original-Message-ID: <CAG6CVpXW0Gx6GfxUz_4_u9cGFJdt2gOcGsuphbP9YjkyYMYU2g@mail.gmail.com>
Message-ID: <CAG6CVpXW0Gx6GfxUz_4_u9cGFJdt2gOcGsuphbP9YjkyYMYU2g@mail.gmail.com>
Subject: Re: svn commit: r313006 - in head: sys/conf sys/libkern
 sys/libkern/x86 sys/sys tests/sys/kern
To: Bruce Evans <brde@optusnet.com.au>
Cc: src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, 
 svn-src-head@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for \"user\"
 and \"projects\"\)" <svn-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 31 Jan 2017 06:25:34 -0000

Hi Bruce,

On Mon, Jan 30, 2017 at 9:26 PM, Bruce Evans <brde@optusnet.com.au> wrote:
> On Tue, 31 Jan 2017, Conrad E. Meyer wrote:
>
>> Log:
>>  calculate_crc32c: Add SSE4.2 implementation on x86
>
>
> This breaks building with gcc-4.2.1,

gcc-4.2.1 is an ancient compiler.  Good riddance.

>> Added: head/sys/libkern/x86/crc32_sse42.c
>>
>> ==============================================================================
>> --- /dev/null   00:00:00 1970   (empty, because file is newly added)
>> +++ head/sys/libkern/x86/crc32_sse42.c  Tue Jan 31 03:26:32 2017 (r313006)
>> +
>> +#include <nmmintrin.h>
>
> ...
>
> Inline asm is much less unportable than intrinsics.  kib used the correct
> method of .byte's in asms to avoid depending on assembler support for newer
> instructions.  .byte is still used for clflush on amd64 and i386.  It
> used to be used for invpcid on amd64.  I can't find where it is or was
> used for xsave stuff.

Konstantin predicted this complaint in code review (Phabricator).
Unfortunately, Clang does not automatically unroll loops built around
inline asm, even when the asm uses the correct mnemonic.  Unrolling is
essential to performance below the by-3 block size (768 bytes in this
implementation).  Hand unrolling in C seems to generate less efficient
assembly than the compiler's own unrolling.

The left column below is the buffer size in bytes (hexadecimal).  The
measurements are nanoseconds per buffer, as measured with CLOCK_VIRTUAL
and averaged over 10^5 loops.  These numbers vary by no more than
+/- 1 ns from run to run on my idle Sandy Bridge laptop.  "asm" uses
__asm__(), "intrins" uses the _mm_crc32_* intrinsics that Clang can
unroll, and "multitable" is the older lookup-table implementation (still
used on other architectures).

size       asm  intrins  multitable  (ns per buf)
0x000010     0        0           0
0x000020     7        9          78
0x000040    10        7          50
0x000080    15        9          91
0x000100    25       17         178
0x000200    55       38         347
0x000400    61       62         684

Both implementations are at least 5x faster than the multitable
approach at every nonzero size measured, so it is unreasonable not to
make one of them standard on x86 platforms.
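For readers unfamiliar with the lookup-table approach being replaced, here is a minimal sketch of a single-table, byte-at-a-time CRC-32C. It is a simplified stand-in, not FreeBSD's multitable code: as I understand it, that code extends the same idea to several tables so it can consume more than one byte per step. The names crc32c_tab, crc32c_init_table and crc32c_table are hypothetical, and the CRC is raw (callers pass ~0 and invert the result).

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32c_tab[256];

/* Build the 256-entry table for reflected CRC-32C (polynomial 0x82F63B78):
 * entry i is the CRC register after shifting byte value i through 8 steps. */
static void crc32c_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0x82F63B78u & -(c & 1u));
        crc32c_tab[i] = c;
    }
}

/* One table lookup and one shift per input byte; no pre/post inversion. */
static uint32_t crc32c_table(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = crc32c_tab[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    return crc;
}
```

Every byte costs a dependent memory load, which is why the hardware crc32 instruction wins by such a wide margin in the table above.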

The unrolled intrinsics are consistently faster than the non-unrolled
asm on buffers from 0x40 to 0x200 bytes.  At 0x400 bytes we pass the
first unroll-by-3 threshold (768 bytes), and unrolling stops mattering
as much.
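The "by-3" blocks split a buffer into three equal parts whose CRCs are computed independently (so three crc32 instructions can be in flight at once) and then merged. Below is a minimal portable sketch of just the split-and-combine step, assuming raw CRCs without inversion. crc32c_raw, crc32c_shift_zeros and crc32c_by3 are hypothetical names, and the O(n) zero-feeding shift is a deliberately slow stand-in for the precomputed shift tables an optimized implementation would typically use.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* Raw bitwise CRC-32C update, no pre/post inversion. */
static uint32_t crc32c_raw(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (CRC32C_POLY & -(crc & 1u));
    }
    return crc;
}

/* Advance a CRC over n zero bytes, i.e. multiply by x^(8n) mod P.
 * O(n) stand-in for a precomputed shift table. */
static uint32_t crc32c_shift_zeros(uint32_t crc, size_t n)
{
    while (n--)
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (CRC32C_POLY & -(crc & 1u));
    return crc;
}

/* Split into three equal streams, CRC each independently, then combine.
 * Relies on linearity: crc(c, X) == shift(c, |X|) ^ crc(0, X). */
static uint32_t crc32c_by3(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t third = len / 3;

    /* Independent streams: with crc32q these three dependency chains
     * could be interleaved so the instruction latency overlaps. */
    uint32_t ca = crc32c_raw(crc, p, third);
    uint32_t cb = crc32c_raw(0, p + third, third);
    uint32_t cc = crc32c_raw(0, p + 2 * third, third);

    /* Shift each partial CRC past the bytes that follow it, then fold in. */
    uint32_t ab = crc32c_shift_zeros(ca, third) ^ cb;
    uint32_t abc = crc32c_shift_zeros(ab, third) ^ cc;

    /* Bytes left over when len is not a multiple of 3. */
    return crc32c_raw(abc, p + 3 * third, len - 3 * third);
}
```

The combined result is identical to a single sequential pass; the split only pays off when the combine step is cheap relative to the block, which is why it applies above a size threshold and smaller buffers depend on unrolling instead.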

At 0x40 bytes, it is the difference between 6.4 GB/s (asm) and 9.1 GB/s
(intrins).  At 0x200 bytes, it is the difference between 9.3 GB/s and
13.5 GB/s.  I think this justifies some minor ugliness.
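The GB/s figures follow directly from the table: one byte per nanosecond is 10^9 bytes per second, i.e. 1 GB/s in decimal units, so throughput is simply buffer size divided by nanoseconds per buffer. A tiny helper (hypothetical name) reproducing the quoted numbers:

```c
#include <stddef.h>

/* Bytes per nanosecond equals decimal gigabytes per second (10^9 B/s). */
static double gb_per_s(size_t bytes, double ns_per_buf)
{
    return (double)bytes / ns_per_buf;
}
```

For example, gb_per_s(0x40, 10) is 6.4 and gb_per_s(0x200, 38) is about 13.47.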

Best,
Conrad