From: Conrad Meyer
Reply-To: cem@freebsd.org
Date: Mon, 30 Jan 2017 22:17:25 -0800
Subject: Re: svn commit: r313006 - in head: sys/conf sys/libkern sys/libkern/x86 sys/sys tests/sys/kern
To: Bruce Evans
Cc: src-committers, svn-src-all@freebsd.org, svn-src-head@freebsd.org
In-Reply-To: <20170131153411.G1061@besplex.bde.org>
References: <201701310326.v0V3QW30024375@repo.freebsd.org> <20170131153411.G1061@besplex.bde.org>

Hi Bruce,

On Mon, Jan 30, 2017 at 9:26 PM, Bruce Evans wrote:
> On Tue, 31 Jan 2017, Conrad E. Meyer wrote:
>
>> Log:
>>   calculate_crc32c: Add SSE4.2 implementation on x86
>
> This breaks building with gcc-4.2.1,

gcc-4.2.1 is an ancient compiler.  Good riddance.

>> Added: head/sys/libkern/x86/crc32_sse42.c
>>
>> ==============================================================================
>> --- /dev/null   00:00:00 1970   (empty, because file is newly added)
>> +++ head/sys/libkern/x86/crc32_sse42.c  Tue Jan 31 03:26:32 2017 (r313006)
>> +
>> +#include
>
> ...
>
> Inline asm is much less unportable than intrinsics.  kib used the correct
> method of .byte's in asms to avoid depending on assembler support for newer
> instructions.  .byte is still used for clflush on amd64 and i386.  It
> used to be used for invpcid on amd64.  I can't find where it is or was
> used for xsave stuff.

Konstantin predicted this complaint in code review (phabricator).
Unfortunately, Clang does not automatically unroll asms, even with the
correct mnemonic.  Unrolling is essential to performance below the
by-3 block size (768 bytes in this implementation).  Hand unrolling in
C seems to generate less efficient assembly than the compiler's
unrolling.

The left column below is block size.  The measurements are nanoseconds
per buf, per CLOCK_VIRTUAL, averaged over 10^5 loops.  These numbers
do not vary by more than +/- 1ns from run to run on my idle Sandy Bridge
laptop.  "asm" is using __asm__(), "intrins" is using the _mm_crc32
intrinsics that Clang can unroll, and "multitable" is the older
lookup-table implementation (still used on other architectures).

0x000010: asm:0 intrins:0 multitable:0  (ns per buf)
0x000020: asm:7 intrins:9 multitable:78  (ns per buf)
0x000040: asm:10 intrins:7 multitable:50  (ns per buf)
0x000080: asm:15 intrins:9 multitable:91  (ns per buf)
0x000100: asm:25 intrins:17 multitable:178  (ns per buf)
0x000200: asm:55 intrins:38 multitable:347  (ns per buf)
0x000400: asm:61 intrins:62 multitable:684  (ns per buf)

Both implementations are superior to the multitable approach, so it is
unreasonable not to make one of them standard on x86 platforms.

The unrolled intrinsics are consistently faster than the non-unrolled
asm on buffers of 0x40-0x200 bytes.  At 0x400 bytes we pass the first
unroll-by-3 threshold and it stops mattering as much.

At 0x40 bytes, it is the difference between 6.4 GB/s and 9.1 GB/s.  At
0x200 bytes, it is the difference between 9.3 GB/s and 13.5 GB/s.  I
think this justifies some minor ugliness.

Best,
Conrad
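
Illustration of the asm-versus-intrinsics distinction discussed above. This is a minimal sketch and not the code committed in r313006: it assumes GCC or Clang on x86-64, and every name in it (crc32c_bitwise, crc32c_asm, crc32c_intrins, crc32c_variants_agree, CRC32C_HW_SKETCH) is hypothetical. It shows the same 8-bytes-at-a-time loop written once with __asm__() and once with the _mm_crc32 intrinsics, plus a portable bitwise reference to check both against. It does not reproduce the by-3 interleaving or the timings above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: hardware paths only on x86-64 with GCC or Clang. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
#define CRC32C_HW_SKETCH 1
#endif

/* Portable bit-at-a-time CRC32C update (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1)));
	}
	return (crc);
}

#ifdef CRC32C_HW_SKETCH
/* Inline-asm form: each crc32 is an opaque asm statement to the compiler. */
static uint32_t
crc32c_asm(uint32_t crc, const unsigned char *buf, size_t len)
{
	uint64_t c = crc;
	uint32_t c32;

	for (; len >= 8; buf += 8, len -= 8) {
		uint64_t w;
		memcpy(&w, buf, sizeof(w));	/* unaligned-safe load */
		__asm__("crc32q %1, %0" : "+r" (c) : "rm" (w));
	}
	c32 = (uint32_t)c;
	for (; len > 0; buf++, len--)
		__asm__("crc32b %1, %0" : "+r" (c32) : "rm" (*buf));
	return (c32);
}

/*
 * Intrinsic form: the compiler sees ordinary operations it may schedule
 * or unroll.  The target attribute lets this build without -msse4.2.
 */
__attribute__((target("sse4.2")))
static uint32_t
crc32c_intrins(uint32_t crc, const unsigned char *buf, size_t len)
{
	uint64_t c = crc;
	uint32_t c32;

	for (; len >= 8; buf += 8, len -= 8) {
		uint64_t w;
		memcpy(&w, buf, sizeof(w));
		c = _mm_crc32_u64(c, w);
	}
	c32 = (uint32_t)c;
	for (; len > 0; buf++, len--)
		c32 = _mm_crc32_u8(c32, *buf);
	return (c32);
}
#endif

/* Returns 1 if every variant usable on this machine matches the reference. */
static int
crc32c_variants_agree(const unsigned char *buf, size_t len)
{
	uint32_t ref = crc32c_bitwise(~0u, buf, len);

#ifdef CRC32C_HW_SKETCH
	if (__builtin_cpu_supports("sse4.2")) {
		if (crc32c_asm(~0u, buf, len) != ref)
			return (0);
		if (crc32c_intrins(~0u, buf, len) != ref)
			return (0);
	}
#endif
	(void)ref;
	return (1);
}
```

All three functions take and return the raw running CRC; a caller wanting the conventional CRC32C value would start from ~0 and invert the result. Comparing the compiler's output for crc32c_asm and crc32c_intrins at -O2 is one way to see whether a given compiler unrolls either loop.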