From owner-svn-src-head@freebsd.org Thu May 31 15:22:28 2018
From: Mateusz Guzik <mjguzik@gmail.com>
Date: Thu, 31 May 2018 17:22:27 +0200
Subject: Re: svn commit: r334419 - head/sys/amd64/amd64
To: Bruce Evans
Cc: Mateusz Guzik, src-committers, svn-src-all@freebsd.org, svn-src-head@freebsd.org
List-Id: SVN commit messages for the src tree for head/-current

On Thu, May 31, 2018 at 09:19:58PM +1000, Bruce Evans wrote:
> On Thu, 31 May 2018, Mateusz Guzik wrote:
>
>> Log:
>> amd64: switch pagecopy from non-temporal stores to rep movsq
>
> As for pagezero, this pessimizes for machines with slow movsq and/or
> caches (mostly older machines).

Can you give examples of such machines? I tested with old yellers like
Nehalem and Westmere and saw no loss.

>> The copied data is accessed in part soon after and it results with
>> additional cache misses during a -j 1 buildkernel WITHOUT_CTF=yes
>> KERNFAST=1, as measured with pmc stat.
>
> Of course it causes more cache misses later, but for large data going
> through slow caches is much slower so the cache misses later cost less.
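For readers following along, the two strategies under discussion can be
sketched roughly as below. This is a hypothetical illustration, not the
actual sys/amd64 code: it assumes GCC/Clang-style inline asm on x86-64,
and the function names `pagecopy_movsq` and `pagecopy_nt` are made up for
this sketch.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* rep movsq variant: copy one page with a single string instruction,
 * going through the cache.  Data copied this way is warm for the reads
 * that follow soon after. */
static void
pagecopy_movsq(void *dst, const void *src)
{
#if defined(__x86_64__)
	long cnt = PAGE_SIZE / 8;
	__asm__ volatile("rep movsq"
	    : "+D"(dst), "+S"(src), "+c"(cnt)
	    :
	    : "memory");
#else
	memcpy(dst, src, PAGE_SIZE);	/* portable fallback */
#endif
}

/* non-temporal variant: movnti stores bypass the cache, so copying does
 * not evict cached data, but later reads of the copied page miss. */
static void
pagecopy_nt(void *dst, const void *src)
{
#if defined(__x86_64__)
	uint64_t *d = dst;
	const uint64_t *s = src;

	for (int i = 0; i < PAGE_SIZE / 8; i++)
		__asm__ volatile("movnti %1, %0"
		    : "=m"(d[i])
		    : "r"(s[i]));
	/* order the weakly-ordered stores before any subsequent use */
	__asm__ volatile("sfence" ::: "memory");
#else
	memcpy(dst, src, PAGE_SIZE);	/* portable fallback */
#endif
}
```

Both produce an identical copy; the disagreement above is only about which
one leaves the caches in a better state for the workload that follows.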
The note was predominantly for people who would want to defend nt stores by
claiming they prevent evicting cached data with copied data that is then
mostly not accessed. As for the speed difference, see above.

> It is negatively useful to write this in asm.  This is now just memcpy()
> and the asm version of that is fast enough, though movsq takes too long
> to start up.  This memcpy() might be inlined and then it would be
> insignificantly faster than the function call.  __builtin_memcpy() won't
> actually inline it, since its size is large and compilers know that they
> don't understand memory.

It is true that currently it can be the current memcpy with almost no loss.
However, even on a kernel with #define memcpy __builtin_memcpy, there are
plenty of calls with very small sizes. See the list here (taken during
buildkernel):

https://people.freebsd.org/~mjg/bufsizes.txt

In particular you can find a lot of < 64 entries. Spinning up rep stosb for
such sizes, even with ERMS, turns out to be pessimal even on Skylake. In
other words, the primitive will need special casing for small-sized
callers, and known big-size callers should be moved to something else. As
such, pointing pagecopy at the primitive is imo a bad idea.

As was noted elsewhere, the current ifunc support has the undesirable
property of generating indirect calls. Whatever happens next (this gets
fixed or is perhaps abandoned), there will be a way to select appropriate
routines at boot time. If you know of specific amd64 microarchitectures
which benefit from nt stores in either pagezero or pagecopy, we can just
special-case them later.

In the meantime, I find the current change to be in the right direction.

-- 
Mateusz Guzik
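[Archive note: the small-size special casing discussed in the message
above could look roughly like the following. This is a hypothetical
sketch, not FreeBSD code: `copy_dispatch` and the 64-byte cutover are
illustrative only, and the large-copy path assumes GCC/Clang-style inline
asm on an ERMS-capable x86-64 CPU, with a portable fallback elsewhere.]

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy primitive: tiny copies take a plain loop to avoid
 * the startup latency of rep movsb; larger copies use the fast-string
 * path.  The 64-byte threshold mirrors the buffer-size histogram cited
 * in the message and would need benchmarking per microarchitecture. */
static void *
copy_dispatch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (len < 64) {
		/* small-size special case: no rep startup cost */
		while (len--)
			*d++ = *s++;
		return (dst);
	}
#if defined(__x86_64__)
	/* large copy: rely on ERMS rep movsb */
	__asm__ volatile("rep movsb"
	    : "+D"(d), "+S"(s), "+c"(len)
	    :
	    : "memory");
#else
	memcpy(d, s, len);	/* portable fallback */
#endif
	return (dst);
}
```

A boot-time selector could then point known big-size callers (such as
pagecopy) at a dedicated routine instead of this general-purpose one,
which is the direction the message argues for.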