From: David Chisnall <theraven@FreeBSD.org>
To: Nathan Whitehorn
Cc: freebsd-current@freebsd.org
Date: Thu, 4 Jan 2018 10:03:32 +0000
Subject: Re: Programmatically cache line
On 3 Jan 2018, at 22:12, Nathan Whitehorn wrote:
>
> On 01/03/18 13:37, Ed Schouten wrote:
>> 2018-01-01 11:36 GMT+01:00 Konstantin Belousov:
>>>>>> On x86, the CPUID instruction leaf 0x1 returns the information in
>>>>>> %ebx register.
>>>>> Hm, weird. Why don't we extend sysctl to include this info?
>>> For the same reason we do not provide a sysctl to add two integers.
>> I strongly agree with Kostik on this one. Why add stuff to the kernel,
>> if userspace is already capable of extracting this? Adding that stuff
>> to sysctl has the downside that it will effectively introduce yet
>> another FreeBSDism, whereas something generic already exists.
>>
>
> Well, kind of. The userspace version is platform-dependent and not always available: for example, on PPC, you can't do this from userland and we provide a sysctl machdep.cacheline_size to userland. It would be nice to have an MI API.

On ARMv8, similarly, the kernel sometimes needs to advertise the wrong size. A few big.LITTLE designs have 64-byte cache lines on one cluster and 32-byte lines on the other. If you query the size from userspace while running on the 64-byte cluster, then issue the zero-cache-line instruction after being migrated to the 32-byte cluster, you clear only half of the range you intended. Linux works around this by trapping and emulating the instruction that queries the cache line size and always reporting the size of the smallest cache lines. ARM tells people not to build systems like this, but that doesn't always stop them. Trapping and emulating is much slower than simply providing the information in a shared page, the ELF auxiliary vector, or even (often) a system call.

To give another example, Linux provides a very cheap way for a userspace process to enquire which core it's running on. Some more recent high-performance mallocs use this to add a second layer, a per-core cache of free blocks, behind the per-thread cache.
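A minimal C sketch of such a per-core layer, assuming Linux with glibc, where `sched_getcpu()` is typically answered from the vDSO or rseq rather than a full system call. This is not SuperMalloc's code; `NCPU_SLOTS`, `CACHE_DEPTH`, `current_cpu()`, and the `core_cache_*` functions are hypothetical names chosen for illustration. The CPU number is treated only as a hint, and each slot carries its own lock.

```c
#define _GNU_SOURCE /* exposes sched_getcpu() in glibc's <sched.h> */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#ifdef __linux__
#include <sched.h>
#endif

#define NCPU_SLOTS  64 /* hypothetical: maximum number of per-core slots */
#define CACHE_DEPTH 32 /* hypothetical: free blocks held per core */

struct core_cache {
    pthread_mutex_t lock; /* still needed: the CPU number can go stale */
    size_t count;
    void *blocks[CACHE_DEPTH];
};

static struct core_cache caches[NCPU_SLOTS];
static pthread_once_t caches_once = PTHREAD_ONCE_INIT;

static void caches_init(void)
{
    for (size_t i = 0; i < NCPU_SLOTS; i++)
        pthread_mutex_init(&caches[i].lock, NULL);
}

/* Cheap "which core am I on?" query; falls back to slot 0 elsewhere. */
static unsigned current_cpu(void)
{
#ifdef __linux__
    int cpu = sched_getcpu();
    if (cpu >= 0)
        return (unsigned)cpu % NCPU_SLOTS;
#endif
    return 0;
}

/* Push a freed block into a core's cache; returns 0 if that cache is full. */
int core_cache_put_cpu(unsigned cpu, void *block)
{
    assert(cpu < NCPU_SLOTS);
    pthread_once(&caches_once, caches_init);
    struct core_cache *c = &caches[cpu];
    int cached = 0;
    pthread_mutex_lock(&c->lock);
    if (c->count < CACHE_DEPTH) {
        c->blocks[c->count++] = block;
        cached = 1;
    }
    pthread_mutex_unlock(&c->lock);
    return cached;
}

/* Pop a block from a core's cache; returns NULL if that cache is empty. */
void *core_cache_get_cpu(unsigned cpu)
{
    assert(cpu < NCPU_SLOTS);
    pthread_once(&caches_once, caches_init);
    struct core_cache *c = &caches[cpu];
    void *block = NULL;
    pthread_mutex_lock(&c->lock);
    if (c->count > 0)
        block = c->blocks[--c->count];
    pthread_mutex_unlock(&c->lock);
    return block;
}

/* Fast-path wrappers. If the thread migrates after current_cpu() returns,
   it simply uses another core's slot: possible contention, not corruption. */
int core_cache_put(void *block) { return core_cache_put_cpu(current_cpu(), block); }
void *core_cache_get(void) { return core_cache_get_cpu(current_cpu()); }
```

The design choice worth noting is that a stale CPU number never breaks correctness, because whichever slot a thread picks is protected by that slot's mutex; migration only turns an uncontended lock into a possibly contended one.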
Unlike the per-thread cache, the per-core cache does need a lock, but it's very unlikely to be contended: contention happens only if a thread migrates between checking which core it is on and taking the lock (and so acquires another CPU's lock), or if a thread is preempted in the middle of the very brief fill operation. The author of the SuperMalloc paper tried doing this with CPUID and found that it was slow enough to offset almost all of the benefit of the extra layer of caching.

Just because userspace can get at the information directly from the hardware doesn't mean that this is the most efficient or best way for userspace to get at it.

Oh, and some of these things are useful in portable code, so having to write some assembly for every target to get information that the kernel already knows is wasteful.

David
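The point about portable code can be illustrated with a hedged C sketch of a cache-line-size query that prefers what the operating system already exposes and only then falls back to CPUID. The function names and the 64-byte fallback are assumptions for illustration; `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)` is a glibc extension, and `machdep.cacheline_size` is the PowerPC sysctl mentioned in the thread, which may not exist on other FreeBSD architectures.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__linux__)
#include <unistd.h>
#endif
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header */
#endif

/* Assumption: a common line size, used only when nothing else answers. */
#define FALLBACK_LINE_SIZE 64L

/* x86 only: CPUID leaf 0x1 reports the CLFLUSH line size in EBX bits 15:8,
   in units of 8 bytes; it is valid when EDX bit 19 (CLFSH) is set.
   Returns 0 when unavailable. */
static long x86_cpuid_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19)))
        return (long)((ebx >> 8) & 0xff) * 8;
#endif
    return 0;
}

/* Prefer what the OS already knows; fall back to hardware, then a guess. */
long cache_line_size(void)
{
    long size = 0;
#if defined(__FreeBSD__)
    /* PPC sysctl from this thread; the lookup may fail on other arches. */
    int value = 0;
    size_t len = sizeof(value);
    if (sysctlbyname("machdep.cacheline_size", &value, &len, NULL, 0) == 0)
        size = value;
#elif defined(__linux__) && defined(_SC_LEVEL1_DCACHE_LINESIZE)
    /* glibc extension; may return 0 or -1 when the size is unknown. */
    size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    if (size <= 0)
        size = x86_cpuid_line_size();
    if (size <= 0)
        size = FALLBACK_LINE_SIZE;
    assert(size > 0); /* postcondition: callers can safely divide by this */
    return size;
}
```

On a mixed big.LITTLE system like the one described above, any single answer may be wrong for some cores, so code that zeroes whole lines should rely on the smallest size, which only the kernel is in a position to report reliably.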