From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 12 12:58:03 2003
Date: Sun, 12 Oct 2003 20:57:52 +0100
From: Bruce M Simpson
To: Peter Jeremy, Andrew Gallatin, freebsd-hackers@freebsd.org
Message-ID: <20031012195752.GE2996@saboteur.dek.spc.org>
References: <20031010103640.6F5A216A4BF@hub.freebsd.org>
 <20031010134400.GE803@saboteur.dek.spc.org>
 <16263.1019.939450.708832@grasshopper.cs.duke.edu>
 <20031011035827.GD75796@server.c211-28-27-130.belrs2.nsw.optusnet.com.au>
 <20031011082711.GB679@saboteur.dek.spc.org>
 <20031011101231.GH75796@server.c211-28-27-130.belrs2.nsw.optusnet.com.au>
 <20031011140651.GA1739@saboteur.dek.spc.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="gBBFr7Ir9EOA20Yy"
Content-Disposition: inline
In-Reply-To: <20031011140651.GA1739@saboteur.dek.spc.org>
Subject: Re: Determining CPU features / cache organization from userland
List-Id: Technical Discussions relating to FreeBSD

--gBBFr7Ir9EOA20Yy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

All,

I came up with the attached text file today to summarize some of my
findings, after looking at various open source trees to see how they
handle run-time cache geometry detection.

Many will find it ironic that i386 is the easiest platform to deal with.

[ Andrew: Perhaps you can shed some light on how the necessary
  information can be gathered on Alpha? My search was incomplete and I
  could not find a reliable source for DEC's development manuals. ]

Jeff Roberson suggested I adopt NetBSD's API; however, on further
examination it's clear that NetBSD's approach isn't consistent across
all platforms. Darwin takes a similar approach, but it is perhaps too
PowerPC-centric.

sysctl is a good interface for retrieving this information, since the
data is small and doesn't change during the lifetime of the kernel.
sysctl is already invoked from within libc to retrieve information in
this way.

glibc's approach to dealing with situations where knowledge of the cache
line size is needed is a bit fractious: it retrieves the information
from an 'aux vector' passed to glibc at startup.

I think threading libraries should seriously consider becoming consumers
of the API once it's finalized. Mutex alignment on cache line boundaries
is desirable for userland applications too.

However, phk malloc would need to be changed in order to support this
specific form of aligned allocation. Perhaps a separate pool or zone
could be used for this kind of allocation?
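To make the aligned-allocation point concrete, here is a minimal userland sketch of a mutex placed on its own cache line. It does not touch phk malloc; it simply uses POSIX posix_memalign(). The helper names mutex_alloc_aligned() and mutex_free_aligned() are hypothetical, and the line size is passed in by the caller because the API for discovering it is not finalized.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate one mutex that starts on a cache-line
 * boundary and occupies whole lines, so it never shares a line with a
 * neighbouring allocation. line_size is supplied by the caller and must
 * be a power of two no smaller than sizeof(void *), which is what
 * posix_memalign() requires of its alignment argument.
 */
pthread_mutex_t *
mutex_alloc_aligned(size_t line_size)
{
	void *p = NULL;
	size_t sz;

	if (line_size < sizeof(void *) || (line_size & (line_size - 1)) != 0)
		return (NULL);

	/* Round the object size up to a whole number of cache lines. */
	sz = (sizeof(pthread_mutex_t) + line_size - 1) & ~(line_size - 1);

	if (posix_memalign(&p, line_size, sz) != 0)
		return (NULL);
	if (pthread_mutex_init(p, NULL) != 0) {
		free(p);
		return (NULL);
	}
	return (p);
}

/* Destroy and release a mutex obtained from mutex_alloc_aligned(). */
void
mutex_free_aligned(pthread_mutex_t *m)
{
	if (m != NULL) {
		pthread_mutex_destroy(m);
		free(m);
	}
}
```

A caller would pass whatever line size the eventual interface reports, for example mutex_alloc_aligned(64) on a machine with 64-byte lines. A dedicated pool or zone, as suggested above, would avoid the per-object padding overhead this sketch accepts.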
This becomes more important and timely when one considers the I/O
alignment restrictions we've encountered. Some applications may need to
align their buffers on arbitrary boundaries to suit devices, too.

BMS

--gBBFr7Ir9EOA20Yy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="cacheplan.txt"

all
---
NetBSD cache information API(s) are not consistent across platforms.

alpha
-----
Cache discovery? Static.
21064, 21064A, 21066, 21066A, 21164 all have a line size of 32 bytes.
The 21264 has a 64-byte line size.
21364: L1 split, 64KB each, 2-way set-associative,

Virtual caches can be implemented using PALcode, but this is probably
more of a curiosity than anything else.

ia64
----
Cache discovery? Call PAL_CACHE_INFO, I think.
No documentation on how to do this at this time. I have emailed
marcel@freebsd.org asking for advice.

i386 pc98 amd64
---------------
Cache discovery? CPUID.
Earlier chips which don't support it probably don't have a cache, or
aren't worth supporting.

General rule for x86: split L1, unified L2, optional unified L3.
General rule for Intel P5: 2-way, 32 bytes/line
General rule for Intel MMX and up: 4-way, 32 bytes/line
PPro doesn't have L3.
The newer cores have different cache geometry.

powerpc
-------
Cache line discovery? Static. Many core variants.
I have not seen any runtime code for this.
The POWER clcs instruction is obsolete.

OpenDarwin assumes 32 bytes. It has hooks for discovering the cache
geometry at runtime but these are not used.

NetBSD statically initializes this information according to the
discovered CPU model in use, which is the way to go. NetBSD tells uvm
to recolor the page queues if required.

Linux uses static #defines from IBM people, except in the case of ppc64,
which is strikingly similar to the OpenDarwin code except that it
actually talks to Open Firmware.
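The static, per-model approach described for alpha and powerpc boils down to a lookup table. The sketch below uses only the Alpha line sizes listed in the alpha section. The struct, the function name and the use of model-name strings as keys are assumptions made for illustration; a real kernel would more likely key on a CPU type code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: CPU model name and its cache line size. */
struct cpu_line_size {
	const char	*model;
	unsigned	 line_size;	/* bytes */
};

/* Line sizes taken from the alpha notes above; nothing else is assumed. */
static const struct cpu_line_size alpha_line_sizes[] = {
	{ "21064",  32 },
	{ "21064A", 32 },
	{ "21066",  32 },
	{ "21066A", 32 },
	{ "21164",  32 },
	{ "21264",  64 },
};

/*
 * Return the cache line size in bytes for a known Alpha model,
 * or 0 if the model is not in the table.
 */
unsigned
alpha_cache_line_size(const char *model)
{
	size_t i;

	if (model == NULL)
		return (0);
	for (i = 0; i < sizeof(alpha_line_sizes) / sizeof(alpha_line_sizes[0]); i++)
		if (strcmp(alpha_line_sizes[i].model, model) == 0)
			return (alpha_line_sizes[i].line_size);
	return (0);
}
```

Returning 0 for an unknown model lets the caller fall back to a conservative default rather than guessing.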
Open Firmware on CHRP should, however, provide the following properties
for each cpu device node configured in the system:

	i-cache-size
	i-cache-sets
	i-cache-block-size
	d-cache-size
	d-cache-sets
	d-cache-block-size
	tlb-size
	tlb-sets
	l2-cache

All are integers except for l2-cache, which is the address of an
l2-cache device node if the system found one.

mips
----
The NetBSD MIPS code for dealing with cache geometry was recently updated.
MIPS caches may be split/unified at L1/L2 and unified at L3.
Cache detection code is quite voluminous. Swipe NetBSD's if FreeBSD/mips
ever kicks off. Many, many core variants.

sparc64
-------
Cache line discovery? Performed by Open Firmware.
Open Firmware property names used are ever so slightly different from
Apple's.

	icache-size
	icache-line-size
	icache-associativity
	dcache-size
	dcache-line-size
	dcache-associativity
	ecache-size
	ecache-line-size
	ecache-associativity

Already handled within cache.c, but assembly stubs *expect* this
information in a certain format. Specifically, they need to see the
data cache/instruction cache sizes and line sizes.

General rule: Split L1, Unified L2.
Cores: Spitfire/Blackbird/Cheetah

--gBBFr7Ir9EOA20Yy--
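The two Open Firmware property sets above describe the same geometry differently: CHRP nodes report a set count, while the sparc64 nodes report associativity directly. A common API would have to convert one into the other. The following is a rough sketch of that arithmetic, assuming that the *-cache-sets property is the number of sets and that the block size equals the line size. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three CHRP-style integer properties. */
struct of_cache_props {
	uint32_t size;		/* e.g. i-cache-size, in bytes */
	uint32_t sets;		/* e.g. i-cache-sets */
	uint32_t block_size;	/* e.g. i-cache-block-size, in bytes */
};

/*
 * Derive the number of ways from size = sets * ways * block_size.
 * Returns 0 if the properties are missing or inconsistent.
 */
uint32_t
of_cache_associativity(const struct of_cache_props *c)
{
	uint64_t set_bytes;

	if (c == NULL || c->sets == 0 || c->block_size == 0)
		return (0);

	/* Bytes covered by one way across all sets. */
	set_bytes = (uint64_t)c->sets * c->block_size;
	if (c->size == 0 || c->size % set_bytes != 0)
		return (0);

	return ((uint32_t)(c->size / set_bytes));
}
```

For example, a 32768-byte cache with 128 sets and 32-byte blocks works out to 8 ways, a value that can then be compared directly with a sparc64-style icache-associativity property.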