From owner-freebsd-current@FreeBSD.ORG Thu Jun 17 02:45:18 2004
Date: Thu, 17 Jun 2004 12:45:10 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Martin Nilsson
cc: Alexander Leidinger
cc: current@freebsd.org
Subject: Re: How to determine the L2 cache size on non-AMD CPUs (automatic page queue color tuning)?
In-Reply-To: <40D016F2.2080904@gneto.com>
Message-ID: <20040617120121.O1115@gamplex.bde.org>
References: <20040616112758.46677e25@Magellan.Leidinger.net> <40D016F2.2080904@gneto.com>
List-Id: Discussions about the use of FreeBSD-current

On Wed, 16 Jun 2004, Martin Nilsson wrote:

> Alexander Leidinger wrote:
> > Now I need to know how to determine those properties on at least some
> > Intel CPUs (e.g. P3 & P4).
>
> The more expensive Intel processors also have L3 caches of 1-4MB.
> Since Intel's processors are built with inclusive caches (data in the L2
> cache is also present in the L3), shouldn't the value used be that of the
> largest cache, be it L2 or L3?
>
> How much effect on performance does a wrong cache size value have?
Closer to 0.1% than to 10%.  The whole page coloring optimization was
worth a few percent at best, except in unusual/unlucky cases, when it
was first implemented, which was when hardware caches mostly had less
associativity.  Without explicit page coloring, the colors of the pages
assigned to an object are almost random.  This causes unnecessary cache
conflicts, but random allocation isn't too bad: on average it only gives
a small number of cache conflicts, and these are compensated for by
associativity.

The effects of coloring are easiest to see in microbenchmarks.  E.g.:

%%%
                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
bes4.bde. FreeBSD 5.0-C 1509 757. 223.  514.2  923.0  373.2  372.2 742. 648.2
besplex.b FreeBSD 5.0-C 1531 736. 285.  527.2  922.1  417.9  420.2 741. 781.4
besplex.b Linux 2.4.0-t 962. 657. 731.  533.4  928.5  387.1  388.0 789. 687.6

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- ------------- ---- ----- ------    --------    -------
bes4.bde. FreeBSD 5.0-C 1533 1.958   13.1        98.5
besplex.b FreeBSD 5.0-C 1533 1.957   13.1        98.5
besplex.b Linux 2.4.0-t 1533 1.957   13.1       111.6
%%%

This is on an Athlon XP1600 overclocked by 146/133, with 2*256MB of DDR
PC2100 memory, running fairly old kernels.  The Linux "Main mem" latency
is higher entirely because Linux (at least Linux-2.4.0-test.mumble)
doesn't implement page coloring.
bes4 is running plain -current, and besplex is running my version of
-current, which has finely tuned page coloring (actually tuned for a
Celeron, not for the Athlon) and extra color bits corresponding to the
bank organization (tuned for both a Celeron and the Athlon).  The
besplex "Mem write" bandwidth is faster entirely because of the coloring
for banks.  This optimization has little effect for reads.  I don't know
why Linux is faster for "Mem read" and faster than bes4 for "Mem write".

The bcopy bandwidths show the same optimizations as the read/write
bandwidths.  The other bandwidths are determined more by software than
by memory speed or color.  All of the numbers shown above except the
TCP bandwidth have a low variance (something in the TCP bandwidth
benchmark, or in FreeBSD's handling of it, gives a high variance and
often low performance under FreeBSD).

The bank coloring optimization is worth less than 1% for makeworld,
although it is worth 20% here.

Bruce