From: ranjith kumar
To: freebsd-ia32@freebsd.org
Date: Mon, 11 Dec 2006 08:05:21 -0800 (PST)
Subject: Re: prefetching on pentium 4

--- Attilio Rao wrote:
> 2006/12/6, ranjith kumar wrote:
> > Hi,
> > There are 4 types of prefetch instructions on the
> > Pentium 4 (IA-32) processor:
> > prefetchnta, prefetcht0, prefetcht1, prefetcht2.
> >
> > For the Pentium 4, the IA-32 optimization manuals say
> > that prefetcht0, prefetcht1 and prefetcht2 are identical.
> >
> > They also say that ONLY the prefetchnta instruction
> > prefetches data into the L2 cache without polluting the caches.
> >
> > If all four instructions prefetch data into the
> > L2 cache (not into the L1 cache), what does it mean
> > to say that prefetchnta does not pollute the caches?
> >
> > i.e. what is the difference between prefetchnta and
> > the other instructions?
>
> First of all, it is important to say that a prefetch*
> instruction is only a hint for the CPU, not a *command*,
> so the CPU decides (in an unspecified way) whether or not
> to accept the caching request.
> From this point of view, the prefetch* instructions leave
> the CPU as much freedom as possible.
> The different numbers mean different 'criticality' levels
> for the CPU (0 = most critical, 2 = least critical): the
> lower the number, the higher in the cache hierarchy
> (closer to the CPU) the line is prefetched.
> Hypothetically, this would mean:
>
> prefetcht0 -> L1 prefetching
> prefetcht1 -> L2 prefetching
> prefetcht2 -> L3 prefetching
>
> And this is what really happens, for example, on the P3
> (since the P3 has no L3 cache, prefetcht2 == prefetcht1).
> On the P4 things are different, because you cannot
> manipulate the L1 cache directly, so what happens is:
>
> prefetcht0 -> L2 prefetching
> prefetcht1 -> L2 prefetching
> prefetcht2 -> L3 prefetching
> (if no L3 cache is present, prefetcht2 is the same as
> the others, hence the statement that all three
> instructions behave the same).
>
> prefetchnta is completely different, because it
> fetches a cache line into the NT cache structure.
> Non-temporal caches are global caches which are
> particularly powerful because they don't need snooping
> messages between CPUs (and, in this way, they reduce
> CPU<->cache traffic), and they are used by the NTI family.

Thanks. But when I ran two programs, one prefetching with prefetchnta and the other with prefetcht0, the second one ran faster. (I used a Pentium 4 processor and the gcc compiler.) What could be the reason? When is prefetchnta preferable to prefetcht0?

Also, the "IA-32 system programmer's manual" says nothing about a non-temporal cache structure. The caches in IA-32 processors are the L1 cache, L2 cache, write-combining cache, store buffer, instruction TLB, data TLB and L3 cache (not present in the Pentium 4). Are the non-temporal cache and the write-combining buffer the same thing?

Thanks in advance.
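A comparison like the one described above (the same loop run once with an nta hint and once with a t0 hint) could look roughly like the sketch below. It is an illustration under stated assumptions, not the program from the thread: it assumes GCC or Clang and uses the compiler builtin `__builtin_prefetch(addr, rw, locality)`, where locality 0 means "no temporal locality" and 3 means "high temporal locality"; on x86 with SSE, GCC normally lowers these to prefetchnta and prefetcht0 respectively. The prefetch distance, the one-hint-per-64-bytes stride and the helper names (`sum_nta`, `sum_t0`, `compare_hints`) are hypothetical. The array is summed several times on purpose: the nta hint tells the CPU the line will not be needed again, so if the program does reuse the data, it may have to fetch it from memory again, which is one thing worth checking when prefetcht0 comes out faster.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical tuning knobs (not values from the thread). */
#define PREFETCH_AHEAD 64   /* how many elements ahead to prefetch */
#define DOUBLES_PER_HINT 8  /* issue one hint per 64 bytes of data */

/* locality 0: "no temporal locality" (GCC on x86+SSE usually emits prefetchnta). */
static double sum_nta(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_HINT == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
        s += a[i];
    }
    return s;
}

/* locality 3: "keep in all cache levels" (GCC on x86+SSE usually emits prefetcht0). */
static double sum_t0(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_HINT == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        s += a[i];
    }
    return s;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sum the same n-element array `passes` times with each hint and print
 * the elapsed times. Returns 0 on success, -1 if allocation fails. */
int compare_hints(size_t n, int passes)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        a[i] = (double)(i % 100);

    double s_nta = 0.0, s_t0 = 0.0;

    double t = now_sec();
    for (int p = 0; p < passes; p++)
        s_nta += sum_nta(a, n);
    double t_nta = now_sec() - t;

    t = now_sec();
    for (int p = 0; p < passes; p++)
        s_t0 += sum_t0(a, n);
    double t_t0 = now_sec() - t;

    /* Both variants add the same values in the same order. */
    assert(s_nta == s_t0);
    printf("nta: %.3f s   t0: %.3f s   (n=%zu, passes=%d)\n",
           t_nta, t_t0, n, passes);
    free(a);
    return 0;
}
```

For example, calling `compare_hints(1 << 22, 20)` from `main` sums a 32 MB array of doubles, which is larger than the Pentium 4 L2 cache. The timings depend entirely on the CPU, the array size relative to the caches and the prefetch distance, so it is worth repeating each run, swapping the order of the two variants, and trying an array small enough to fit in L2 as well as one that does not.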