Date: Sat, 25 Oct 2003 16:07:11 -0700
From: Peter Wemm <peter@wemm.org>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/i386/i386 pmap.c
Message-ID: <20031025230711.B20F92A7EA@canning.wemm.org>
In-Reply-To: <20031025173220.D43805-100000@mail.chesapeake.net>

Jeff Roberson wrote:
> On Sat, 25 Oct 2003, Peter Wemm wrote:
>
> > peter 2003/10/25 11:51:41 PDT
> >
> > FreeBSD src repository
> >
> > Modified files:
> > sys/i386/i386 pmap.c
> > Log:
> > For the SMP case, flush the TLB at the beginning of the page zero/copy
> > routines. Otherwise we run into trouble with speculative tlb preloads
> > on SMP systems. This effectively defeats Jeff's revision 1.438
> > optimization (for his pentium4-M laptop) in the SMP case. It breaks
> > other systems, particularly athlon-MP's.
>
> If the page tables are NULL why does this break speculative tlb preloads?

While we're zeroing the page, CMAP2 (or friends) are non-NULL. If another
cpu accesses a nearby page and that cpu decides to speculatively preload the
nearby TLB entries, then it will cache the CMAP2 value. Meanwhile, the
originating cpu clears it again and flushes its own cache. But, if we then
do a pmap_zero_page on the other cpu, it can still have the speculatively
cached tlb entry and zero the wrong page.

Poul-Henning was able to reproduce this problem in short order.

The first hack we tried was to change invlcaddr() to do a global shootdown.
It solved the crashes, presumably by purging all other cpus' copies of CMAP2
including any speculatively loaded values. Obviously this is expensive and
defeats the point of doing local flushes only.

So, as a lighter weight solution, we tried flushing after every page table
modification, as the IA32 system programmers manual says we must, and it too
solved the problem - without the expense of extra tlb shootdowns.

Perhaps we should change back to using the switchin purge and flush at the
beginning as an alternative to two flushes. The expense of invlpg seems to
be unique to the pentium-4's. athlon's run at about 100 clock cycles (80 on
athlon64's).

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
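
The stale-entry sequence described in the message can be replayed in a small toy model. This is only an illustrative sketch, not the pmap.c code: the Machine class, its method names, and the string page contents are all hypothetical, and it assumes a single CMAP2 slot shared by both cpus with one private TLB per cpu. The flush_first flag stands in for the "flush at the beginning" behaviour from the commit log.

```python
CMAP2 = "CMAP2"  # the shared temporary mapping slot named in the message


class Machine:
    """Toy model (hypothetical): one shared CMAP2 PTE, one private TLB per cpu."""

    def __init__(self, ncpus, nframes):
        self.pte = None                             # frame CMAP2 maps, or None
        self.tlbs = [dict() for _ in range(ncpus)]  # per-cpu cached translations
        self.frames = ["dirty"] * nframes           # physical page contents

    def invlpg(self, cpu):
        """Local flush: drops only this cpu's cached CMAP2 translation."""
        self.tlbs[cpu].pop(CMAP2, None)

    def speculative_preload(self, cpu):
        """The cpu caches whatever CMAP2 currently maps, without being asked."""
        if self.pte is not None:
            self.tlbs[cpu][CMAP2] = self.pte

    def translate(self, cpu):
        """A TLB hit wins over the page table, even if the entry is stale."""
        tlb = self.tlbs[cpu]
        if CMAP2 not in tlb:
            tlb[CMAP2] = self.pte
        return tlb[CMAP2]

    def zero_page(self, cpu, frame, flush_first, while_mapped=None):
        if flush_first:
            self.invlpg(cpu)        # purge any stale entry before mapping
        self.pte = frame            # CMAP2 is now non-NULL
        if while_mapped is not None:
            while_mapped()          # window in which another cpu may preload
        self.frames[self.translate(cpu)] = "zero"
        self.pte = None             # clear the mapping again
        self.invlpg(cpu)            # local flush only, no shootdown


def run(flush_first):
    m = Machine(ncpus=2, nframes=3)
    # cpu 0 zeroes frame 1; cpu 1 speculatively caches CMAP2 in the meantime.
    m.zero_page(0, 1, flush_first, while_mapped=lambda: m.speculative_preload(1))
    m.frames[1] = "live data"       # frame 1 is handed out and written to
    m.zero_page(1, 2, flush_first)  # cpu 1 now zeroes frame 2
    return m.frames


if __name__ == "__main__":
    print("no flush at start:", run(False))
    print("flush at start:   ", run(True))
```

Without the initial flush, cpu 1 reuses its stale translation, so frame 1 is wiped and frame 2 is left untouched. With the flush, cpu 1 reloads the mapping and zeroes frame 2 as intended. The model says nothing about real hardware timing or the cost of invlpg.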