Date: Wed, 17 Jan 2007 15:50:41 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Ivan Voras <ivoras@fer.hr>
Cc: freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs
Message-ID: <20070117134022.V18339@besplex.bde.org>
In-Reply-To: <eojok9$449$1@sea.gmane.org>
References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com> <3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com> <3bbf2fe10607261127p3f01a6c3w80027754f7d4e594@mail.gmail.com> <3bbf2fe10607281004o6727e976h19ee7e054876f914@mail.gmail.com> <3bbf2fe10701160851r79b04464m2cbdbb7f644b22b6@mail.gmail.com> <20070116154258.568e1aaf@pleiades.nextvenue.com> <b1fa29170701161355lc021b90o35fa5f9acb5749d@mail.gmail.com> <eoji7s$cit$2@sea.gmane.org> <b1fa29170701161425n7bcfe1e5m1b8c671caf3758db@mail.gmail.com> <eojlnb$qje$1@sea.gmane.org> <b1fa29170701161534n1f6c3803tbb8ca60996d200d9@mail.gmail.com> <eojok9$449$1@sea.gmane.org>
On Wed, 17 Jan 2007, Ivan Voras wrote:

> Kip Macy wrote:
>>> Maybe even someone finds a way to get optimized versions of memcpy in
>>> the kernel :)
>
>> It makes a huge difference in a proprietary file serving appliance
>> that I know of.
>
> Beneficial difference? Heheh.
>
>> However, past measurements on FreeBSD have supposedly indicated that it
>> isn't that big a win as a result of increased context switch time.

No, they indicated that the win is not very large (sometimes negative), and
is very machine dependent.  E.g., it is a small pessimization on all 64-bit
i386's running in 64-bit mode -- and those are just the i386's you would
want to buy now.  On other CPU classes:

P2 (my old Celeron):  +- epsilon difference.

P3 (freefall):  +- epsilon difference.

P4 (nosedive's Xeon):  movdqa 17% faster than movsl, but all other cached
moves slower using MMX or SSE[1-2]; movnt with block prefetch 60% faster
than movsl with no prefetch, but < 5% faster with no prefetch for both.

AXP (my 5-year-old system with a newer CPU):  movq through MMX is 60%
faster than movsl for cached moves, but movdqa through XMM is only 4%
faster.  movnt with block prefetch is 155% faster than movsl with no
prefetch, and 73% faster with no prefetch for both.

A64 in 32-bit mode:  in between P4 and AXP (closer to AXP).  movsl doesn't
lose by as much, and prefetchnta actually works, so block prefetch is not
needed and there is a better chance of prefetching helping in more than
just benchmarks.

Bruce
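For readers unfamiliar with the techniques being compared above, the
following is a minimal userland sketch (in C with SSE2 compiler intrinsics,
not the hand-written assembly a kernel bcopy would use; the function name
copy_nontemporal and the 64-byte chunking are illustrative assumptions) of
a cached-load / non-temporal-store copy loop with prefetchnta, i.e. the
"movdqa + movntdq + prefetchnta" style of move discussed here:

    #include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_stream_si128 */
    #include <stddef.h>
    #include <string.h>

    /*
     * Illustrative sketch, not FreeBSD kernel code: copy len bytes using
     * non-temporal (movntdq) stores so the destination does not displace
     * useful data from the caches.  Assumes src and dst are 16-byte
     * aligned and do not overlap; the tail is copied with plain memcpy.
     */
    static void
    copy_nontemporal(void *dst, const void *src, size_t len)
    {
            const __m128i *s = (const __m128i *)src;
            __m128i *d = (__m128i *)dst;
            size_t n = len / 64;            /* 64-byte (cache line) chunks */

            while (n-- != 0) {
                    /* prefetchnta: fetch the next line without polluting L2 */
                    _mm_prefetch((const char *)(s + 4), _MM_HINT_NTA);

                    __m128i x0 = _mm_load_si128(s + 0);     /* movdqa loads */
                    __m128i x1 = _mm_load_si128(s + 1);
                    __m128i x2 = _mm_load_si128(s + 2);
                    __m128i x3 = _mm_load_si128(s + 3);

                    _mm_stream_si128(d + 0, x0);            /* movntdq stores */
                    _mm_stream_si128(d + 1, x1);
                    _mm_stream_si128(d + 2, x2);
                    _mm_stream_si128(d + 3, x3);

                    s += 4;
                    d += 4;
            }
            _mm_sfence();                   /* order the non-temporal stores */

            memcpy(d, s, len % 64);         /* copy any remaining tail bytes */
    }

"Block prefetch" as measured above is a different trick: instead of relying
on a prefetch instruction, the source is first read block-by-block into the
cache with ordinary loads and only then copied out, which matters on CPUs
(such as the P4 and AXP here) where prefetchnta is weak or ineffective.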