Date: Thu, 24 Jan 2013 03:54:42 +0100
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: current@freebsd.org
Subject: false alarm (Re: __builtin_memcpy() slower than memcpy/bcopy (and on linux it is the opposite) ?)
Message-ID: <20130124025442.GA63831@onelab2.iet.unipi.it>
In-Reply-To: <20130123163238.GB56212@onelab2.iet.unipi.it>

On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote:
> Probably our compiler folks have some ideas on this...
>
> When doing netmap i found that on FreeBSD memcpy/bcopy was expensive,
> __builtin_memcpy() was even worse, and so i ended up writing
> my custom routine, (called pkt_copy() in the program below).
> This happens with gcc 4.2.1, clang, gcc 4.6.4
>
> I was then surprised to notice that on a recent ubuntu using
> gcc 4.6.2 (if that matters) the __builtin_memcpy beats other
> methods by a large factor.

So, it turns out that in my test program I had swapped the source
and destination operands for __builtin_memcpy(), and this
substantially changed the memory access pattern.

With the correct operands, __builtin_memcpy == memcpy == bcopy
on both FreeBSD and Linux. On FreeBSD pkt_copy is still faster
than the other methods for small packets, whereas on Linux they
are equivalent.

If you are curious why swapping source and dst changed things
so dramatically: the test was supposed to read from a large chunk
of memory (over 1GB) to avoid always hitting L1 or L2.
Swapping the operands causes the reads to always hit the same
line, thus saving a lot of misses. The difference between the two
machines is then probably due to how the cache is used on writes.

Sorry for the noise.

luigi
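
For readers who want to see the two access patterns side by side, here is a minimal sketch. It is not the original test program (which is not part of this message): the function names, the single fixed `slot` buffer and the wrap-around logic are all assumptions made for illustration. In the intended pattern the source pointer walks through a large buffer, so almost every read misses the cache; with the operands swapped, every read comes from the same small buffer and only the writes walk through memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Intended pattern (hypothetical helper, not from the original test):
 * each copy reads a different chunk of the large buffer `big` and
 * writes it into the same small `slot`. Reads walk through memory.
 */
static void
copy_intended(uint8_t *slot, const uint8_t *big, size_t big_size,
    size_t len, size_t count)
{
	size_t off = 0;

	assert(len > 0 && len <= big_size);
	for (size_t i = 0; i < count; i++) {
		memcpy(slot, big + off, len);	/* dst = slot, src = big */
		off += len;
		if (off + len > big_size)	/* wrap before running off the end */
			off = 0;
	}
}

/*
 * Swapped operands (the mistake described above): every read comes
 * from the same `slot`, which stays in cache, while the writes walk
 * through the large buffer instead.
 */
static void
copy_swapped(const uint8_t *slot, uint8_t *big, size_t big_size,
    size_t len, size_t count)
{
	size_t off = 0;

	assert(len > 0 && len <= big_size);
	for (size_t i = 0; i < count; i++) {
		memcpy(big + off, slot, len);	/* dst = big, src = slot */
		off += len;
		if (off + len > big_size)
			off = 0;
	}
}
```

To reproduce the effect one would presumably allocate `big` well beyond the size of the last-level cache (the message mentions over 1GB) and time each loop separately; the two functions move the same number of bytes but differ in which side of the copy misses.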
