Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 24 Jan 2013 03:54:42 +0100
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        current@freebsd.org
Subject:   false alarm (Re: __builtin_memcpy() slower than memcpy/bcopy (and on linux it is the opposite) ?)
Message-ID:  <20130124025442.GA63831@onelab2.iet.unipi.it>
In-Reply-To: <20130123163238.GB56212@onelab2.iet.unipi.it>

index | next in thread | previous in thread | raw e-mail

On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote:
> Probably our compiler folks have some ideas on this...
> 
> When doing netmap i found that on FreeBSD memcpy/bcopy was expensive,
> __builtin_memcpy() was even worse, and so i ended up writing
> my custom routine, (called pkt_copy() in the program below).
> This happens with gcc 4.2.1, clang, gcc 4.6.4
> 
> I was then surprised to notice that on a recent ubuntu using
> gcc 4.6.2 (if that matters) the __builtin_memcpy beats other
> methods by a large factor.

so, it turns out that in my test program I had swapped the
source and destination operands for __builtin_memcpy(), and
this substantially changed the memory access pattern.

With the correct operands, __builtin_memcpy == memcpy == bcopy
on both FreeBSD and Linux.
On FreeBSD pkt_copy is still faster than the other methods for
small packets, whereas on Linux they are equivalent.

If you are curious why swapping source and dst changed things
so dramatically:

the test was supposed to read from a large chunk of
memory (over 1GB) to avoid always hitting L1 or L2.
Swapping operands causes reads to hit always the same line,
thus saving a lot of misses. The difference between the two
machine then probably is due to how the cache is used on writes.

sorry for the noise.
luigi


home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130124025442.GA63831>