FreeBSD Mail Archives

Date:      Thu, 24 Jan 2013 03:54:42 +0100
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        current@freebsd.org
Subject:   false alarm (Re: __builtin_memcpy() slower than memcpy/bcopy (and on linux it is the opposite) ?)
Message-ID:  <20130124025442.GA63831@onelab2.iet.unipi.it>
In-Reply-To: <20130123163238.GB56212@onelab2.iet.unipi.it>
References:  <20130123163238.GB56212@onelab2.iet.unipi.it>

On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote:
> Probably our compiler folks have some ideas on this...
> 
> When doing netmap i found that on FreeBSD memcpy/bcopy was expensive,
> __builtin_memcpy() was even worse, and so i ended up writing
> my custom routine, (called pkt_copy() in the program below).
> This happens with gcc 4.2.1, clang, gcc 4.6.4
> 
> I was then surprised to notice that on a recent ubuntu using
> gcc 4.6.2 (if that matters) the __builtin_memcpy beats other
> methods by a large factor.

So, it turns out that in my test program I had swapped the
source and destination operands for __builtin_memcpy(), and
this substantially changed the memory access pattern.

With the correct operands, __builtin_memcpy, memcpy and bcopy
perform the same on both FreeBSD and Linux.
On FreeBSD pkt_copy is still faster than the other methods for
small packets, whereas on Linux they are equivalent.

If you are curious why swapping source and dst changed things
so dramatically:

The test was supposed to read from a large chunk of
memory (over 1GB) to avoid always hitting L1 or L2.
With the operands swapped, the reads always hit the same cache
line, which avoids a lot of misses. The difference between the two
machines is then probably due to how the cache is used on writes.
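
For the curious, the two access patterns can be sketched roughly as
below. This is not the original test program, only an illustration:
the function names and the parameters (big, pkt, big_len, pkt_len,
iters) are made up here, and plain memcpy() stands in for whichever
copy routine is being measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch, not the original test code.
 * "big" is the large buffer (over 1GB in the real test), "pkt" is one
 * small packet-sized buffer. The two buffers must not overlap.
 */

/* Intended pattern: every iteration reads a different slot of the
 * large buffer, so the reads keep missing the cache, while the
 * writes always land in the same small buffer. */
static uint64_t
copy_intended(uint8_t *pkt, const uint8_t *big, size_t big_len,
    size_t pkt_len, size_t iters)
{
	assert(pkt_len > 0 && big_len >= pkt_len);
	size_t slots = big_len / pkt_len;
	uint64_t sum = 0;

	for (size_t i = 0; i < iters; i++) {
		const uint8_t *src = big + (i % slots) * pkt_len;

		memcpy(pkt, src, pkt_len);	/* memcpy(dst, src, len) */
		sum += pkt[0];			/* keep the copy observable */
	}
	return sum;
}

/* Swapped pattern: the operands are exchanged, so every iteration
 * reads the same small buffer (which stays cached) and the writes
 * walk through the large buffer instead. */
static uint64_t
copy_swapped(uint8_t *big, const uint8_t *pkt, size_t big_len,
    size_t pkt_len, size_t iters)
{
	assert(pkt_len > 0 && big_len >= pkt_len);
	size_t slots = big_len / pkt_len;
	uint64_t sum = 0;

	for (size_t i = 0; i < iters; i++) {
		uint8_t *dst = big + (i % slots) * pkt_len;

		memcpy(dst, pkt, pkt_len);	/* source is always "pkt" */
		sum += dst[0];			/* keep the copy observable */
	}
	return sum;
}
```

Timing the two loops with the same big_len and pkt_len should show
the effect: the first is dominated by read misses, the second by
whatever the machine does with writes to lines it has not read.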

sorry for the noise.
luigi

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130124025442.GA63831>
