From owner-freebsd-current@FreeBSD.ORG Thu Jan 24 02:55:05 2013 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8B7D8F42 for ; Thu, 24 Jan 2013 02:55:05 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 4D755F95 for ; Thu, 24 Jan 2013 02:55:05 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 18F507300A; Thu, 24 Jan 2013 03:54:42 +0100 (CET) Date: Thu, 24 Jan 2013 03:54:42 +0100 From: Luigi Rizzo To: current@freebsd.org Subject: false alarm (Re: __builtin_memcpy() slower than memcpy/bcopy (and on linux it is the opposite) ?) Message-ID: <20130124025442.GA63831@onelab2.iet.unipi.it> References: <20130123163238.GB56212@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130123163238.GB56212@onelab2.iet.unipi.it> User-Agent: Mutt/1.4.2.3i X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Jan 2013 02:55:05 -0000 On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote: > Probably our compiler folks have some ideas on this... > > When doing netmap i found that on FreeBSD memcpy/bcopy was expensive, > __builtin_memcpy() was even worse, and so i ended up writing > my custom routine, (called pkt_copy() in the program below). > This happens with gcc 4.2.1, clang, gcc 4.6.4 > > I was then surprised to notice that on a recent ubuntu using > gcc 4.6.2 (if that matters) the __builtin_memcpy beats other > methods by a large factor. so, it turns out that in my test program I had swapped the source and destination operands for __builtin_memcpy(), and this substantially changed the memory access pattern. With the correct operands, __builtin_memcpy == memcpy == bcopy on both FreeBSD and Linux. On FreeBSD pkt_copy is still faster than the other methods for small packets, whereas on Linux they are equivalent. If you are curious why swapping source and dst changed things so dramatically: the test was supposed to read from a large chunk of memory (over 1GB) to avoid always hitting L1 or L2. Swapping operands causes reads to hit always the same line, thus saving a lot of misses. The difference between the two machine then probably is due to how the cache is used on writes. sorry for the noise. luigi