Date:      Thu, 23 Mar 1995 19:40:49 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        freebsd-hackers@freefall.cdrom.com, kuku@gilberto.physik.rwth-aachen.de
Subject:   Re: fast string inline routines (asm)
Message-ID:  <199503230940.TAA09637@godzilla.zeta.org.au>

>In the djgpp list a discussion came up recently about inlining (asm)
>fast memcpy/memmove/strcpy and such stuff and someone pointed out that
>Linux had these - I cite from Mat Hostetter:

>"Subject: Re: A quick way to copy n bytes

>NOTE:  if people want to see some good implementations of these routines,
>       you should check out the inline asm versions in the Linux headers,
>       e.g. linux/asm/string.h.  They are impressive."

FreeBSD already has fairly good implementations.

>I wonder if FreeBSD can have these too.

I checked the Linux-1.2.0 sources:

Everything is `extern inline void'.  Inlining may or may not be good.
It eliminates function call overhead.  It increases register pressure
(not good).  It may deplete the instruction cache, since every call
site gets its own copy of the code.

Almost everything uses the i*86 string instructions (so do the FreeBSD
versions).  This is good for most values of `*' and most string
functions, but not for memcpy on i486's.  In practice it doesn't
matter much which method is used if a lot of data is moved.  All
methods bust the L1 cache, and the speed is reduced to at most that
of the L2 cache.  Fancy versions of memcpy might fiddle with the
cache lines, but this is too hard for a generic function when there
is no standard for L2 caches.
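
A minimal sketch of the string-instruction method, assuming GCC-style inline asm on i386 or x86-64; `movs_memcpy` is a hypothetical name, not the FreeBSD or Linux routine.  It shows why the method ties up fixed registers: `rep movsl` always takes its count in %ecx and its pointers in %esi/%edi.  Other targets fall back to the library memcpy().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not taken from any real libc or kernel header. */
static void *movs_memcpy(void *to, const void *from, size_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    void *d = to;
    const void *s = from;
    size_t words = n / 4;   /* 4-byte units for "rep movsl" */
    size_t tail = n % 4;    /* leftover bytes for "rep movsb" */

    /* The instruction fixes its operands: count in %ecx/%rcx, source in
       %esi/%rsi, destination in %edi/%rdi.  The "+c", "+S", "+D"
       constraints force the compiler to load exactly those registers.
       The direction flag is assumed clear, as the i386 and x86-64 ABIs
       require on function entry. */
    __asm__ volatile ("rep movsl"
                      : "+c"(words), "+S"(s), "+D"(d)
                      :
                      : "memory");
    /* s and d now point just past the words; finish the last 0-3 bytes. */
    __asm__ volatile ("rep movsb"
                      : "+c"(tail), "+S"(s), "+D"(d)
                      :
                      : "memory");
    return to;
#else
    /* Not an x86 target: use the library routine instead. */
    return memcpy(to, from, n);
#endif
}
```

Even for a tiny constant count, this path must load all three fixed registers, which is the cost discussed for memcpy() below.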

memmove() is poor.  It only copies a byte at a time.
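
For contrast with a byte-at-a-time loop, here is a sketch of a memmove that moves 4-byte words in the bulk of the copy and picks the copy direction from the overlap.  `word_memmove` is a hypothetical name; the fixed-size memcpy() calls through a temporary are an assumption made to keep each load and store free of alignment and aliasing problems (gcc normally turns them into single moves).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: word-at-a-time memmove with overlap handling. */
static void *word_memmove(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    uint32_t w;

    if (d == s || n == 0)
        return to;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: copy forwards.  Each word is loaded
           in full before it is stored, and every store lands below the
           next source word, so no unread source byte is clobbered. */
        while (n >= 4) {
            memcpy(&w, s, 4);
            memcpy(d, &w, 4);
            d += 4;
            s += 4;
            n -= 4;
        }
        while (n--)
            *d++ = *s++;
    } else {
        /* Destination above source: copy backwards from the end. */
        d += n;
        s += n;
        while (n >= 4) {
            d -= 4;
            s -= 4;
            n -= 4;
            memcpy(&w, s, 4);
            memcpy(d, &w, 4);
        }
        while (n--)
            *--d = *--s;
    }
    return to;
}
```

Only the last 0 to 3 bytes go through the byte loop; everything else moves 4 bytes per load/store pair.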

memcpy() optimizes constant counts.  This is the only special feature
in the Linux string libraries.  It depends on memcpy() being a macro.
The Linux implementation works best for counts <= 4, because then no
fixed registers are required.  There should be special cases for
somewhat higher counts.  E.g., for a count of 8, it's surely smaller
and faster to do 2 loads and 2 stores than to load 3 fixed registers
to do a "rep movsl", especially if the 3 registers have to be
saved and restored.  gcc-2.6.3 generates load/store instructions for
copying structs up to 16 bytes in size; for larger structs it
generates "rep movsl".
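
A sketch of the constant-count special cases, assuming gcc's or clang's `__builtin_constant_p`.  The names `CONST_MEMCPY`, `copy4` and `copy8` are hypothetical and this is not the Linux header's code.  Counts of 4 and 8 become plain loads and stores (two of each for 8), with no fixed registers; as in the text, it is written as a macro so the count is still visible as a constant at the call site.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4 bytes: one load and one store.  The constant-size memcpy() calls
   only keep the access legal for unaligned or differently typed data;
   compilers normally emit a single mov for each. */
static inline void *copy4(void *to, const void *from)
{
    uint32_t w;
    memcpy(&w, from, 4);
    memcpy(to, &w, 4);
    return to;
}

/* 8 bytes: two loads, then two stores. */
static inline void *copy8(void *to, const void *from)
{
    uint32_t lo, hi;
    memcpy(&lo, from, 4);
    memcpy(&hi, (const unsigned char *)from + 4, 4);
    memcpy(to, &lo, 4);
    memcpy((unsigned char *)to + 4, &hi, 4);
    return to;
}

/* Hypothetical macro: dispatch on a compile-time-constant count and
   fall back to the ordinary memcpy() otherwise.  Note that `n` is
   evaluated more than once, so it must not have side effects. */
#define CONST_MEMCPY(to, from, n)                                   \
    ((__builtin_constant_p(n) && (n) == 4) ? copy4((to), (from))    \
     : (__builtin_constant_p(n) && (n) == 8) ? copy8((to), (from))  \
     : memcpy((to), (from), (n)))
```

For a variable count, or any constant other than 4 or 8, the macro simply calls memcpy(), so correctness never depends on the special cases being taken.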

Bruce