Date: Sun, 26 May 2013 08:23:28 +0200 From: =?ISO-8859-1?Q?V=E1clav_Zeman?= <vhaisman@gmail.com> To: freebsd-hackers@freebsd.org Subject: Re: Performance improvement to strnlen(). Message-ID: <51A1AA60.9050209@gmail.com> In-Reply-To: <afa77dcc2e1cb351cfaded708acbdae0@lthomas.net> References: <afa77dcc2e1cb351cfaded708acbdae0@lthomas.net>
next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig1FEBB2302363A6AC4C0260A6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 05/25/2013 10:27 PM, Lee Thomas wrote: > Hello FreeBSD devs, >=20 > I have found a performance improvement to libc's strnlen(). > lib/libc/string/strnlen.c is a trivial byte-by-byte implementation, > where strlen.c has a smarter word-by-word implementation. I have > implemented strnlen similarly to strlen. It runs about 4x as fast, at > the cost of a binary codesize increase from 30 bytes to 221. >=20 > In writing this I needed a test, and there isn't one in > tools/regression/lib/libc/string, so I wrote a unit test for strnlen. > This really only makes sense for a word-by-word implementation, as it > tests each combination of string and limit length, overread characters,= > and starting alignment. >=20 > Could someone please review these patches? I am not very experienced in= > C, and am even less experienced with FreeBSD's style guidelines, so the= y > likely have a bunch of style and idiom issues. Even if one or both of > them prove not worth committing, it would still be useful for my learni= ng. >=20 > If/When these patches prove worthy of submitting, what is the next step= > after that? Should I submit a PR, or is there some other process? This > code is quite similar to the existing strlen.c, and doesn't do anything= > fancy with e.g. endianness, but I haven't tested it for correctness or > performance on anything other than x86... >=20 > And finally, there is some other low-hanging fruit in the other strn* > functions. Would it be worth it for me to give those the same treatment= ? >=20 > Thanks, > Lee Thomas >=20 > Test platform: > $uname -a > FreeBSD LeeDesktop 9.1-STABLE FreeBSD 9.1-STABLE #15 r250831: > Mon May 20 17:31:28 EDT 2013 > lthomas@LeeDesktop:/usr/obj/usr/src/sys/NOSOUND amd64 > $dmesg | grep CPU: > CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz > (2666.81-MHz K8-class CPU) > $clang --version > FreeBSD clang version 3.2 (tags/RELEASE_32/final 170710) 201212= 21 > Target: x86_64-unknown-freebsd9.1 > Thread model: posix >=20 > My timing test was a file of 10000 strings, of uniformly random length,= > 50% between 0 and 20 bytes long, and 50% between 21 and 1000 bytes long= =2E > Each was filled with random generated bytes in the range [1, 255]. The > test executables run their strlen on each string in the file 1000 times= , > and a shell script ran the test programs alternately 100 times. Here ar= e > the clang results; gcc results are roughly the same. I will share the > actual timing code if someone wants it, but it is pretty ugly C++ and > shell and I don't guarantee it working :-). >=20 > Results: >=20 > x ./times_bsd_strnlen.txt > + ./times_my_strnlen.txt > +----------------------------------------------------------------------= ----+ >=20 > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |+ = x| > |A = A| > +----------------------------------------------------------------------= ----+ >=20 > N Min Max Median Avg St= ddev > x 101 1.8685486 1.8693889 1.8687506 1.8687894 0.000148= 4903 > + 101 0.42704683 0.42779486 0.42712813 0.42715597 0.0001068= 1494 > Difference at 95.0% confidence > -1.44163 +/- 3.56739e-05 > -77.1426% +/- 0.00190893% > (Student's t, pooled s =3D 0.000129342) >=20 > Note: I manually shortened the histogram, as it was 100 lines long of > exactly the same. >=20 >=20 > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.o= rg" >=20 > Index: strnlen.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > diff --git a/head/lib/libc/string/strnlen.c b/head/lib/libc/string/strn= len.c > --- a/head/lib/libc/string/strnlen.c (revision 250951) > +++ b/head/lib/libc/string/strnlen.c (working copy) > @@ -1,5 +1,6 @@ > /*- > - * Copyright (c) 2009 David Schultz <das@FreeBSD.org> > + * Copyright (c) 2009, 2010 Xin LI <delphij@FreeBSD.org> > + * Copyright (c) 2013 Lee Thomas <lee_thomas@AslanTools.com> > * All rights reserved. > * > * Redistribution and use in source and binary forms, with or without > @@ -27,16 +28,91 @@ > #include <sys/cdefs.h> > __FBSDID("$FreeBSD$"); > =20 > +#include <sys/limits.h> > +#include <sys/types.h> > #include <string.h> > =20 > +/* > + * Portable strnlen() for 32-bit and 64-bit systems. > + * > + * Rationale: it is generally much more efficient to do word length > + * operations and avoid branches on modern computer systems, as > + * compared to byte-length operations with a lot of branches. > + * > + * The expression: > + * > + * ((x - 0x01....01) & ~x & 0x80....80) > + * > + * would evaluate to a non-zero value iff any of the bytes in the > + * original word is zero. > + * > + * On multi-issue processors, we can divide the above expression into:= > + * a) (x - 0x01....01) > + * b) (~x & 0x80....80) > + * c) a & b > + * > + * Where, a) and b) can be partially computed in parallel. > + * > + * The algorithm above is found on "Hacker's Delight" by > + * Henry S. Warren, Jr. > + */ > + > +/* Magic numbers for the algorithm */ > +#if LONG_BIT =3D=3D 32 > +static const unsigned long mask01 =3D 0x01010101; > +static const unsigned long mask80 =3D 0x80808080; > +#elif LONG_BIT =3D=3D 64 > +static const unsigned long mask01 =3D 0x0101010101010101; > +static const unsigned long mask80 =3D 0x8080808080808080; > +#else > +#error Unsupported word size > +#endif > + > +#define LONGPTR_MASK (sizeof(long) - 1) > + > size_t > -strnlen(const char *s, size_t maxlen) > +strnlen(const char *str, size_t maxlen) > { > - size_t len; > + const char *stop, *short_stop; > + const char *p; > + const unsigned long *lp; > + long va, vb; > =20 > - for (len =3D 0; len < maxlen; len++, s++) { > - if (!*s) > - break; > + if (maxlen=3D=3D0) return 0; > + > + stop=3Dstr+maxlen; > + /* > + * Before trying the hard (unaligned byte-by-byte access) way > + * to figure out whether there is a nul character, try to see > + * if there is a nul character is within this accessible word > + * first. > + * > + * p and (p & ~LONGPTR_MASK) must be equally accessible since > + * they always fall in the same memory page, as long as page > + * boundaries is integral multiple of word size. > + */ > + lp =3D (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK); > + va =3D (*lp - mask01); > + vb =3D ((~*lp) & mask80); I do not think that this correct C. This is type punning violating the rules of the language. One way you can rewrite this is to use memcpy() into a temporary. Do not be afraid of the memcpy(), modern GCC, and I hope even Clang, can optimize it into a single move instruction: { long tmp; memcpy(&tmp, lp, sizeof(tmp)); va =3D (tmp - mask01); vb =3D ((~tmp) & mask80); } Another way could be using the union trick. > + lp++; > + if (va & vb) { > + /* Check if we have \0 in the first part */ > + short_stop=3D(const char *)lp; > + if (stop<short_stop) short_stop=3Dstop; > + for (p=3Dstr; p !=3D short_stop; p++) > + if (*p =3D=3D '\0') > + return (p-str); > } > - return (len); > + /* Scan the rest of the string using word sized operation */ > + for (; (const char *)lp < stop; lp++) { > + va =3D (*lp - mask01); > + vb =3D ((~*lp) & mask80); Ditto. > + if (va & vb) { > + for (p=3D(const char *)lp; p !=3D stop; p++) > + if (*p =3D=3D '\0') > + break; > + return (p-str); > + } > + } > + return (maxlen); > } --=20 VZ --------------enig1FEBB2302363A6AC4C0260A6 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with undefined - http://www.enigmail.net/ iF4EAREIAAYFAlGhqmEACgkQ6OWUyaYNY6OgEAD/XZbTU+/A4DiqlCuF31LB2WFz 9Qmcy8o2eALuL9HCkkgBAIxcLCprHbfby4hx84Laz36uFBiF3EnibKW/DuVxzqar =v93F -----END PGP SIGNATURE----- --------------enig1FEBB2302363A6AC4C0260A6--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?51A1AA60.9050209>