From owner-dev-commits-src-all@freebsd.org  Fri Jan 29 23:48:31 2021
Date: Fri, 29 Jan 2021 23:48:31 GMT
Message-Id: <202101292348.10TNmV8Z022427@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
 dev-commits-src-main@FreeBSD.org
From: Mateusz Guzik <mjg@FreeBSD.org>
Subject: git: 710e45c4b853 - main - Reimplement strlen
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: mjg
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 710e45c4b8539d028877769f1a4ec088c48fb5f1
Auto-Submitted: auto-generated

The branch main has been updated by mjg:

URL: https://cgit.FreeBSD.org/src/commit/?id=710e45c4b8539d028877769f1a4ec088c48fb5f1

commit 710e45c4b8539d028877769f1a4ec088c48fb5f1
Author:     Mateusz Guzik <mjg@FreeBSD.org>
AuthorDate: 2021-01-29 21:48:11 +0000
Commit:     Mateusz Guzik <mjg@FreeBSD.org>
CommitDate: 2021-01-29 23:48:26 +0000

    Reimplement strlen
    
    The previous code located the terminating NUL within a word by testing
    each byte in turn, instead of using bit-scan primitives (ctz/clz) that
    find it without branching on every character.
    
    While here, amend the somewhat misleading commentary: strlen as
    implemented here leaves performance on the table, especially in
    userspace. Every architecture should get a dedicated variant instead.
    
    In the meantime this commit lessens the problem.
    
    Tested with the glibc test suite.
    
    Naive test just calling strlen in a loop on Haswell (ops/s):
    
    $(perl -e "print 'A' x 3"):
    before: 211198039
    after:  338626619
    
    $(perl -e "print 'A' x 100"):
    before: 83151997
    after:  98285919
---
 lib/libc/string/strlen.c | 82 +++++++++++++++++-------------------------------
 sys/libkern/strlen.c     | 79 +++++++++++++++-------------------------------
 2 files changed, 53 insertions(+), 108 deletions(-)
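
For readers skimming the diff, here is a minimal sketch of the zero-byte
search the patch relies on. It assumes a 64-bit word loaded from
little-endian memory and a compiler that provides __builtin_ctzll (GCC or
Clang). The helper name first_zero_byte_le is hypothetical and is not part
of the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return the index (0..7) of
 * the first zero byte of x, counting from the least significant byte,
 * or -1 if x contains no zero byte.
 */
static int
first_zero_byte_le(uint64_t x)
{
	const uint64_t mask01 = 0x0101010101010101ULL;
	const uint64_t mask80 = 0x8080808080808080ULL;
	/* Non-zero iff some byte of x is zero (Hacker's Delight). */
	uint64_t hit = (x - mask01) & ~x & mask80;

	if (hit == 0)
		return (-1);
	/* The lowest set bit marks the first zero byte; bit index / 8. */
	return ((int)(__builtin_ctzll(hit) >> 3));
}
```

A borrow from a zero byte can set a spurious marker bit in a more
significant byte (for example a 0x01 byte right after the NUL), but never
in a less significant one, so taking the lowest set bit is safe in this
little-endian sketch. For instance, first_zero_byte_le(0x4141414141410100ULL)
yields 0.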

diff --git a/lib/libc/string/strlen.c b/lib/libc/string/strlen.c
index a862ffc245ca..0bdc81d7bb9a 100644
--- a/lib/libc/string/strlen.c
+++ b/lib/libc/string/strlen.c
@@ -35,10 +35,6 @@ __FBSDID("$FreeBSD$");
 /*
  * Portable strlen() for 32-bit and 64-bit systems.
  *
- * Rationale: it is generally much more efficient to do word length
- * operations and avoid branches on modern computer systems, as
- * compared to byte-length operations with a lot of branches.
- *
  * The expression:
  *
  *	((x - 0x01....01) & ~x & 0x80....80)
@@ -46,18 +42,13 @@ __FBSDID("$FreeBSD$");
  * would evaluate to a non-zero value iff any of the bytes in the
  * original word is zero.
  *
- * On multi-issue processors, we can divide the above expression into:
- *	a)  (x - 0x01....01)
- *	b) (~x & 0x80....80)
- *	c) a & b
- *
- * Where, a) and b) can be partially computed in parallel.
- *
  * The algorithm above is found on "Hacker's Delight" by
  * Henry S. Warren, Jr.
+ *
+ * Note: this leaves performance on the table and each architecture
+ * would be best served with a tailor made routine instead.
  */
 
-/* Magic numbers for the algorithm */
 #if LONG_BIT == 32
 static const unsigned long mask01 = 0x01010101;
 static const unsigned long mask80 = 0x80808080;
@@ -70,62 +61,45 @@ static const unsigned long mask80 = 0x8080808080808080;
 
 #define	LONGPTR_MASK (sizeof(long) - 1)
 
-/*
- * Helper macro to return string length if we caught the zero
- * byte.
- */
-#define testbyte(x)				\
-	do {					\
-		if (p[x] == '\0')		\
-		    return (p - str + x);	\
-	} while (0)
+#if BYTE_ORDER == LITTLE_ENDIAN
+#define	FINDZERO __builtin_ctzl
+#else
+#define	FINDZERO __builtin_clzl
+#endif
 
 size_t
 strlen(const char *str)
 {
-	const char *p;
 	const unsigned long *lp;
+	unsigned long mask;
 	long va, vb;
+	long val;
 
-	/*
-	 * Before trying the hard (unaligned byte-by-byte access) way
-	 * to figure out whether there is a nul character, try to see
-	 * if there is a nul character is within this accessible word
-	 * first.
-	 *
-	 * p and (p & ~LONGPTR_MASK) must be equally accessible since
-	 * they always fall in the same memory page, as long as page
-	 * boundaries is integral multiple of word size.
-	 */
-	lp = (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);
-	va = (*lp - mask01);
-	vb = ((~*lp) & mask80);
-	lp++;
-	if (va & vb)
-		/* Check if we have \0 in the first part */
-		for (p = str; p < (const char *)lp; p++)
-			if (*p == '\0')
-				return (p - str);
+	lp = (unsigned long *) (uintptr_t) str;
+	if ((uintptr_t)lp & LONGPTR_MASK) {
+		lp = (__typeof(lp)) ((uintptr_t)lp & ~LONGPTR_MASK);
+#if BYTE_ORDER == LITTLE_ENDIAN
+		mask = ~(~0UL << (((uintptr_t)str & LONGPTR_MASK) << 3));
+#else
+		mask = ~(~0UL >> (((uintptr_t)str & LONGPTR_MASK) << 3));
+#endif
+		val = *lp | mask;
+		va = (val - mask01);
+		vb = ((~val) & mask80);
+		if (va & vb) {
+			return ((const char *)lp - str + (FINDZERO(va & vb) >> 3));
+		}
+		lp++;
+	}
 
-	/* Scan the rest of the string using word sized operation */
 	for (; ; lp++) {
 		va = (*lp - mask01);
 		vb = ((~*lp) & mask80);
 		if (va & vb) {
-			p = (const char *)(lp);
-			testbyte(0);
-			testbyte(1);
-			testbyte(2);
-			testbyte(3);
-#if (LONG_BIT >= 64)
-			testbyte(4);
-			testbyte(5);
-			testbyte(6);
-			testbyte(7);
-#endif
+			return ((const char *)lp - str + (FINDZERO(va & vb) >> 3));
 		}
 	}
 
-	/* NOTREACHED */
+	__builtin_unreachable();
 	return (0);
 }
diff --git a/sys/libkern/strlen.c b/sys/libkern/strlen.c
index a8c7964f69a3..8fa5f3927ea9 100644
--- a/sys/libkern/strlen.c
+++ b/sys/libkern/strlen.c
@@ -34,10 +34,6 @@ __FBSDID("$FreeBSD$");
 /*
  * Portable strlen() for 32-bit and 64-bit systems.
  *
- * Rationale: it is generally much more efficient to do word length
- * operations and avoid branches on modern computer systems, as
- * compared to byte-length operations with a lot of branches.
- *
  * The expression:
  *
  *	((x - 0x01....01) & ~x & 0x80....80)
@@ -45,18 +41,10 @@ __FBSDID("$FreeBSD$");
  * would evaluate to a non-zero value iff any of the bytes in the
  * original word is zero.
  *
- * On multi-issue processors, we can divide the above expression into:
- *	a)  (x - 0x01....01)
- *	b) (~x & 0x80....80)
- *	c) a & b
- *
- * Where, a) and b) can be partially computed in parallel.
- *
  * The algorithm above is found on "Hacker's Delight" by
  * Henry S. Warren, Jr.
  */
 
-/* Magic numbers for the algorithm */
 #if LONG_BIT == 32
 static const unsigned long mask01 = 0x01010101;
 static const unsigned long mask80 = 0x80808080;
@@ -69,62 +57,45 @@ static const unsigned long mask80 = 0x8080808080808080;
 
 #define	LONGPTR_MASK (sizeof(long) - 1)
 
-/*
- * Helper macro to return string length if we caught the zero
- * byte.
- */
-#define testbyte(x)				\
-	do {					\
-		if (p[x] == '\0')		\
-		    return (p - str + x);	\
-	} while (0)
+#if BYTE_ORDER == LITTLE_ENDIAN
+#define	FINDZERO __builtin_ctzl
+#else
+#define	FINDZERO __builtin_clzl
+#endif
 
 size_t
 (strlen)(const char *str)
 {
-	const char *p;
 	const unsigned long *lp;
+	unsigned long mask;
 	long va, vb;
+	long val;
 
-	/*
-	 * Before trying the hard (unaligned byte-by-byte access) way
-	 * to figure out whether there is a nul character, try to see
-	 * if there is a nul character is within this accessible word
-	 * first.
-	 *
-	 * p and (p & ~LONGPTR_MASK) must be equally accessible since
-	 * they always fall in the same memory page, as long as page
-	 * boundaries is integral multiple of word size.
-	 */
-	lp = (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);
-	va = (*lp - mask01);
-	vb = ((~*lp) & mask80);
-	lp++;
-	if (va & vb)
-		/* Check if we have \0 in the first part */
-		for (p = str; p < (const char *)lp; p++)
-			if (*p == '\0')
-				return (p - str);
+	lp = (unsigned long *) (uintptr_t) str;
+	if ((uintptr_t)lp & LONGPTR_MASK) {
+		lp = (__typeof(lp)) ((uintptr_t)lp & ~LONGPTR_MASK);
+#if BYTE_ORDER == LITTLE_ENDIAN
+		mask = ~(~0UL << (((uintptr_t)str & LONGPTR_MASK) << 3));
+#else
+		mask = ~(~0UL >> (((uintptr_t)str & LONGPTR_MASK) << 3));
+#endif
+		val = *lp | mask;
+		va = (val - mask01);
+		vb = ((~val) & mask80);
+		if (va & vb) {
+			return ((const char *)lp - str + (FINDZERO(va & vb) >> 3));
+		}
+		lp++;
+	}
 
-	/* Scan the rest of the string using word sized operation */
 	for (; ; lp++) {
 		va = (*lp - mask01);
 		vb = ((~*lp) & mask80);
 		if (va & vb) {
-			p = (const char *)(lp);
-			testbyte(0);
-			testbyte(1);
-			testbyte(2);
-			testbyte(3);
-#if (LONG_BIT >= 64)
-			testbyte(4);
-			testbyte(5);
-			testbyte(6);
-			testbyte(7);
-#endif
+			return ((const char *)lp - str + (FINDZERO(va & vb) >> 3));
 		}
 	}
 
-	/* NOTREACHED */
+	__builtin_unreachable();
 	return (0);
 }
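
The unaligned-head handling in the patch above can be sketched in
isolation as follows. This is an illustration under stated assumptions
(64-bit word, little-endian byte order, GCC or Clang builtins). The helper
names head_mask_le and first_zero_from_le are hypothetical and do not
appear in the commit.

```c
#include <assert.h>
#include <stdint.h>

static const uint64_t mask01 = 0x0101010101010101ULL;
static const uint64_t mask80 = 0x8080808080808080ULL;

/*
 * Hypothetical helper: all-ones in the `skip` least significant bytes
 * (skip in 0..7), zero elsewhere.
 */
static uint64_t
head_mask_le(unsigned skip)
{
	return (~(~UINT64_C(0) << (skip << 3)));
}

/*
 * Hypothetical helper: byte index within `word` of the first zero byte
 * at or after position `skip`, or -1 if there is none.  Bytes below
 * `skip` precede the string and are forced non-zero so they cannot match.
 */
static int
first_zero_from_le(uint64_t word, unsigned skip)
{
	uint64_t val = word | head_mask_le(skip);
	uint64_t hit = (val - mask01) & ~val & mask80;

	return (hit == 0 ? -1 : (int)(__builtin_ctzll(hit) >> 3));
}
```

With a string starting 3 bytes into an aligned word, the length is the
returned index minus 3, which mirrors the
`(const char *)lp - str + (FINDZERO(va & vb) >> 3)` expression in the patch.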