From owner-svn-src-stable@FreeBSD.ORG  Thu May 13 23:28:21 2010
Return-Path: <owner-svn-src-stable@FreeBSD.ORG>
Delivered-To: svn-src-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 41512106566B;
	Thu, 13 May 2010 23:28:21 +0000 (UTC)
	(envelope-from delphij@FreeBSD.org)
Received: from svn.freebsd.org (unknown [IPv6:2001:4f8:fff6::2c])
	by mx1.freebsd.org (Postfix) with ESMTP id 2F3CF8FC15;
	Thu, 13 May 2010 23:28:21 +0000 (UTC)
Received: from svn.freebsd.org (localhost [127.0.0.1])
	by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id o4DNSLfI080073;
	Thu, 13 May 2010 23:28:21 GMT (envelope-from delphij@svn.freebsd.org)
Received: (from delphij@localhost)
	by svn.freebsd.org (8.14.3/8.14.3/Submit) id o4DNSLYi080072;
	Thu, 13 May 2010 23:28:21 GMT (envelope-from delphij@svn.freebsd.org)
Message-Id: <201005132328.o4DNSLYi080072@svn.freebsd.org>
From: Xin LI <delphij@FreeBSD.org>
Date: Thu, 13 May 2010 23:28:20 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org
X-SVN-Group: stable-8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: 
Subject: svn commit: r208051 - stable/8/lib/libc/string
X-BeenThere: svn-src-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SVN commit messages for all the -stable branches of the src tree
	<svn-src-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-stable>, 
	<mailto:svn-src-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-stable>
List-Post: <mailto:svn-src-stable@freebsd.org>
List-Help: <mailto:svn-src-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-stable>,
	<mailto:svn-src-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 13 May 2010 23:28:21 -0000

Author: delphij
Date: Thu May 13 23:28:20 2010
New Revision: 208051
URL: http://svn.freebsd.org/changeset/base/208051

Log:
  MFC r205099-205100,205108:
  
  Two optimizations to MI strlen(3) inspired by David S. Miller's
  blog posting [1].
  
   - Use word-sized test for unaligned pointer before working
     the hard way.
  
     Memory page boundary is always integral multiple of a word
     alignment boundary.  Therefore, if we can access memory
     referenced by pointer p, then (p & ~word mask) must be also
     accessible.
  
   - Better utilization of multi-issue processor's ability of
     concurrency.
  
     The previous implementation utilized a formular that must be
     executed sequentially.  However, the ~, & and - operations can
     actually be caculated at the same time when the operand were
     different and unrelated.
  
     The original Hacker's Delight formular also offered consistent
     performance regardless whether the input would contain
     characters with their highest-bit set, as it catches real
     nul characters only.
  
  These two optimizations has shown further improvements over the
  previous implementation on microbenchmarks on i386 and amd64 CPU
  including Pentium 4, Core Duo 2 and i7.
  
  [1] http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/03/08#strlen_1

Modified:
  stable/8/lib/libc/string/strlen.c
Directory Properties:
  stable/8/lib/libc/   (props changed)
  stable/8/lib/libc/stdtime/   (props changed)

Modified: stable/8/lib/libc/string/strlen.c
==============================================================================
--- stable/8/lib/libc/string/strlen.c	Thu May 13 20:55:58 2010	(r208050)
+++ stable/8/lib/libc/string/strlen.c	Thu May 13 23:28:20 2010	(r208051)
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2009 Xin LI <delphij@FreeBSD.org>
+ * Copyright (c) 2009, 2010 Xin LI <delphij@FreeBSD.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -43,15 +43,17 @@ __FBSDID("$FreeBSD$");
  *	((x - 0x01....01) & ~x & 0x80....80)
  *
  * would evaluate to a non-zero value iff any of the bytes in the
- * original word is zero.  However, we can further reduce ~1/3 of
- * time if we consider that strlen() usually operate on 7-bit ASCII
- * by employing the following expression, which allows false positive
- * when high bit of 1 and use the tail case to catch these case:
+ * original word is zero.
  *
- *	((x - 0x01....01) & 0x80....80)
+ * On multi-issue processors, we can divide the above expression into:
+ *	a)  (x - 0x01....01)
+ *	b) (~x & 0x80....80)
+ *	c) a & b
  *
- * This is more than 5.2 times as fast as the raw implementation on
- * Intel T7300 under long mode for strings longer than word length.
+ * Where, a) and b) can be partially computed in parallel.
+ *
+ * The algorithm above is found on "Hacker's Delight" by
+ * Henry S. Warren, Jr.
  */
 
 /* Magic numbers for the algorithm */
@@ -82,29 +84,47 @@ strlen(const char *str)
 {
 	const char *p;
 	const unsigned long *lp;
+	long va, vb;
 
-	/* Skip the first few bytes until we have an aligned p */
-	for (p = str; (uintptr_t)p & LONGPTR_MASK; p++)
-	    if (*p == '\0')
-		return (p - str);
+	/*
+	 * Before trying the hard (unaligned byte-by-byte access) way
+	 * to figure out whether there is a nul character, try to see
+	 * if there is a nul character is within this accessible word
+	 * first.
+	 *
+	 * p and (p & ~LONGPTR_MASK) must be equally accessible since
+	 * they always fall in the same memory page, as long as page
+	 * boundaries is integral multiple of word size.
+	 */
+	lp = (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);
+	va = (*lp - mask01);
+	vb = ((~*lp) & mask80);
+	lp++;
+	if (va & vb)
+		/* Check if we have \0 in the first part */
+		for (p = str; p < (const char *)lp; p++)
+			if (*p == '\0')
+				return (p - str);
 
 	/* Scan the rest of the string using word sized operation */
-	for (lp = (const unsigned long *)p; ; lp++)
-	    if ((*lp - mask01) & mask80) {
-		p = (const char *)(lp);
-		testbyte(0);
-		testbyte(1);
-		testbyte(2);
-		testbyte(3);
+	for (; ; lp++) {
+		va = (*lp - mask01);
+		vb = ((~*lp) & mask80);
+		if (va & vb) {
+			p = (const char *)(lp);
+			testbyte(0);
+			testbyte(1);
+			testbyte(2);
+			testbyte(3);
 #if (LONG_BIT >= 64)
-		testbyte(4);
-		testbyte(5);
-		testbyte(6);
-		testbyte(7);
+			testbyte(4);
+			testbyte(5);
+			testbyte(6);
+			testbyte(7);
 #endif
-	    }
+		}
+	}
 
 	/* NOTREACHED */
 	return (0);
 }
-