Date: Tue, 30 Dec 2003 04:16:29 -0600 (CST)
Message-Id: <200312301016.hBUAGT4Q085640@bigtex.jrv.org>
From: James Van Artsdalen
To: freebsd-amd64@freebsd.org
Subject: Re: libc assembly optimizations?

Here's an alternative for fabs(3):

ENTRY(fabs)
        psllq   $1,%xmm0        /* 64-bit shift left drops the sign bit */
        psrlq   $1,%xmm0        /* logical shift right clears sign */
        ret

/usr/src/lib/libc/amd64/gen/fabs.S currently uses the code at the end of
this message, and gcc generates essentially the same thing. The shifts
above seem to work and look better to me.

The string ops can be significantly improved if they are allowed to read
extra bytes around the string, as long as those bytes lie within the same
16-byte paragraph as the start or end of the string. This seems safe in
userland, since an aligned 16-byte paragraph cannot straddle a page
boundary.

Finally, can the SSE2 regs be safely used in kernel mode? Page fill and
aligned bulk bcopy calls could be improved this way.

/*
 * Ok, this sucks. Is there really no way to push an xmm register onto
 * the FP stack directly?
 */
ENTRY(fabs)
        movsd   %xmm0, -8(%rsp)         /* spill the argument below the stack pointer */
        fldl    -8(%rsp)                /* load it onto the x87 stack */
        fabs
        fstpl   -8(%rsp)                /* store the result back to memory */
        movsd   -8(%rsp),%xmm0          /* return value goes in %xmm0 */
        ret
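
For anyone who wants to check the shift trick without assembling anything,
here is a rough C sketch of the same idea. It is only an illustration: the
name fabs_shift is made up for this example and is not part of libc, and it
uses plain 64-bit integer shifts in place of psllq/psrlq, assuming the usual
IEEE 754 double layout with the sign in bit 63.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE 754 value with the sign in bit 63. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/*
 * Hypothetical helper (not from libc): clear the sign bit of a double by
 * shifting its bit pattern left, then logically right, by one position.
 */
static double
fabs_shift(double x)
{
        uint64_t bits;

        memcpy(&bits, &x, sizeof(bits));        /* view the double as raw bits */
        bits <<= 1;                             /* like psllq $1: sign bit falls off the top */
        bits >>= 1;                             /* like psrlq $1: zero shifted into bit 63 */
        memcpy(&x, &bits, sizeof(x));           /* back to a double */
        return (x);
}
```

Exponent and mantissa bits pass through both shifts unchanged, so only the
sign is affected, which is all fabs needs to do.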