From owner-cvs-src@FreeBSD.ORG Mon Nov 28 11:46:21 2005 Return-Path: X-Original-To: cvs-src@FreeBSD.org Delivered-To: cvs-src@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1CE0316A41F; Mon, 28 Nov 2005 11:46:21 +0000 (GMT) (envelope-from bde@FreeBSD.org) Received: from repoman.freebsd.org (repoman.freebsd.org [216.136.204.115]) by mx1.FreeBSD.org (Postfix) with ESMTP id 73FEF43D6D; Mon, 28 Nov 2005 11:46:20 +0000 (GMT) (envelope-from bde@FreeBSD.org) Received: from repoman.freebsd.org (localhost [127.0.0.1]) by repoman.freebsd.org (8.13.1/8.13.1) with ESMTP id jASBkK24074004; Mon, 28 Nov 2005 11:46:20 GMT (envelope-from bde@repoman.freebsd.org) Received: (from bde@localhost) by repoman.freebsd.org (8.13.1/8.13.1/Submit) id jASBkKti074003; Mon, 28 Nov 2005 11:46:20 GMT (envelope-from bde) Message-Id: <200511281146.jASBkKti074003@repoman.freebsd.org> From: Bruce Evans Date: Mon, 28 Nov 2005 11:46:20 +0000 (UTC) To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org X-FreeBSD-CVS-Branch: HEAD Cc: Subject: cvs commit: src/lib/msun/src k_tanf.c X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Nov 2005 11:46:21 -0000 bde 2005-11-28 11:46:20 UTC FreeBSD src repository Modified files: lib/msun/src k_tanf.c Log: Rearranged the polynomial evaluation some more to reduce dependencies. Instead of echoing the code in a comment, try to describe why we split up the evaluation in a special way. The new optimization is mostly to move the evaluation of w = z*z later so that everything else (except z = x*x) doesn't have to wait for w. On Athlons, FP multiplication has a latency of 4 cycles so this optimization saves 4 cycles per call provided no new dependencies are introduced. Tweaking the other terms in to reduce dependencies saves a couple more cycles in some cases (more on AXP than on A64; up to 8 cycles out of 56 altogether in some cases). The previous version had a similar optimization for s = z*x. Special optimizations like these probably have a larger effect than the simple 2-way vectorization permitted (but not activated by gcc) in the old version, since 2-way vectorization is not enough and the polynomial's degree is so small in the float case that non-vectorizable dependencies dominate. On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now takes 34-55 cycles (was 39-59 cycles). Revision Changes Path 1.20 +20 -8 src/lib/msun/src/k_tanf.c