Date:      Thu, 24 Nov 2005 13:48:40 +0000 (UTC)
From:      Bruce Evans <bde@FreeBSD.org>
To:        src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   cvs commit: src/lib/msun/src k_tanf.c
Message-ID:  <200511241348.jAODmeUT084533@repoman.freebsd.org>

bde         2005-11-24 13:48:40 UTC

  FreeBSD src repository

  Modified files:
    lib/msun/src         k_tanf.c 
  Log:
  Minor cleanups and optimizations:
  
  - Remove dead code that I forgot to remove in the previous commit.
  
  - Calculate the sum of the lower terms of the polynomial (divided by
    x**5) in a single expression (sum of odd terms) + (sum of even terms)
    with parentheses to control grouping.  This is clearer and happens to
    give better instruction scheduling for a tiny optimization (an
    average of about 0.5 cycles/call on Athlons).
  
  - Calculate the final sum in a single expression with parentheses to
    control grouping too.  Change the grouping from
    first_term + (second_term + sum_of_lower_terms) to
    (first_term + second_term) + sum_of_lower_terms.  Normally the first
    grouping must be used for accuracy, but extra precision makes any
    grouping give a correct result so we can group for efficiency.  This
    is a larger optimization (average 3-4 cycles/call or 5%).
  
  - Use parentheses to indicate that the C order of left to right evaluation
    is what is wanted (for efficiency) in a multiplication too.
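
  The groupings described in the list above can be sketched roughly as
  follows.  This is only an illustration, not the code in k_tanf.c: the
  names tan_sketch_old(), tan_sketch_new(), lower_terms() and T[] are
  hypothetical, the coefficients are plain Taylor coefficients of tan(x)
  rather than the minimax coefficients the real kernel uses, and it assumes
  the polynomial is evaluated in double so that the float result has the
  extra precision mentioned above.

```c
#include <assert.h>
#include <math.h>

/*
 * Taylor coefficients of tan(x) for x^3 .. x^13.  Illustrative only;
 * these are NOT the coefficients used in k_tanf.c.
 */
static const double T[] = {
    1.0 / 3.0,            /* x^3  */
    2.0 / 15.0,           /* x^5  */
    17.0 / 315.0,         /* x^7  */
    62.0 / 2835.0,        /* x^9  */
    1382.0 / 155925.0,    /* x^11 */
    21844.0 / 6081075.0,  /* x^13 */
};

/*
 * Sum of the lower terms divided by x^5, with z = x*x and w = z*z:
 *   T[1] + T[2]*z + T[3]*z^2 + T[4]*z^3 + T[5]*z^4
 * written as (odd powers of z) + (even powers of z) in one expression,
 * so the two partial sums are independent of each other.
 */
static double lower_terms(double z, double w)
{
    return (z * (T[2] + w * T[4])) + (T[1] + w * (T[3] + w * T[5]));
}

/* Grouping normally needed for accuracy: first + (second + lower). */
static float tan_sketch_old(double x)
{
    double z = x * x, w = z * z, s = z * x;
    double r = lower_terms(z, w);

    return (float)(x + (s * T[0] + (s * z) * r));
}

/* Grouping chosen for efficiency: (first + second) + lower. */
static float tan_sketch_new(double x)
{
    double z = x * x, w = z * z, s = z * x;
    double r = lower_terms(z, w);

    /* (s * z) * r spells out the left-to-right multiplication order. */
    return (float)((x + s * T[0]) + (s * z) * r);
}

/* Small self-check; the tolerance is loose because T[] is truncated. */
static void tan_sketch_selfcheck(void)
{
    assert(fabs((double)tan_sketch_new(0.5) - tan(0.5)) < 1e-6);
    assert(fabs((double)tan_sketch_old(0.5) - tan(0.5)) < 1e-6);
}
```

  In the second form, x + s * T[0] does not have to wait for the sum of
  the lower terms, which is presumably where the scheduling gain comes
  from.  The sketch is only meaningful for small |x| (about 0.5 or less),
  since it has no argument reduction and the Taylor series is truncated.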
  
  The old fdlibm code has several optimizations related to these.  Two
  involve doing an extra operation that can be done almost in parallel
  on some superscalar machines but are pessimizations on sequential
  machines.  Others involve statement ordering or expression grouping.
  All of these except the ordering for combining the sums of the odd
  and even terms seem to be ideal for Athlons, but parallelism is still
  limited so all of these optimizations combined together with the ones
  in this commit save only ~6-8 cycles (~10%).
  
  On an AXP, tanf() on uniformly distributed args in [-2pi, 2pi] now
  takes 39-59 cycles.  I don't know of any more optimizations for tanf()
  short of writing it all in asm with very machine-dependent (MD)
  instruction scheduling.  Hardware fsin takes 122-138 cycles.  Most of
  the optimizations for tanf() don't work very well for tan[l]().
  fdlibm tan() now takes 145-365 cycles.
  
  Revision  Changes    Path
  1.18      +5 -11     src/lib/msun/src/k_tanf.c


