Date: Wed, 28 Dec 2011 10:18:17 +0100
From: Ed Schouten <ed@80386.nl>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: mips@freebsd.org, sparc64@freebsd.org
Subject: [Updated patch] (Finally) migrate MIPS and SPARC64 to libcompiler_rt
Message-ID: <20111228091817.GC1895@hoeg.nl>
In-Reply-To: <20111228000723.GA77332@alchemy.franken.de>
References: <20111227231243.GB1895@hoeg.nl> <20111228000723.GA77332@alchemy.franken.de>

--0QFb0wBpEddLcDHQ
Content-Type: multipart/mixed; boundary="FFoLq8A0u+X9iRU8"
Content-Disposition: inline

--FFoLq8A0u+X9iRU8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi Marius,

* Marius Strobl <marius@alchemy.franken.de>, 20111228 01:07:
> Before making libcompiler_rt the default for sparc64 could you please
> also look into adding the optimized versions of _divsi3 and _modsi3
> (see contrib/gcc/config/sparc/lb1spc.asm) to libcompiler_rt? They're
> taken from/based on the SPARC V8 Architecture Manual and IIRC I once
> compared them and there actually was little difference so there should
> be no licensing issues.

Just to make sure we don't get into license problems, I copied the code
from the architecture manual and regenerated the assembly files. I
compared them against the ones used by GCC and they should work.

Please forget the previous patch I sent and use the one attached. If the
attachment is missing, you can download the patch here:

	http://80386.nl/pub/compiler-rt.txt

The code isn't that beautiful yet, but I'll clean it up before I send it
to the compiler-rt folks.

Thanks,
-- 
 Ed Schouten <ed@80386.nl>
 WWW: http://80386.nl/

--FFoLq8A0u+X9iRU8
Content-Type: text/x-diff; charset=utf-8
Content-Disposition: attachment; filename="compiler-rt.diff"
Content-Transfer-Encoding: quoted-printable

Index: gnu/lib/libgcc/Makefile
===================================================================
--- gnu/lib/libgcc/Makefile (revision 228913)
+++ gnu/lib/libgcc/Makefile (working copy)
@@ -15,10 +15,6 @@
 
 .include "${.CURDIR}/../../usr.bin/cc/Makefile.tgt"
 
-.if ${TARGET_CPUARCH} == "sparc64" || ${TARGET_CPUARCH} == "mips"
-LIB= gcc
-.endif
-
 .PATH: ${GCCDIR}/config/${GCC_CPU} ${GCCDIR}/config ${GCCDIR}
 
 CFLAGS+= -DIN_GCC -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED \
Index: lib/libcompiler_rt/Makefile
===================================================================
--- lib/libcompiler_rt/Makefile (revision 228919)
+++ lib/libcompiler_rt/Makefile (working copy)
@@ -172,13 +172,11 @@
 . endif
 .endfor
 
-.if ${MACHINE_CPUARCH} != "sparc64" && ${MACHINE_CPUARCH} != "mips"
-. if ${MK_INSTALLLIB} != "no"
+.if ${MK_INSTALLLIB} != "no"
 SYMLINKS+=libcompiler_rt.a ${LIBDIR}/libgcc.a
-. endif
-. if ${MK_PROFILE} != "no"
+.endif
+.if ${MK_PROFILE} != "no"
 SYMLINKS+=libcompiler_rt_p.a ${LIBDIR}/libgcc_p.a
-. endif
 .endif
 
 .if ${MACHINE_CPUARCH} == "amd64" || ${MACHINE_CPUARCH} == "i386" || \
Index: contrib/compiler-rt/lib/ctzdi2.c
===================================================================
--- contrib/compiler-rt/lib/ctzdi2.c (revision 228913)
+++ contrib/compiler-rt/lib/ctzdi2.c (working copy)
@@ -15,6 +15,12 @@
 
 #include "int_lib.h"
 
+/* Workaround for LLVM bug 11663. */
+#if defined(__sparc64__) || defined(__mips_n64)
+si_int __ctzsi2(si_int);
+#define __builtin_ctz __ctzsi2
+#endif
+
 /* Returns: the number of trailing 0-bits */
 
 /* Precondition: a != 0 */
Index: contrib/compiler-rt/lib/sparc64/modsi3.S
===================================================================
--- contrib/compiler-rt/lib/sparc64/modsi3.S (revision 0)
+++ contrib/compiler-rt/lib/sparc64/modsi3.S (working copy)
@@ -0,0 +1,333 @@
+/*
+ * This m4 code has been taken from The SPARC Architecture Manual Version 8.
+ */
+/*
+ * Division/Remainder
+ *
+ * Input is:
+ * dividend -- the thing being divided
+ * divisor -- how many ways to divide it
+ * Important parameters:
+ * N -- how many bits per iteration we try to get
+ * as our current guess:
+ * WORDSIZE -- how many bits altogether we're talking about:
+ * obviously:
+ * A derived constant:
+ * TOPBITS -- how many bits are in the top "decade" of a number:
+ *
+ * Important variables are:
+ * Q -- the partial quotient under development -- initially 0
+ * R -- the remainder so far -- initially == the dividend
+ * ITER -- number of iterations of the main division loop which will
+ * be required. Equal to CEIL( lg2(quotient)/4 )
+ * Note that this is log_base_(2ˆ4) of the quotient.
+ * V -- the current comparand -- initially divisor*2=CB=86(ITER*4-1) + * Cost: + * current estimate for non-large dividend is + * CEIL( lg2(quotient) / 4 ) x ( 10 + 74/2 ) + C + * a large dividend is one greater than 2=CB=86(31-4 ) and takes a + * different path, as the upper bits of the quotient must be developed + * one bit at a time. + * This uses the m4 and cpp macro preprocessors. + */ +/* + * This is the recursive definition of how we develop quotient digits. + * It takes three important parameters: + * $1 -- the current depth, 1<=3D$1<=3D4 + * $2 -- the current accumulation of quotient bits + * 4 -- max depth + * We add a new bit to $2 and either recurse or insert the bits in the quo= tient. + * Dynamic input: + * %o3 -- current remainder + * %o2 -- current quotient + * %o5 -- current comparand + * cc -- set on current value of %o3 + * Dynamic output: + * %o3', %o2', %o5', cc' + */ +#include "../assembly.h" +.text + .align 4 +DEFINE_COMPILERRT_FUNCTION(__umodsi3) + save %sp,-64,%sp ! do this for debugging + b divide + mov 0,%g3 ! result always nonnegative +DEFINE_COMPILERRT_FUNCTION(__modsi3) + save %sp,-64,%sp ! do this for debugging + orcc %o1,%o0,%g0 ! are either %o0 or %o1 negative + bge divide ! if not, skip this junk + mov %o0,%g3 ! record sign of result in sign of %g3 + tst %o1 + bge 2f + tst %o0 + ! %o1 < 0 + bge divide + neg %o1 + 2: + ! %o0 < 0 + neg %o0 + ! FALL THROUGH +divide: + ! Compute size of quotient, scale comparand. + orcc %o1,%g0,%o5 ! movcc %o1,%o5 + te 2 ! if %o1 =3D 0 + mov %o0,%o3 + mov 0,%o2 + sethi %hi(1<<(32-4 -1)),%g1 + cmp %o3,%g1 + blu not_really_big + mov 0,%o4 + ! + ! Here, the %o0 is >=3D 2=CB=86(31-4) or so. We must be careful here, + ! as our usual 4-at-a-shot divide step will cause overflow and havoc. + ! The total number of bits in the result here is 4*%o4+%g2, where + ! %g2 <=3D 4. + ! Compute %o4 in an unorthodox manner: know we need to Shift %o5 into +! the top decade: so don't even bother to compare to %o3. 
+1: + cmp %o5,%g1 + bgeu 3f + mov 1,%g2 + sll %o5,4,%o5 + b 1b + inc %o4 +! Now compute %g2 +2: addcc %o5,%o5,%o5 + bcc not_too_big + add %g2,1,%g2 + ! We're here if the %o1 overflowed when Shifting. + ! This means that %o3 has the high-order bit set. + ! Restore %o5 and subtract from %o3. + sll %g1,4 ,%g1 ! high order bit + srl %o5,1,%o5 ! rest of %o5 + add %o5,%g1,%o5 + b do_single_div + dec %g2 +not_too_big: +3: cmp %o5,%o3 + blu 2b + nop + be do_single_div + nop +! %o5 > %o3: went too far: back up 1 step +! srl %o5,1,%o5 +! dec %g2 +! do single-bit divide steps +! +! We have to be careful here. We know that %o3 >=3D %o5, so we can do the +! first divide step without thinking. BUT, the others are conditional, +! and are only done if %o3 >=3D 0. Because both %o3 and %o5 may have the h= igh- +! order bit set in the first step, just falling into the regular +! division loop will mess up the first time around. +! So we unroll slightly... +do_single_div: + deccc %g2 + bl end_regular_divide + nop + sub %o3,%o5,%o3 + mov 1,%o2 + b end_single_divloop + nop +single_divloop: + sll %o2,1,%o2 + bl 1f + srl %o5,1,%o5 + ! %o3 >=3D 0 + sub %o3,%o5,%o3 + b 2f + inc %o2 + 1: ! %o3 < 0 + add %o3,%o5,%o3 + dec %o2 + 2: + end_single_divloop: + deccc %g2 + bge single_divloop + tst %o3 + b end_regular_divide + nop +not_really_big: +1: + sll %o5,4,%o5 + cmp %o5,%o3 + bleu 1b + inccc %o4 + be got_result + dec %o4 +do_regular_divide: + ! Do the main division iteration + tst %o3 + ! Fall through into divide loop +divloop: + sll %o2,4,%o2 + !depth 1, accumulated bits 0 + bl L.1.16 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 2, accumulated bits 1 + bl L.2.17 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 3, accumulated bits 3 + bl L.3.19 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits 7 + bl L.4.23 + srl %o5,1,%o5 + ! 
remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (7*2+1), %o2 +L.4.23: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (7*2-1), %o2 +L.3.19: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits 5 + bl L.4.21 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (5*2+1), %o2 +L.4.21: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (5*2-1), %o2 +L.2.17: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 3, accumulated bits 1 + bl L.3.17 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits 3 + bl L.4.19 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (3*2+1), %o2 +L.4.19: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (3*2-1), %o2 +L.3.17: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits 1 + bl L.4.17 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (1*2+1), %o2 +L.4.17: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (1*2-1), %o2 +L.1.16: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 2, accumulated bits -1 + bl L.2.15 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 3, accumulated bits -1 + bl L.3.15 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits -1 + bl L.4.15 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-1*2+1), %o2 +L.4.15: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-1*2-1), %o2 +L.3.15: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits -3 + bl L.4.13 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-3*2+1), %o2 +L.4.13: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-3*2-1), %o2 +L.2.15: + ! 
remainder is negative + addcc %o3,%o5,%o3 + !depth 3, accumulated bits -3 + bl L.3.13 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits -5 + bl L.4.11 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-5*2+1), %o2 +L.4.11: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-5*2-1), %o2 +L.3.13: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits -7 + bl L.4.9 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-7*2+1), %o2 +L.4.9: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-7*2-1), %o2 + 9: +end_regular_divide: + deccc %o4 + bge divloop + tst %o3 + bge got_result + nop + ! non-restoring fixup here + add %o3,%o1,%o3 +got_result: + tst %g3 + bge 1f + restore + ! answer < 0 + retl ! leaf-routine return + neg %o3,%o0 ! remainder <- -%o3 +1: + retl ! leaf-routine return + mov %o3,%o0 ! remainder <- %o3 Index: contrib/compiler-rt/lib/sparc64/divmod.m4 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- contrib/compiler-rt/lib/sparc64/divmod.m4 (revision 0) +++ contrib/compiler-rt/lib/sparc64/divmod.m4 (working copy) @@ -0,0 +1,250 @@ +/* + * This m4 code has been taken from The SPARC Architecture Manual Version = 8. 
+ */ + +/* + * Division/Remainder + * + * Input is: + * dividend -- the thing being divided + * divisor -- how many ways to divide it + * Important parameters: + * N -- how many bits per iteration we try to get + * as our current guess: define(N, 4) define(TWOSUPN, 16) + * WORDSIZE -- how many bits altogether we're talking about: + * obviously: define(WORDSIZE, 32) + * A derived constant: + * TOPBITS -- how many bits are in the top "decade" of a number: + * define(TOPBITS, eval( WORDSIZE - N*((WORDSIZE-1)/N) ) ) + * Important variables are: + * Q -- the partial quotient under development -- initially 0 + * R -- the remainder so far -- initially =3D=3D the dividend + * ITER -- number of iterations of the main division loop which will + * be required. Equal to CEIL( lg2(quotient)/N ) + * Note that this is log_base_(2=CB=86N) of the quotient. + * V -- the current comparand -- initially divisor*2=CB=86(ITER*N-1) + * Cost: + * current estimate for non-large dividend is + * CEIL( lg2(quotient) / N ) x ( 10 + 7N/2 ) + C + * a large dividend is one greater than 2=CB=86(31-TOPBITS) and takes a + * different path, as the upper bits of the quotient must be developed + * one bit at a time. + * This uses the m4 and cpp macro preprocessors. + */ + +define(dividend, `%o0') +define(divisor,`%o1') +define(Q, `%o2') +define(R, `%o3') +define(ITER, `%o4') +define(V, `%o5') +define(SIGN, `%g3') +define(T, `%g1') +define(SC,`%g2') +/* + * This is the recursive definition of how we develop quotient digits. + * It takes three important parameters: + * $1 -- the current depth, 1<=3D$1<=3DN + * $2 -- the current accumulation of quotient bits + * N -- max depth + * We add a new bit to $2 and either recurse or insert the bits in the quo= tient. 
+ * Dynamic input: + * R -- current remainder + * Q -- current quotient + * V -- current comparand + * cc -- set on current value of R + * Dynamic output: + * R', Q', V', cc' + */ + +#include "../assembly.h" + +.text + .align 4 + +define(DEVELOP_QUOTIENT_BITS, +` !depth $1, accumulated bits $2 + bl L.$1.eval(TWOSUPN+$2) + srl V,1,V + ! remainder is nonnegative + subcc R,V,R + ifelse( $1, N, + ` b 9f + add Q, ($2*2+1), Q + ',` DEVELOP_QUOTIENT_BITS( incr($1), `eval(2*$2+1)') + ') +L.$1.eval(TWOSUPN+$2): + ! remainder is negative + addcc R,V,R + ifelse( $1, N, + ` b 9f + add Q, ($2*2-1), Q + ',` DEVELOP_QUOTIENT_BITS( incr($1), `eval(2*$2-1)') + ') + ifelse( $1, 1, `9:') +') +ifelse( ANSWER, `quotient', ` +DEFINE_COMPILERRT_FUNCTION(__udivsi3) + save %sp,-64,%sp ! do this for debugging + b divide + mov 0,SIGN ! result always nonnegative +DEFINE_COMPILERRT_FUNCTION(__divsi3) + save %sp,-64,%sp ! do this for debugging + orcc divisor,dividend,%g0 ! are either dividend or divisor negative + bge divide ! if not, skip this junk + xor divisor,dividend,SIGN ! record sign of result in sign of SIGN + tst divisor + bge 2f + tst dividend + ! divisor < 0 + bge divide + neg divisor + 2: + ! dividend < 0 + neg dividend + ! FALL THROUGH +',` +DEFINE_COMPILERRT_FUNCTION(__umodsi3) + save %sp,-64,%sp ! do this for debugging + b divide + mov 0,SIGN ! result always nonnegative +DEFINE_COMPILERRT_FUNCTION(__modsi3) + save %sp,-64,%sp ! do this for debugging + orcc divisor,dividend,%g0 ! are either dividend or divisor negative + bge divide ! if not, skip this junk + mov dividend,SIGN ! record sign of result in sign of SIGN + tst divisor + bge 2f + tst dividend + ! divisor < 0 + bge divide + neg divisor + 2: + ! dividend < 0 + neg dividend + ! FALL THROUGH +') + +divide: + ! Compute size of quotient, scale comparand. + orcc divisor,%g0,V ! movcc divisor,V + te 2 ! 
if divisor =3D 0 + mov dividend,R + mov 0,Q + sethi %hi(1<<(WORDSIZE-TOPBITS-1)),T + cmp R,T + blu not_really_big + mov 0,ITER + ! + ! Here, the dividend is >=3D 2=CB=86(31-N) or so. We must be careful here, + ! as our usual N-at-a-shot divide step will cause overflow and havoc. + ! The total number of bits in the result here is N*ITER+SC, where + ! SC <=3D N. + ! Compute ITER in an unorthodox manner: know we need to Shift V into +! the top decade: so don't even bother to compare to R. +1: + cmp V,T + bgeu 3f + mov 1,SC + sll V,N,V + b 1b + inc ITER +! Now compute SC +2: addcc V,V,V + bcc not_too_big + add SC,1,SC + ! We're here if the divisor overflowed when Shifting. + ! This means that R has the high-order bit set. + ! Restore V and subtract from R. + sll T,TOPBITS,T ! high order bit + srl V,1,V ! rest of V + add V,T,V + b do_single_div + dec SC +not_too_big: +3: cmp V,R + blu 2b + nop + be do_single_div + nop +! V > R: went too far: back up 1 step +! srl V,1,V +! dec SC +! do single-bit divide steps +! +! We have to be careful here. We know that R >=3D V, so we can do the +! first divide step without thinking. BUT, the others are conditional, +! and are only done if R >=3D 0. Because both R and V may have the high- +! order bit set in the first step, just falling into the regular +! division loop will mess up the first time around. +! So we unroll slightly... +do_single_div: + deccc SC + bl end_regular_divide + nop + sub R,V,R + mov 1,Q + b end_single_divloop + nop +single_divloop: + sll Q,1,Q + bl 1f + srl V,1,V + ! R >=3D 0 + sub R,V,R + b 2f + inc Q + 1: ! R < 0 + add R,V,R + dec Q + 2: + end_single_divloop: + deccc SC + bge single_divloop + tst R + b end_regular_divide + nop + +not_really_big: +1: + sll V,N,V + cmp V,R + bleu 1b + inccc ITER + be got_result + dec ITER +do_regular_divide: + ! Do the main division iteration + tst R + ! 
Fall through into divide loop +divloop: + sll Q,N,Q + DEVELOP_QUOTIENT_BITS( 1, 0 ) +end_regular_divide: + deccc ITER + bge divloop + tst R + bge got_result + nop + ! non-restoring fixup here +ifelse( ANSWER, `quotient', +` dec Q +',` add R,divisor,R +') + +got_result: + tst SIGN + bge 1f + restore + ! answer < 0 + retl ! leaf-routine return +ifelse( ANSWER, `quotient', +` neg %o2,%o0 ! quotient <- -Q +',` neg %o3,%o0 ! remainder <- -R +') +1: + retl ! leaf-routine return +ifelse( ANSWER, `quotient', +` mov %o2,%o0 ! quotient <- Q +',` mov %o3,%o0 ! remainder <- R +') Index: contrib/compiler-rt/lib/sparc64/divsi3.S =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- contrib/compiler-rt/lib/sparc64/divsi3.S (revision 0) +++ contrib/compiler-rt/lib/sparc64/divsi3.S (working copy) @@ -0,0 +1,333 @@ +/* + * This m4 code has been taken from The SPARC Architecture Manual Version = 8. + */ +/* + * Division/Remainder + * + * Input is: + * dividend -- the thing being divided + * divisor -- how many ways to divide it + * Important parameters: + * N -- how many bits per iteration we try to get + * as our current guess: + * WORDSIZE -- how many bits altogether we're talking about: + * obviously: + * A derived constant: + * TOPBITS -- how many bits are in the top "decade" of a number: + * + * Important variables are: + * Q -- the partial quotient under development -- initially 0 + * R -- the remainder so far -- initially =3D=3D the dividend + * ITER -- number of iterations of the main division loop which will + * be required. Equal to CEIL( lg2(quotient)/4 ) + * Note that this is log_base_(2=CB=864) of the quotient. 
+ * V -- the current comparand -- initially divisor*2=CB=86(ITER*4-1) + * Cost: + * current estimate for non-large dividend is + * CEIL( lg2(quotient) / 4 ) x ( 10 + 74/2 ) + C + * a large dividend is one greater than 2=CB=86(31-4 ) and takes a + * different path, as the upper bits of the quotient must be developed + * one bit at a time. + * This uses the m4 and cpp macro preprocessors. + */ +/* + * This is the recursive definition of how we develop quotient digits. + * It takes three important parameters: + * $1 -- the current depth, 1<=3D$1<=3D4 + * $2 -- the current accumulation of quotient bits + * 4 -- max depth + * We add a new bit to $2 and either recurse or insert the bits in the quo= tient. + * Dynamic input: + * %o3 -- current remainder + * %o2 -- current quotient + * %o5 -- current comparand + * cc -- set on current value of %o3 + * Dynamic output: + * %o3', %o2', %o5', cc' + */ +#include "../assembly.h" +.text + .align 4 +DEFINE_COMPILERRT_FUNCTION(__udivsi3) + save %sp,-64,%sp ! do this for debugging + b divide + mov 0,%g3 ! result always nonnegative +DEFINE_COMPILERRT_FUNCTION(__divsi3) + save %sp,-64,%sp ! do this for debugging + orcc %o1,%o0,%g0 ! are either %o0 or %o1 negative + bge divide ! if not, skip this junk + xor %o1,%o0,%g3 ! record sign of result in sign of %g3 + tst %o1 + bge 2f + tst %o0 + ! %o1 < 0 + bge divide + neg %o1 + 2: + ! %o0 < 0 + neg %o0 + ! FALL THROUGH +divide: + ! Compute size of quotient, scale comparand. + orcc %o1,%g0,%o5 ! movcc %o1,%o5 + te 2 ! if %o1 =3D 0 + mov %o0,%o3 + mov 0,%o2 + sethi %hi(1<<(32-4 -1)),%g1 + cmp %o3,%g1 + blu not_really_big + mov 0,%o4 + ! + ! Here, the %o0 is >=3D 2=CB=86(31-4) or so. We must be careful here, + ! as our usual 4-at-a-shot divide step will cause overflow and havoc. + ! The total number of bits in the result here is 4*%o4+%g2, where + ! %g2 <=3D 4. + ! Compute %o4 in an unorthodox manner: know we need to Shift %o5 into +! the top decade: so don't even bother to compare to %o3. 
+1: + cmp %o5,%g1 + bgeu 3f + mov 1,%g2 + sll %o5,4,%o5 + b 1b + inc %o4 +! Now compute %g2 +2: addcc %o5,%o5,%o5 + bcc not_too_big + add %g2,1,%g2 + ! We're here if the %o1 overflowed when Shifting. + ! This means that %o3 has the high-order bit set. + ! Restore %o5 and subtract from %o3. + sll %g1,4 ,%g1 ! high order bit + srl %o5,1,%o5 ! rest of %o5 + add %o5,%g1,%o5 + b do_single_div + dec %g2 +not_too_big: +3: cmp %o5,%o3 + blu 2b + nop + be do_single_div + nop +! %o5 > %o3: went too far: back up 1 step +! srl %o5,1,%o5 +! dec %g2 +! do single-bit divide steps +! +! We have to be careful here. We know that %o3 >=3D %o5, so we can do the +! first divide step without thinking. BUT, the others are conditional, +! and are only done if %o3 >=3D 0. Because both %o3 and %o5 may have the h= igh- +! order bit set in the first step, just falling into the regular +! division loop will mess up the first time around. +! So we unroll slightly... +do_single_div: + deccc %g2 + bl end_regular_divide + nop + sub %o3,%o5,%o3 + mov 1,%o2 + b end_single_divloop + nop +single_divloop: + sll %o2,1,%o2 + bl 1f + srl %o5,1,%o5 + ! %o3 >=3D 0 + sub %o3,%o5,%o3 + b 2f + inc %o2 + 1: ! %o3 < 0 + add %o3,%o5,%o3 + dec %o2 + 2: + end_single_divloop: + deccc %g2 + bge single_divloop + tst %o3 + b end_regular_divide + nop +not_really_big: +1: + sll %o5,4,%o5 + cmp %o5,%o3 + bleu 1b + inccc %o4 + be got_result + dec %o4 +do_regular_divide: + ! Do the main division iteration + tst %o3 + ! Fall through into divide loop +divloop: + sll %o2,4,%o2 + !depth 1, accumulated bits 0 + bl L.1.16 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 2, accumulated bits 1 + bl L.2.17 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 3, accumulated bits 3 + bl L.3.19 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits 7 + bl L.4.23 + srl %o5,1,%o5 + ! 
remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (7*2+1), %o2 +L.4.23: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (7*2-1), %o2 +L.3.19: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits 5 + bl L.4.21 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (5*2+1), %o2 +L.4.21: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (5*2-1), %o2 +L.2.17: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 3, accumulated bits 1 + bl L.3.17 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits 3 + bl L.4.19 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (3*2+1), %o2 +L.4.19: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (3*2-1), %o2 +L.3.17: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits 1 + bl L.4.17 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (1*2+1), %o2 +L.4.17: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (1*2-1), %o2 +L.1.16: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 2, accumulated bits -1 + bl L.2.15 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 3, accumulated bits -1 + bl L.3.15 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits -1 + bl L.4.15 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-1*2+1), %o2 +L.4.15: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-1*2-1), %o2 +L.3.15: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits -3 + bl L.4.13 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-3*2+1), %o2 +L.4.13: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-3*2-1), %o2 +L.2.15: + ! 
remainder is negative + addcc %o3,%o5,%o3 + !depth 3, accumulated bits -3 + bl L.3.13 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + !depth 4, accumulated bits -5 + bl L.4.11 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-5*2+1), %o2 +L.4.11: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-5*2-1), %o2 +L.3.13: + ! remainder is negative + addcc %o3,%o5,%o3 + !depth 4, accumulated bits -7 + bl L.4.9 + srl %o5,1,%o5 + ! remainder is nonnegative + subcc %o3,%o5,%o3 + b 9f + add %o2, (-7*2+1), %o2 +L.4.9: + ! remainder is negative + addcc %o3,%o5,%o3 + b 9f + add %o2, (-7*2-1), %o2 + 9: +end_regular_divide: + deccc %o4 + bge divloop + tst %o3 + bge got_result + nop + ! non-restoring fixup here + dec %o2 +got_result: + tst %g3 + bge 1f + restore + ! answer < 0 + retl ! leaf-routine return + neg %o2,%o0 ! quotient <- -%o2 +1: + retl ! leaf-routine return + mov %o2,%o0 ! quotient <- %o2 Index: contrib/compiler-rt/lib/sparc64/generate.sh =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- contrib/compiler-rt/lib/sparc64/generate.sh (revision 0) +++ contrib/compiler-rt/lib/sparc64/generate.sh (working copy) @@ -0,0 +1,6 @@ +#!/bin/sh + +m4 divmod.m4 | sed -e 's/[[:space:]]*$//' | grep -v '^$' > modsi3.S +m4 -DANSWER=3Dquotient divmod.m4 | sed -e 's/[[:space:]]*$//' | grep -v '^= $' > divsi3.S +echo '! This file intentionally left blank' > umodsi3.S +echo '! 
This file intentionally left blank' > udivsi3.S Index: contrib/compiler-rt/lib/sparc64/umodsi3.S =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- contrib/compiler-rt/lib/sparc64/umodsi3.S (revision 0) +++ contrib/compiler-rt/lib/sparc64/umodsi3.S (working copy) @@ -0,0 +1 @@ +! This file intentionally left blank Index: contrib/compiler-rt/lib/sparc64/udivsi3.S =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- contrib/compiler-rt/lib/sparc64/udivsi3.S (revision 0) +++ contrib/compiler-rt/lib/sparc64/udivsi3.S (working copy) @@ -0,0 +1 @@ +! This file intentionally left blank --FFoLq8A0u+X9iRU8-- --0QFb0wBpEddLcDHQ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJO+t7ZAAoJEG5e2P40kaK73twP/jQL0q78+M0OpI7gPqhLet+d bxEdUsMn4Mnf9qw4HXTcJiqyN7jkofbnGWc1pWuJ02AQBBmwuW5f6vNpM7aj6MLG I4txZ03K6AnWfbK45BG2psd0+DFGUXLI1m/yQRDr8/2p5QtdzlGYlyL1ggBQ3puk XIk98BCrhybUDU2W8WzEKCOUUKVFFKv+U6Zq4EQhXf5XvtDwcgxoiIWFegphsdSf xCXsW+BLCQ1XN59ahsYMgtA9HNFxT3q26veM29i4nNkmGXZkOQM+SakdbHc3BqTU W4GfHB9CWtBJqmhO+lsGp/sa8TIlP5vJ/0Lc6A6IaUXBESqqki/Jp+rzJmRrnvBP EtQ+lfjUulk0SP1CUiJFXnko2BMjYo8OQj/B5X1k58qgF965cOB+rZdZBSSQWyN3 TzbRFxnovXqk9BfacXJ2ZeAjvi4+/tsQ2qrQfObg7jj2Y+hGITQFsack1iIpcIak lwk15e/Aa2xVdhxOCTWPYVvudtPS3KIP4BGAJm57g7jHj3YR0a4dDsyQpbVAqWav QoYDqtI15PDbVFoES5SFwczBtCif8IAaLWE8bM+xKEcvq9OhhozbZYlGqdLsal86 MfC/jua+48m0/QoeXNf8HU9hjXT9ha5FhXDWlHhAzXSANpheN42CygCHMdBxzl0z h48GZGtQSqUTMx6T1W7m =1nwD -----END PGP SIGNATURE----- --0QFb0wBpEddLcDHQ--
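
[Editor's note, not part of the original message or patch]

For readers who do not want to trace the SPARC assembly: the comments in divmod.m4 describe a non-restoring division. The remainder R is allowed to go negative, each step adds a quotient digit of +1 or -1, and a final fixup corrects the result when R ends up below zero. The assembly develops four quotient bits per pass of the main loop; the C sketch below shows the same idea one bit at a time. The names udivmod32_t, nonrestoring_udivmod32, sketch_divsi3 and sketch_modsi3 are made up for illustration, and the 64-bit intermediates are an assumption that sidesteps the overflow cases the assembly handles by hand on its "large dividend" path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of the patch. */
typedef struct {
	uint32_t quot;
	uint32_t rem;
} udivmod32_t;

/*
 * One-bit-per-step non-restoring division.  Invariant after every step:
 * dividend == q * v + r, with -v <= r < v.
 */
udivmod32_t
nonrestoring_udivmod32(uint32_t dividend, uint32_t divisor)
{
	int64_t r = dividend;	/* R: remainder so far, may go negative */
	int64_t v = divisor;	/* V: current comparand */
	int64_t q = 0;		/* Q: partial quotient */
	int steps = 0;
	udivmod32_t res;

	assert(divisor != 0);	/* the assembly traps here ("te 2") */

	/* Scale the comparand until it exceeds the dividend. */
	while (v <= r) {
		v <<= 1;
		steps++;
	}

	/* Each step halves V and appends a quotient digit of +1 or -1. */
	while (steps-- > 0) {
		v >>= 1;
		if (r >= 0) {
			r -= v;
			q = 2 * q + 1;
		} else {
			r += v;
			q = 2 * q - 1;
		}
	}

	/* Non-restoring fixup: a negative remainder means Q is one too big. */
	if (r < 0) {
		q -= 1;
		r += divisor;
	}

	res.quot = (uint32_t)q;
	res.rem = (uint32_t)r;
	return res;
}

/* Signed quotient: the sign is the XOR of the operand signs. */
int32_t
sketch_divsi3(int32_t a, int32_t b)
{
	uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
	uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
	uint32_t q = nonrestoring_udivmod32(ua, ub).quot;

	return (int32_t)((a ^ b) < 0 ? 0u - q : q);
}

/* Signed remainder: the sign follows the dividend. */
int32_t
sketch_modsi3(int32_t a, int32_t b)
{
	uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
	uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
	uint32_t r = nonrestoring_udivmod32(ua, ub).rem;

	return (int32_t)(a < 0 ? 0u - r : r);
}
```

As a worked case, dividing 100 by 7 scales V to 112 in four doublings; the four steps then leave Q = 15 and R = -5, and the fixup yields quotient 14 and remainder 2. The sign handling follows the patch: __divsi3 records the XOR of the operands in SIGN, while __modsi3 records the dividend.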