From owner-freebsd-current Thu May 2 02:36:00 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id CAA11659 for current-outgoing; Thu, 2 May 1996 02:36:00 -0700 (PDT)
Received: from silvia.HIP.Berkeley.EDU (silvia.HIP.Berkeley.EDU [136.152.64.181]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id CAA11644 for ; Thu, 2 May 1996 02:35:50 -0700 (PDT)
Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.7.5/8.6.9) id CAA06965; Thu, 2 May 1996 02:35:47 -0700 (PDT)
Date: Thu, 2 May 1996 02:35:47 -0700 (PDT)
Message-Id: <199605020935.CAA06965@silvia.HIP.Berkeley.EDU>
To: current@freebsd.org
CC: nisha@cs.berkeley.edu
Subject: more on fast bcopy
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Looking at Bruce's code, I added some prefetching and got quite a bit of improvement.

Extract the following on your machine, type "make" and then "sh runtests" to see the numbers. Also, "sh runtests -graph" will create files suitable for gnuplot consumption. "load '/plot.gp'" will show you how they compare.

The FP version is still by far the fastest for large copies on all the Pentiums we've tried over here, but we can't use that in the kernel. We've got 67MB/s on the 133MHz Pentium + Triton here. Wow.

Of the ones using integer registers, unroll sizes 96 and 128 seem to be the best (from 96 onwards, the height of the plateau is pretty much constant). However, for copies of size 64, unroll size 64 is better (of course, because for a 64-byte copy the size 96 and 128 cases degenerate into libc). With integer registers, the most we see is about 55MB/s.

The program "unroll" is the meat of this package. You can do things like "unroll unrolled 96 32 2" to get a function "unrolled" that copies 96 bytes in one iteration, does a prefetch at intervals of 32 and uses 2 scratch registers (this can only be 2 or 3).
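For readers without the attachment, here is a rough C sketch of the idea behind "unroll unrolled 96 32 2". It is not the code that "unroll" generates. The function name copy_unrolled, the UNROLL and INTERVAL macros, and the use of plain reads as the "prefetch" are all assumptions made for illustration. Each pass first touches one byte every INTERVAL bytes of the source block to pull it into the cache, then copies the block through two scratch variables, and leaves any tail shorter than UNROLL to libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNROLL   96   /* bytes copied per pass (assumed, mirrors "96") */
#define INTERVAL 32   /* spacing of the prefetch touches (mirrors "32") */

/* Hypothetical illustration only; not the output of the "unroll" program.
 * Buffers must not overlap, as with memcpy. */
static void *copy_unrolled(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= UNROLL) {
        /* "Prefetch": read one byte every INTERVAL bytes so the source
         * block is brought into the cache before the copy proper. */
        volatile unsigned char sink = 0;
        for (size_t off = 0; off < UNROLL; off += INTERVAL)
            sink = s[off];
        (void)sink;

        /* Copy the block 8 bytes at a time through two 32-bit scratch
         * variables (the "2 scratch registers" case). The trip count is
         * fixed, so a compiler is free to unroll this loop fully. */
        for (size_t off = 0; off < UNROLL; off += 8) {
            uint32_t a, b;
            memcpy(&a, s + off, 4);
            memcpy(&b, s + off + 4, 4);
            memcpy(d + off, &a, 4);
            memcpy(d + off + 4, &b, 4);
        }

        s += UNROLL;
        d += UNROLL;
        len -= UNROLL;
    }

    /* Anything shorter than one pass is left to libc. */
    if (len > 0)
        memcpy(d, s, len);
    return dst;
}
```

With these assumed values, a 64-byte copy never enters the loop and goes straight to memcpy, which matches the observation above that the larger unroll sizes degenerate into libc for small copies.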
Look at the shell script "runtests" to see what other things you may want to try. Please send the output of "sh runtests", I would lie to hear especially from people with 486/P6 or Pentium with slow memory systems. Thanks! Satoshi ------- begin 644 bcopy.tar.gz M'XL(`$%_B#$"`^T\:W/;.)+Y2OZ*#F,GDB++)$7)KSBWGHPWE9K$=MG.3N9B MEX8B(8DV16H)RI8SR?WVZP;`EQY.YN8VV;JS:F(1W8U&H]'H!PA-WXLG=YNG MAP<_OSM\]"_Z6*;9=1QX!);=;;?Q&\#J.N*;/I;5V0+H6HYM;CFFU25(>\M\ M!.:C[_"9\M1-`!ZYW!T']]`-$]=_]'_O-H".!\%'&Z# M,`0O'D^"D$$Z8C!-@S!(`\;!C7QP0Q[#D$4L<5.FNS!)XC1.[R:!YX8P9N.> M-VGI^GER!P8?03*-4L93;D`:PTW`;N%VY*:"[8V;!/&4PS1*8APQ9-$P'7'P MW`C\&`9QHM_%4Q3J``=L`CX+5(J,J3<*.&;(VX=XFD(81"@>#D'H*0^B(;3U ME(TG<>(B)&'#@*)U/T"" M!&6%.*)I)#!VO1$*C&J2X8U>H. M4&%2WE&,UE%'#0B3"6/4W[,7!"3*EYLT>FLX>6;HJ#+27"91MN1!VM(?/7S^ MO3Y]X?_?X?XGX_@A_A_:EIGY_[:S92'6MFS[P?]_CX\;AKN9[Y5N6]=ED_FM M.$/IFB+)4&##3A<=(GP&%YU5#&M_TW79?S=S_QX4C'1MK?;J51TVCB4QK+W, MQLG&:'E+B;R0N=&NKB5CV!A4)2WS?]C)?V7_J[#S@_(_TVFK_=_M6IV.V/_M M[L/^_QZ?)X\W^T&TR4>Z/IA&WKZ1[2E#9]XHWC?HKZ'?CBB=^`AK3S`S2<&$ MRSU,SW3`9`RS(6.V9AF8F>BTG,9LXY_3@*5&79?K*_FDR909"K*WEU&.6#@Q M/M-#A1R,*7>';!?63/@HV5W"1Q+P,N/!9D%:9=?(.(B)K%EE+..NAU]\%`Q2 MW8\CINO!?MO6U^1H1,>#3XR^^Y@?W09^.BK-.H`-?$`[-HN9!P/"B/[[(-5$ M.,RCI!Z4C\*^:R00V"^?6O!RTV=*]F'0]W8-T2DKIX%@.0%& M-NE`L]IU__?L"3;X[]*U/X;Q-=;"L):CR+F6W+ODY;ET#.`EC(KPHG:^,/)^ M%X:<&\T,K,S3SLD&,L;!RV*\38*WL.I':BJC]XUGB[AG<"L//2:&CGQUJO>O MZ(@`,W-G&W"3899NV=M@.8Y8Q"<@S@2NB<:B!%XM;0Y4D" MB;N>Q^4'0H@,"F2Q.'/TN95F`;A*O6P5L]FL7:FUE)^%%:W2%>N*H9]HLY&% M\0"(G:"^Y*Y8K<_!)-+B/$-?MW MC_^9L?R0^.]L;>7G/UM.=TO6?YV'^/\]/IL-'1K0O^WE_F(#*Z'Q!#,`A%`L M(P,!Z0P;F_J3(/+"J<\T(PW&032T6YY1`%_P.[Z)"-8:O=3U)SX;X':%\]]. 
M#J77FO9#5@';"`[C:"C^9!CM[,U_'FKD=.)!C:CJ)5['YP=O"0^.M>.T34?; M;(#S[B>23;^)`U_+/$:MOB<`H+'Q)+VCICYV@ZCF>DUP;^JY+_9&9`"-AGNS MI_^10[%2DJ&-,X\WPSB>R"?`2HQ>%/"]G%3-K\$3Y-SP>;I'YV4\&$88"L7L MTO$$87+^VKB/SQSK82\%TM6-&T**8VLTY!%*J:&71BGA\3ZTZ_"'KFD#5;GQ MU&=)TD2(R+[>RTQJG1[:24V,?R+>N MXC%>P>/='(\:,4'[I"\2Z`A%S"U@?8#KH/_X5-BTRB(K\TF:6\I(7=RZLZ1\%KLR%FA(;GC2>D]/8EQCAE&T:= MK,V44[QGT&Q4C*G46C*,]H6VHC9D*=EV//#=N]K3]*8)1^_?OJV+:;J1'X]K MZ4TKO>G1IB(HI96U0.0D>YA?OH!L0;#U_#D:!:UPK9!);-H&J;7^,;C$?HHK M\>I_8DE<0]F:&1?:^^0H6,2G"8-^G(Z$N="K,)J#B\`(PSB_=2<3]3*19B&B M8TWLZSEV6NX1]DUE*X5F;=(L[49#F+52*P6:5`A8FJV<[-$>/'\>2+)BT.=B MN$VR]48@))B#B)9@B%M8$](@3Y[&$YJ]L$>1J_ZIX7/_^;\B`5I=,3+,?>Z5 M0WKM!2$69*B*D"]+10RQF%]665G%QH2_6FEGN$D6D30$(LES*3]M"">Q*]C# MNM_$1#B>ACZ])UXW[1D&$2X>R"4K9TY#-^>9YP,W[QD6IS[NXYSD7#:ETVPU M,I\Y%SO`6!\`QDLTLH"B!$8Y`:"31HP/XS[](RXUH<:-7*%U!(H34--LF?A\ M5"?N"4NG200U$QOH_&0Z\>'#!TPCCA@&-NYB!$[ERWZI8>SZ#`8NU1"D M\C?T1AU#5BC>ZM.[94Z,J#;)H;B<">/R?D)T#6Z?=BEQ1=";%+,+QJD8'\4X M`ITVTXMV%SBW6R)SH5Q`SXVJ6>SG^GP@KT3R^7Q`!9TB49#')3*`P,M]*JL; M0+9)=CQO\9)JHT2UN"O((9&+[2ZBR&S:6YOVATPV[MM$_^#<`R'LPFL81_J]@IM)Q$77&I>G3HX M\-9-L&9XY_TCOL-M"8KP`+>PO,`"PUA>T9'NJ%4I*L#HL\@;M49%20$OD#"( MJ9XH@=`/NF$5-HT"I)RCHVKD;H).8Q&<%RG!`$L+C4\CGT:?:#+OB'Y%=P'^`!;N`S7I.\NXG380!$04$#X7X12$(1A^T_C3P M8*X<$1&[1UD;Q=`>52?*M9\11ID*IDRW)=^JPCQYH!H!ZE!-`@NFM=)PGV*L MZAIUJ`20,QPU&X1R-.6_$HHHU(G.'\>!E\2HJSCRN91B*HI($?6%%/,U%EW! 
MHD1PA71BHJN%PQS[AD_[F,P2GR84/?*9E0.BH,K6MI$M)ZURAE"I[Q=9M^J2 MNV*>6OC/+"+3W%0:BJQ!=(W4+,1U9Y8P-Y9HV])1VFE1[32I=I MI0]E27/]7D!)WJ7";&PTYP>C8"8U(Q3Q\#[NQYS_*;?[8^Y_M.VN6<1_QZ;[ MOUL/YW_?/?XK(Q#AWU'AO[MI6V!;NZ:]:U?"?REHJ\^20)UAEL3FO-/2$)UC M%P-]AL+D(8K+$?SHN'=R?'K^[N#DY/`TCW?GKTYZ'PY>G6OMG:Y=@;XZ/CH_ M/7Y+B'8%\?/!^0%!G7GRHT/)IX,1F=YO+(Q1F_;$68EC.F9[APX<9"6$);5Z M7X(5%ZENB1CEOH[Y;7V%I)6.UC_[G^5,H3K5;^Q)XL[U;'^] MYS\.3\^TO)>59T/24/GS"Y":.N%[F=W;\ MZI?CD_,>_>9!,Q?`OYZ^.3_4K$7RGW\]U>P%\-'QT:'FZ!7X3^__KI72KWH% M>79^>(+(KD(IZX6SW\XZ.9T\P,)*/&1171NS,6CD25$3?$0(0@_; M=-Q`[9P,\U6?S8C.JU-2E'BC1+864DN10E4/KW/P:\'OVER4];.WKP^ M>'OZKBG>;-?WL$QWDW$-77D#NB)?$J?QXFA4G<07!T%R&)FI*9SK7Z$KSIN) MN"60-_-+63GDNI\_CDN/E!266F$8Y"VRB(*C3+?J>6+[6EVE]V&0Q&,8\V%K M1K6]-Z+#&N5]?'&Y?Q=SP4D28T08`^V&WLGI\6LZM"+'=,,2CC8G$62[XC2+ MRG/\.CUY)?9/C=J8O8,E4RKMBWP6W^V=+`SIH+.UD/^]ST^K13WG*ZUQ'L@L/76,(S[(?0R>T`4N0RM M@#3_1G=-*:[J.6P7'>*4CT)89_V)KHWC&WKDDZ9L9S@_*)YYH.BV:T14;Y9` MEIW#_!S6S6'>#!VP-YZ$L-9MR[9VU6?YS:%>Z@:AKNE/E"!>8A*G6098VVZB M`.Z,PNSB1%E4;I\)HZM- MRNSDS_0@3H)A@"D8W+CAE%6,T2(Q1TV''C-EC/+[T6$ M?OZY]%%G(9WI;92<`/F+>)*Y"/5(^D?*&R8. MV$K3P_P5-T^+[C^4?509OU%V9IF?>XC\Y?B?_0+KAYS_=+8Z3A;_,?QWQ?L? M\^'^]_>Y_[WXG@4+(@C15=*OH[F'=06ZX>*GT^AM5.;.AOPCW;R01\\&^0"# M7B^C_Y+?_9FA?P$L9>C=J;AZ10]N,O2:*L5O8..FK@ZO"8E,F^(R4U/<]*3. MV7$S]:/W[QUQ/V#AK?8TOPLE;FN3=-FMJ(C)JT\X%EU^0J;9C1V0Y=<777JG M87X#BDCMRPPO^)11[1PEI"RCG!PE-Y6$6I?9M,0,J5=&IEX8F"!.P%4GJ1Z: M25/H1?;--40/@D=)=4$3KLKZHNF@NFQX^A34+R"ZEH'9SE(U!\E%HL^(ZG81K0/4,Z'FU>P)^S]A3Z4_ MW<96MS8NTI;XV?R%LN1$0X]S^ZI_SF"W]I5[,L4O5YO&$US MFIZ7$^$&*A.*!'P)`Y1%Q:KE2)&G:[UUOE)()!*I^Y+,?9G`ZWSW'E8J;:F0;(G'\E4<[$#^['\S(^,^;]S(`7.D@="(=4;-HORR3$/$,E&G.#S-%1 M!9(3^O<1=DN$WFP9X;*^,I=<]_->TCUL6(ND5-FLEQ.ZBQ^?[RO-H\XOOU]9542+7(U!+8E[>(V[)=TA7FEV7 M^ZJNGK?%EEY4U'A2F=%7YG=K].M9)"*-E:NNGS2FZ5-TJQK%L* MQ^)LE9OCX;T^WYW=,UQ>"*ZDN&_@_KB?EVA@72Y25!U>U]\%V5OCW87EY5N*1TH_ZU#^<[?P^?A 0\_!Y^-#GOP$1DL6#`%```'2Y ` end