Date: Sat, 6 Mar 2010 22:17:16 +0100
From: Bernd Walter <ticso@cicely7.cicely.de>
To: Maks Verver <maksverver@geocities.com>
Cc: freebsd-arm@freebsd.org
Subject: Re: Performance of SheevaPlug on 8-stable
Message-ID: <20100306211715.GK58319@cicely7.cicely.de>
In-Reply-To: <4B92BD9D.6030709@geocities.com>
References: <4B92BD9D.6030709@geocities.com>

On Sat, Mar 06, 2010 at 09:39:57PM +0100, Maks Verver wrote:
> Hi everyone,
>
> After a bit of patching and tinkering I got my SheevaPlug to boot
> FreeBSD from a UFS2-formatted USB stick. To compare it with Linux I
> decided to run nbench to see how FreeBSD compares with Ubuntu (which is
> shipped with the SheevaPlug). To my surprise, the results were
> atrocious! FreeBSD scores about 50 times worse than Ubuntu.
>
> Of course, this performance difference is too large to be caused by
> implementation differences. There must be something more fundamental
> wrong here. To simplify things, I created a simple testcase that counts
> up to the maximum value of an integer:
>
> int main()
> {
>     int i = 0;
>     do ++i; while(i > 0);
>     return 0;
> }
>
> This compiles to: (both on Linux and on FreeBSD)
>
> 0000848c <main>:
>     848c: e3a03000  mov r3, #0  ; 0x0
>     8490: e2833001  add r3, r3, #1  ; 0x1
>     8494: e3530000  cmp r3, #0  ; 0x0
>     8498: cafffffc  bgt 8490 <main+0x4>
>     849c: e3a00000  mov r0, #0  ; 0x0
>     84a0: e1a0f00e  mov pc, lr
>
> This stresses the CPU and not much else. Since there are three
> instructions in the loop and the SheevaPlug runs at 1.2 GHz, I
> expect this to take around (1<<31)*3/1.2e9 ~ 5.3687 seconds. On Ubuntu:
>
> $ time ./test
> real 0m5.422s
> user 0m5.390s
> sys  0m0.020s
>
> Exactly as expected. On FreeBSD on the other hand:
>
> %time ./test
> 286.000u 0.000s 4:47.22 99.8% 40+1321k 0+0io 0pf+0w
>
> This takes almost five minutes, or over 50 times as long! All of it is
> user-space CPU time. Does anybody have a suggestion why the CPU appears
> to run so slowly in FreeBSD?

I was tempted to say different compiler optimisations, but you say that
the resulting code is the same.
Such a massive speed difference sounds a bit like a cache problem.
For what it's worth, I see it take minutes (not finished yet) on a
180 MHz RM9200 as well.
According to dmesg the instruction cache (IC) is enabled:

CPU: ARM920T rev 0 (ARM9TDMI core)
  DC enabled IC enabled WB enabled LABT
  16KB/32B 64-way Instruction cache
  16KB/32B 64-way write-back-locking-A Data cache

If the above calculation is correct, I would expect it to finish after
about 7 times the calculated time (1.2 GHz / 180 MHz is roughly 6.7).
If the calculation is wrong, then why does Ubuntu agree with it?

> I pored over my kernel configuration but I don't see anything suspect. I
> did (manually) apply Hans Petter Selasky's patch [1] to be able to boot
> from USB, and consequently removed the NFS and BOOTP stuff from the
> config provided at sys/arm/conf/SHEEVAPLUG. Furthermore I removed the
> NO_SWAPPING and NO_FFS_SNAPSHOT options (because I plan to attach a USB
> disk drive) and I left in the KDB and DDB options because I think
> they do not significantly affect performance. Is this correct?
>
> Kind regards,
> Maks Verver.
>
> P.S. The strange thing is that stuff like network performance is
> perfectly fine. I can fetch FTP data at 11 MB/s, which is about the
> maximum possible on the cheap 100 Mbit switch I use, and is even a few
> percent better than Ubuntu. So it seems it's really the CPU that's the
> bottleneck, for no apparent reason.

FTP doesn't gain that much from the cache, and our network stack might
outweigh the loss, so this all makes sense if the instruction cache
isn't working.

I think you have made a very interesting catch, although I don't know
exactly what causes it.

-- 
B.Walter <bernd@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.