Date: Fri, 1 Aug 2014 18:42:37 +0300
From: Mihai Carabas <mihai.carabas@gmail.com>
To: soc-status@freebsd.org
Subject: Re: [GSOC] bhyve instruction caching
Message-ID: <CANg1yUt6mvB5x+wcYRRteabcqROeQ3S1wJ8wn3dJdtz478M1EQ@mail.gmail.com>
In-Reply-To: <CANg1yUtm5=MY+KOfe5_ZpLjVUMTCu0JToac9Ne3VQHErVEPnXg@mail.gmail.com>
References: <CANg1yUuazrhybHVVzi2g8vCBSTx3Z=gYmEVXvEMuj2SN+RY9Sg@mail.gmail.com>
 <CANg1yUu_b0qSX=2eXRaO31cogjGdSmkDnEh7PAjfVvCMsAaC1g@mail.gmail.com>
 <CANg1yUuZU0--O8RgOVx=jKhku1yguvmO4TxUZ5c4wEq6jk6fSw@mail.gmail.com>
 <CANg1yUsmRCSftYgFWZu_xu-nCROMn3FvuXzfgteiuy4LtAJtvQ@mail.gmail.com>
 <CANg1yUtcBz3OiL=R-n=Kh1o68-tECLR3FA0ag23F5Z9CjJrFjA@mail.gmail.com>
 <CANg1yUszaxnxjxqUDXGtwxA+AB+Ps9PPyn8-LbwiC89e-iQpOg@mail.gmail.com>
 <CANg1yUtm5=MY+KOfe5_ZpLjVUMTCu0JToac9Ne3VQHErVEPnXg@mail.gmail.com>
Hi,

Until now I have managed to finish all the coding work related to instruction caching. As you saw in my previous e-mails, we obtained a speed-up of 35%-40% in the microbenchmark tests (accessing the LAPIC many times from a kernel module). Next we wanted to see how this extrapolates to real-world workloads. I ran two kinds of benchmarks: a CPU-intensive process and a "make buildworld -j2" command. For each of them I measured the time spent to execute.

1) The CPU-intensive app is a bash script:

    #!/usr/local/bin/bash
    a=0
    MAX=10000000
    for i in $(seq 1 $MAX); do
        a=$((a+1))
    done

For a VM with 2 vCPUs:

* Cache_instr=1
    real    3m45.067s  3m42.628s  3m38.371s  3m36.301s  3m39.929s
    user    3m10.454s  3m8.785s   3m7.516s   3m8.204s   3m8.822s
    sys     0m19.085s  0m16.135s  0m13.696s  0m13.016s  0m16.105s

* Cache_instr=0
    real    3m50.550s  3m41.517s  3m34.783s
    user    3m5.350s   3m7.571s   3m1.415s
    sys     0m25.268s  0m19.200s  0m16.200s

There are multiple measurements per configuration. As you can see, the results are not stable, but they fall in the same range. To reduce the range over which they vary, I repeated the tests with 1 vCPU (to eliminate the context switches).

With 1 vCPU:

* Cache_instr=1
    real    2m58.968s  2m57.009s  3m0.451s   2m55.902s  2m56.422s
    user    2m46.909s  2m45.241s  2m45.670s  2m45.788s  2m45.503s
    sys     0m4.890s   0m4.134s   0m3.942s   0m3.764s   0m3.984s

* Cache_instr=0
    real    2m56.845s  2m57.051s  3m1.794s   2m57.340s
    user    2m45.232s  2m44.873s  2m45.482s  2m46.538s
    sys     0m4.644s   0m4.141s   0m3.906s   0m3.875s

As you can see, the results now vary very little and are almost the same across the two configurations.

2) For a "make buildworld -j2" with 1 vCPU:

* Cache_instr=1
    13900.60 real   12051.54 user   1800.42 sys

* Cache_instr=0
    13938.07 real   12122.14 user   1743.61 sys

The difference between them is not significant; the times are about the same. So, unfortunately, for these two different kinds of workloads there is no speed-up.
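For comparing runs like the ones above, averaging the "real" samples helps. The helper below is purely illustrative (not something from this work); it assumes time(1)'s MmS.SSSs output format:

```shell
# Hypothetical helper: average time(1) "real" samples such as 3m45.067s,
# printing the mean in seconds.  -F'[ms]' splits "3m45.067s" into the
# minutes and seconds fields.
avg_real() {
	awk -F'[ms]' '{ tot += $1 * 60 + $2 } END { printf "%.1f\n", tot / NR }'
}

# The five Cache_instr=1 "real" samples from the 2-vCPU run:
printf '%s\n' 3m45.067s 3m42.628s 3m38.371s 3m36.301s 3m39.929s | avg_real
# prints 220.5

# The three Cache_instr=0 samples:
printf '%s\n' 3m50.550s 3m41.517s 3m34.783s | avg_real
# prints 222.3
```

With means of about 220.5 s and 222.3 s against several seconds of run-to-run spread, the two configurations are indistinguishable here.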
I have also tried other, more specific workloads:

a) dd if=/dev/zero of=/dev/zero bs=256 count=10000K (from memory to memory, so the result is not influenced by the storage system)

b) A simple program that executes the getuid() syscall in a loop:

    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char *argv[])
    {
            int i;

            if (argc == 2)
                    i = atoi(argv[1]);
            else
                    i = 100;

            while (i > 0) {
                    getuid();
                    i--;
            }
            return 0;
    }

But the results were the same. I spoke with Neel and it seems that we cannot get a real-world benefit from this instruction caching.

Thanks,
Mihai
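To put "not significant" in numbers: the buildworld wall-clock delta quoted above works out to roughly a quarter of a percent, well within run-to-run noise. A one-line sketch, using the figures from the results above:

```shell
# Relative difference of the two buildworld "real" times (seconds),
# Cache_instr=0 (13938.07) vs. Cache_instr=1 (13900.60).
awk 'BEGIN { printf "%.2f%%\n", (13938.07 - 13900.60) / 13938.07 * 100 }'
# prints 0.27%
```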