Date:      Fri, 1 Aug 2014 18:42:37 +0300
From:      Mihai Carabas <mihai.carabas@gmail.com>
To:        soc-status@freebsd.org
Subject:   Re: [GSOC] bhyve instruction caching
Message-ID:  <CANg1yUt6mvB5x+wcYRRteabcqROeQ3S1wJ8wn3dJdtz478M1EQ@mail.gmail.com>
In-Reply-To: <CANg1yUtm5=MY+KOfe5_ZpLjVUMTCu0JToac9Ne3VQHErVEPnXg@mail.gmail.com>
References:  <CANg1yUuazrhybHVVzi2g8vCBSTx3Z=gYmEVXvEMuj2SN+RY9Sg@mail.gmail.com> <CANg1yUu_b0qSX=2eXRaO31cogjGdSmkDnEh7PAjfVvCMsAaC1g@mail.gmail.com> <CANg1yUuZU0--O8RgOVx=jKhku1yguvmO4TxUZ5c4wEq6jk6fSw@mail.gmail.com> <CANg1yUsmRCSftYgFWZu_xu-nCROMn3FvuXzfgteiuy4LtAJtvQ@mail.gmail.com> <CANg1yUtcBz3OiL=R-n=Kh1o68-tECLR3FA0ag23F5Z9CjJrFjA@mail.gmail.com> <CANg1yUszaxnxjxqUDXGtwxA+AB+Ps9PPyn8-LbwiC89e-iQpOg@mail.gmail.com> <CANg1yUtm5=MY+KOfe5_ZpLjVUMTCu0JToac9Ne3VQHErVEPnXg@mail.gmail.com>

Hi,

By now I have finished all the coding work related to instruction
caching. As you saw in my previous e-mails, we obtained a speed-up of
35%-40% in the microbenchmarking tests (accessing the LAPIC many times
from a kernel module). Next we wanted to see how this extrapolates to
real-world workloads.

I ran two kinds of benchmarks: a CPU-intensive process and a
make buildworld -j2 command. For each of them I measured the
execution time.

1) The CPU-intensive app is a bash script:
#!/usr/local/bin/bash
a=0
MAX=10000000
for i in $(seq 1 $MAX);
do
        a=$((a+1))
done
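The samples below were collected by timing complete runs of this
script; a minimal sketch of how that might look (the filename
counter.sh is my own choice, not something from the original runs):

```shell
# Hypothetical harness: time five complete runs of the loop script to
# collect the five real/user/sys samples reported below.
# counter.sh is an assumed name for the script above.
for run in 1 2 3 4 5; do
    time sh counter.sh
done
```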

For a VM with 2 vCPUs:
* Cache_instr=1
real    3m45.067s 3m42.628s 3m38.371s 3m36.301s 3m39.929s
user    3m10.454s 3m8.785s 3m7.516s 3m8.204s 3m8.822s
sys     0m19.085s 0m16.135s 0m13.696s 0m13.016s 0m16.105s

* Cache_instr=0
real    3m50.550s 3m41.517s 3m34.783s
user    3m5.350s 3m7.571s 3m1.415s
sys     0m25.268s 0m19.200s 0m16.200s

Each configuration was measured several times. As you can see, the
results are not perfectly stable, but they fall in the same range. To
reduce the run-to-run variation, I repeated the tests with 1 vCPU (to
eliminate the context switches):

With 1vCPU:

* Cache_instr=1
real    2m58.968s 2m57.009s 3m0.451s 2m55.902s 2m56.422s
user    2m46.909s 2m45.241s 2m45.670s 2m45.788s 2m45.503s
sys     0m4.890s 0m4.134s 0m3.942s 0m3.764s 0m3.984s

* Cache_instr=0
real    2m56.845s 2m57.051s 3m1.794s 2m57.340s
user    2m45.232s 2m44.873s 2m45.482s 2m46.538s
sys     0m4.644s 0m4.141s 0m3.906s 0m3.875s

As you can see, the variation between runs is now much smaller, and
the results with and without caching are almost the same.
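As a quick sanity check, averaging the 1-vCPU "real" times above
(converted by hand from XmY.Zs to seconds) shows how close the two
configurations are; the avg helper is my own sketch, not part of the
original measurement setup:

```shell
# Average a whitespace-separated list of seconds read from stdin.
avg() { awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.1f\n", s / NF }'; }

# "real" times from the 1-vCPU tables above, in seconds.
echo "178.968 177.009 180.451 175.902 176.422" | avg   # Cache_instr=1
echo "176.845 177.051 181.794 177.340" | avg           # Cache_instr=0
```

These print 177.8 and 178.3 seconds, a gap well inside the
run-to-run spread.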

2) For a make buildworld -j2 with 1 vCPU:
Cache_instr=1
    13900.60 real     12051.54 user      1800.42 sys
Cache_instr=0
    13938.07 real     12122.14 user      1743.61 sys

As you can see, the difference between them is not significant; the
times are about the same.
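The gap works out to a fraction of a percent; a one-liner to check,
using the "real" values from the measurements above:

```shell
# Relative difference in "real" time for buildworld, cache off vs. on.
awk 'BEGIN { d = 13938.07 - 13900.60; printf "%.2f s (%.2f%%)\n", d, 100 * d / 13938.07 }'
```

This prints 37.47 s (0.27%), well within measurement noise for a
nearly four-hour build.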

Unfortunately, for these two different kinds of workloads there is no
speed-up.

I've tried other, more specific workloads:
a) dd if=/dev/zero of=/dev/zero bs=256 count=10000K (from memory to
memory, so as not to be influenced by the storage system)
b) A simple program that executes the getuid syscall in a loop:

#include <stdlib.h>     /* atoi */
#include <unistd.h>     /* getuid */

int main(int argc, char *argv[])
{
        int i;

        /* Iteration count from the first argument; default to 100. */
        if (argc == 2) {
                i = atoi(argv[1]);
        } else {
                i = 100;
        }
        /* Each getuid() call enters the kernel, exercising the syscall path. */
        while (i > 0) {
                getuid();
                i--;
        }
        return 0;
}

But the results were the same.

I spoke with Neel, and it seems we cannot get a real-world benefit
from this instruction caching.

Thanks,
Mihai


