Date: Thu, 15 Apr 2010 10:20:00 +0900 (JST)
From: Maho NAKATA <chat95@mac.com>
To: amvandemore@gmail.com
Cc: alc@freebsd.org, alan.l.cox@gmail.com, freebsd-stable@freebsd.org, als@modulus.org, avg@freebsd.org
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Message-ID: <20100415.102000.645538350615365151.chat95@mac.com>
In-Reply-To: <n2o6201873e1004141047t97d89cb0o2688fae1875eae08@mail.gmail.com>
References: <m2y6201873e1004140945n855c8800we9baced2e293f270@mail.gmail.com> <4BC5F289.7020408@freebsd.org> <n2o6201873e1004141047t97d89cb0o2688fae1875eae08@mail.gmail.com>
Hi Andriy and Adam,

I also did the same thing as suggested. My conclusion: on a Core i7 920
(2.66GHz, TurboBoost on, HyperThreading off), my GotoBLAS dgemm results
were as follows.

* Summary of results
  36-39GFlops (81-87% of peak performance) without pinning
  35-40GFlops (78-89% of peak performance) with pinning

My observations:
* Performance is somewhat unstable: one calculation gives 35GFlops, the
  next 40GFlops, and so on. The result jitters.
* Pinning makes performance somewhat more stable, but we gain little or
  no extra performance.

Details.

First I ran:

% ./dgemm
n: 3500
time : 84.431008 or 22.428125
Mflops : 38244.168629
n: 3600
time : 90.162220 or 23.440381
Mflops : 39819.284422
n: 3700
time : 101.427504 or 27.404345
Mflops : 36977.121646

Note: 36-39GFlops, 81-87% of peak performance.

Then I pinned the threads to the cores, two threads per core, as follows:

% procstat -t 1408
  PID    TID COMM     TDNAME          CPU  PRI STATE   WCHAN
 1408 100160 dgemm    -                 3  190 run     -
 1408 100161 dgemm    -                 2  190 run     -
 1408 100162 dgemm    -                 2  190 run     -
 1408 100163 dgemm    -                 1  189 run     -
 1408 100164 dgemm    -                 0  190 run     -
 1408 100165 dgemm    -                 3  189 run     -
 1408 100166 dgemm    -                 1  190 run     -
 1408 100167 dgemm    initial thread    0  190 run     -
% cpuset -t 100160 -l 0
% cpuset -t 100161 -l 0
% cpuset -t 100162 -l 1
% cpuset -t 100163 -l 1
% cpuset -t 100164 -l 2
% cpuset -t 100165 -l 2
% cpuset -t 100166 -l 3
% cpuset -t 100167 -l 3

Then:

% procstat -t 1408
  PID    TID COMM     TDNAME          CPU  PRI STATE   WCHAN
 1408 100160 dgemm    -                 0  191 run     -
 1408 100161 dgemm    -                 0  191 run     -
 1408 100162 dgemm    -                 1  190 run     -
 1408 100163 dgemm    -                 1  190 run     -
 1408 100164 dgemm    -                 2  190 run     -
 1408 100165 dgemm    -                 2  190 run     -
 1408 100166 dgemm    -                 3  190 run     -
 1408 100167 dgemm    initial thread    3  190 run     -

Results with pinning:

n: 4000
time : 121.907696 or 31.475052
Mflops : 40677.295630
n: 4100
time : 139.842701 or 38.702532
Mflops : 35624.444587
n: 4200
time : 143.622179 or 36.725949
Mflops : 40356.011158
n: 4300
time : 153.742976 or 39.465752
Mflops : 40301.013511
n: 4400
time : 164.919566 or 42.380653
Mflops : 40208.611317
n: 4500
time : 175.930335 or 45.422572
Mflops : 40132.139469

Thanks

From: Adam Vande More <amvandemore@gmail.com>
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 12:47:31 -0500

> On Wed, Apr 14, 2010 at 11:51 AM, Andriy Gapon <avg@freebsd.org> wrote:
>
>> on 14/04/2010 19:45 Adam Vande More said the following:
>> >
>> > also if I run cpuset on the dgemm then the utilization is basically at
>> > the theoretical max for one core so at least that part is working.
>>
>> You can also try procstat -t <pid> to find out thread IDs and cpuset -t to
>> pin the threads to the cores.
>>
>
> it gets to around 90% doing that.
>
> time : 103.617271 or 27.140992
> Mflops : 47172.925449
> n: 4100
> time : 113.910669 or 30.520677
> Mflops : 45174.496186
> n: 4200
> time : 121.880695 or 32.068070
> Mflops : 46217.711013
> n: 4300
>
> tried a couple of different thread orders but didn't seem to make a
> difference.
>
> galacticdominator% procstat -t 1922
>   PID    TID COMM     TDNAME          CPU  PRI STATE   WCHAN
>  1922 100092 dgemm    initial thread    0  190 run     -
>  1922 100268 dgemm    -                 1  190 run     -
>  1922 100270 dgemm    -                 1  191 run     -
>  1922 100272 dgemm    -                 3  190 run     -
>  1922 100273 dgemm    -                 2  191 run     -
>  1922 100274 dgemm    -                 2  191 run     -
>  1922 100282 dgemm    -                 0  190 run     -
>  1922 100283 dgemm    -                 3  190 run     -
>
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 1
> galacticdominator% cpuset -t 100270 -l 2
> galacticdominator% cpuset -t 100272 -l 3
> galacticdominator% cpuset -t 100273 -l 0
> galacticdominator% cpuset -t 100274 -l 1
> galacticdominator% cpuset -t 100282 -l 2
> galacticdominator% cpuset -t 100283 -l 3
>
>
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 0
> galacticdominator% cpuset -t 100270 -l 1
> galacticdominator% cpuset -t 100272 -l 1
> galacticdominator% cpuset -t 100273 -l 2
> galacticdominator% cpuset -t 100274 -l 2
> galacticdominator% cpuset -t 100282 -l 3
> galacticdominator% cpuset -t 100283 -l 3
>
>
> This is from the second set:
>
> time : 150.348850 or 40.488350
> Mflops : 45022.951141
> n: 4600
> time : 161.968982 or 43.589618
> Mflops : 44669.884500
> n: 4700
>
> Since this is a full fledged desktop environment, 90% utilization seems
> pretty good. I'm no expert Andriy, but it seems like if gotoblas
> implemented some of the FreeBSD optimizations then we'd be in the same
> ballpark.
>
>
> --
> Adam Vande More
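The "% of peak" figures in the summary above can be checked with a short calculation. The following is a minimal Python sketch under stated assumptions and is not part of the original benchmark. It assumes a theoretical peak of 4 cores x 2.8GHz x 4 double-precision flops per cycle = 44.8GFlops. The 2.8GHz figure is assumed to be the Core i7 920's all-core TurboBoost clock, one bin above the 2.66GHz base. This peak is consistent with the 78-89% range quoted for 35-40GFlops. The sketch also assumes, inferring this only from the printed numbers, that the dgemm program times 10 repetitions, counts 2n^3 + 2n^2 flops per repetition, and divides by the second ("or") time value. Under that assumption the printed Mflops values are recovered to within rounding. The function names are hypothetical.

```python
# Minimal sketch of the arithmetic behind the figures in this thread.
# ASSUMPTIONS (not stated in the original mails):
#   * 4 cores at an all-core TurboBoost clock of 2.8 GHz, 4 DP flops/cycle
#   * the dgemm program times 10 repetitions of an n x n multiply and counts
#     2n^3 + 2n^2 flops per repetition (inferred from the printed numbers)

CORES = 4
CLOCK_GHZ = 2.8        # assumed all-core turbo clock of the Core i7 920
FLOPS_PER_CYCLE = 4    # assumed: one 2-wide SSE2 add + one 2-wide SSE2 mul
REPEATS = 10           # assumed number of timed dgemm calls per line


def peak_gflops(cores=CORES, clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical double-precision peak in GFlops."""
    return cores * clock_ghz * flops_per_cycle


def percent_of_peak(gflops):
    """Achieved GFlops as a percentage of the assumed peak."""
    return 100.0 * gflops / peak_gflops()


def mflops(n, wall_seconds, repeats=REPEATS):
    """Mflops for `repeats` n x n DGEMMs that together took `wall_seconds`."""
    flops = repeats * (2.0 * n**3 + 2.0 * n**2)
    return flops / wall_seconds / 1e6


if __name__ == "__main__":
    print(f"assumed peak: {peak_gflops():.1f} GFlops")
    for g in (35, 36, 39, 40):
        print(f"{g} GFlops -> {percent_of_peak(g):.1f}% of peak")
    # n = 3500 from the unpinned run; second time value 22.428125 s
    print(f"n=3500: {mflops(3500, 22.428125):.1f} Mflops")
```

With these assumptions, `mflops(3500, 22.428125)` comes out at about 38244 Mflops, matching the first unpinned line. The same formula also matches the quoted n=4500 line from Adam's run (40.488350 s, about 45023 Mflops). Adam's CPU and clock are not stated, so his 90% figure cannot be checked against this peak.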
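The pinning procedure discussed above has two steps: list the thread IDs with `procstat -t <pid>`, then pin each thread with `cpuset -t <tid> -l <cpu>`. Doing this by hand for eight threads is tedious. Below is a hypothetical Python helper, not part of FreeBSD or GotoBLAS, that only builds the `cpuset` command lines and does not run anything. It assumes the TID is the second whitespace-separated column of the `procstat -t` output, as in the listings above. The function names and the `block`/`round_robin` strategy names are made up for illustration.

```python
# Hypothetical helper: turn `procstat -t <pid>` output into the
# `cpuset -t <tid> -l <cpu>` commands used in this thread.
# It only builds command strings; it does not run procstat or cpuset.
# Assumes the TID is the second whitespace-separated column.


def parse_tids(procstat_output):
    """Return thread IDs in listing order, skipping the header line."""
    tids = []
    for line in procstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            tids.append(int(fields[1]))
    return tids


def pin_plan(tids, ncpus=4, strategy="block"):
    """Assign each thread a CPU number.

    "block":       consecutive threads share a CPU, e.g. 8 threads on
                   4 CPUs -> 0,0,1,1,2,2,3,3 (Maho's run, Adam's second set)
    "round_robin": thread i goes to CPU i % ncpus -> 0,1,2,3,0,1,2,3
                   (Adam's first set)
    """
    per_cpu = max(1, -(-len(tids) // ncpus))  # ceiling division
    plan = []
    for i, tid in enumerate(tids):
        if strategy == "block":
            cpu = i // per_cpu
        elif strategy == "round_robin":
            cpu = i % ncpus
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        plan.append((tid, cpu))
    return plan


def cpuset_commands(plan):
    """Format one `cpuset -t TID -l CPU` command per thread."""
    return [f"cpuset -t {tid} -l {cpu}" for tid, cpu in plan]


if __name__ == "__main__":
    listing = """\
  PID    TID COMM     TDNAME          CPU  PRI STATE   WCHAN
 1408 100160 dgemm    -                 3  190 run     -
 1408 100161 dgemm    -                 2  190 run     -
 1408 100162 dgemm    -                 2  190 run     -
 1408 100163 dgemm    -                 1  189 run     -
 1408 100164 dgemm    -                 0  190 run     -
 1408 100165 dgemm    -                 3  189 run     -
 1408 100166 dgemm    -                 1  190 run     -
 1408 100167 dgemm    initial thread    0  190 run     -
"""
    for cmd in cpuset_commands(pin_plan(parse_tids(listing))):
        print(cmd)
```

On the eight-thread listing from the first message, the default `block` strategy prints the same eight `cpuset` commands Maho used. Passing `strategy="round_robin"` reproduces the order of Adam's first set. Adam reported that trying different thread orders did not seem to make a difference.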