Date: Fri, 17 Feb 2006 21:20:30 +0530
From: Joseph Koshy <joseph.koshy@gmail.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: freebsd-amd64@freebsd.org
Subject: Re: non-temporal copyin/copyout?
Message-ID: <84dead720602170750j119080c9g32ec9f1ac0e3944d@mail.gmail.com>
In-Reply-To: <17397.58669.457047.277510@grasshopper.cs.duke.edu>
References: <17397.58669.457047.277510@grasshopper.cs.duke.edu>
> I'm bringing this up because I've noticed that FreeBSD 10GbE
> performance is far below Solaris/amd64 and linux/x86_64 when
> using the PCI-e 10GbE adaptor that I'm doing drivers for.
> For example, Solaris can receive a netperf TCP stream at
There was a bug in my port of netperf: I had left the
`HISTOGRAM' option turned on, which slows netperf down
significantly.
Version 2.3.1,1 of the port is the latest and has this fixed.
> 9.75Gb/sec while using only 47% CPU as measured by vmstat.
> (eg, it is using a little less than a single core). In
> contrast, FreeBSD is limited to 7.7Gb/sec, and uses nearly
> 90% CPU. When profiling with hwpmc, I see a profile which
> shows up to 70% of the time is spent in copyout.
You could use the following events to probe the system:
"k8-dc-miss": data cache misses
"k8-bu-fill-request-l2-miss,mask=dc-fill": data cache fill
requests that miss in the L2 cache
"k8-dc-misaligned-data-reference": misaligned data references,
in case there are any
"k8-fr-interrupts-masked-while-pending-cycles": for finding
spots in the code where spin locks are held for a long time
You may need to tweak the sample rate (the -n option to
pmcstat); the default of 65536 events per sample may be too
high or too low for some of these events. Running
pmcstat -p EVENT first shows how often EVENT occurs, which
gives a feel for a good sample rate to choose for it.
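
As a rough illustration of the arithmetic involved in picking a
value for -n, here is a minimal sketch in Python. It assumes you
have already read a total count for EVENT off a counting-mode run
and know how long that run lasted. The function name, the target of
1000 samples per second, and the lower clamp of 1 are all
assumptions made for this example; they are not pmcstat defaults
or recommendations.

```python
def suggest_sample_rate(event_count, seconds, target_samples_per_sec=1000):
    """Return an events-per-sample value for pmcstat's -n option.

    event_count: total events seen during a counting run (hypothetical input)
    seconds: wall-clock duration of that run
    target_samples_per_sec: desired sampling frequency (an assumed target)
    """
    if seconds <= 0 or target_samples_per_sec <= 0:
        raise ValueError("seconds and target_samples_per_sec must be positive")
    events_per_sec = event_count / seconds
    # One sample is taken every N events, so N = events/sec / samples/sec.
    rate = round(events_per_sec / target_samples_per_sec)
    # Clamp to 1 so that rare events still yield a usable value (assumption).
    return max(1, rate)


if __name__ == "__main__":
    # Made-up count for illustration: 655,360,000 events over 10 seconds.
    print(suggest_sample_rate(655_360_000, 10))
```

A frequent event such as data cache misses will usually want a
larger value than a rare one such as misaligned references, which
is why a single default does not suit every event.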
--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
