Date: Fri, 17 Feb 2006 21:20:30 +0530
From: Joseph Koshy <joseph.koshy@gmail.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: freebsd-amd64@freebsd.org
Subject: Re: non-temporal copyin/copyout?
Message-ID: <84dead720602170750j119080c9g32ec9f1ac0e3944d@mail.gmail.com>
In-Reply-To: <17397.58669.457047.277510@grasshopper.cs.duke.edu>
References: <17397.58669.457047.277510@grasshopper.cs.duke.edu>
> I'm bringing this up because I've noticed that FreeBSD 10GbE
> performance is far below Solaris/amd64 and linux/x86_64 when
> using the PCI-e 10GbE adaptor that I'm doing drivers for.
> For example, Solaris can receive a netperf TCP stream at

There was a bug in my port of netperf; I had left the `HISTOGRAM'
option turned on, which causes it to slow down significantly.
v2.3.1,1 is the latest, bug-fixed version of the port.

> 9.75Gb/sec while using only 47% CPU as measured by vmstat
> (e.g., it is using a little less than a single core).  In
> contrast, FreeBSD is limited to 7.7Gb/sec and uses nearly
> 90% CPU.  When profiling with hwpmc, I see a profile which
> shows up to 70% of the time is spent in copyout.

You could use the following events to probe the system:

  "k8-dc-miss"
      data cache misses
  "k8-bu-fill-request-l2-miss,mask=dc-fill"
      L2 fills for the data cache
  "k8-dc-misaligned-data-reference"
      misaligned data references, in case there are any
  "k8-fr-interrupts-masked-while-pending-cycles"
      for finding spots in the code where spin locks are held
      for a long time

You may need to tweak the sample rate (the -n option to pmcstat);
the default of 65536 events per sample may be too high or too low
for some of these.  Running pmcstat -p EVENT (counting mode) will
give a feel for a good sample rate to choose for EVENT.

-- 
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy