Date: Mon, 17 Dec 2007 17:56:54 -0500
From: Martin Cracauer <cracauer@cons.org>
To: Shantanu Ghosh <shantanu_ghosh@yahoo.com>
Cc: freebsd-performance@freebsd.org
Subject: Re: freebsd vs linux: performance problem
Message-ID: <20071217225654.GA97600@cons.org>
In-Reply-To: <963530.77689.qm@web54505.mail.re2.yahoo.com>
References: <963530.77689.qm@web54505.mail.re2.yahoo.com>

Shantanu Ghosh wrote on Thu, Dec 13, 2007 at 04:07:50AM -0800:
> Hi,
>
> I am running FreeBSD 7.0 Beta1 and Linux FC6 on two identical pieces of
> hardware - Dell poweredge with intel core2 duo. Each system has 4 CPUs.

I assume that means 2 CPUs with two cores each, aka socket 771
Woodcrests? Please be more specific, e.g. by posting /proc/cpuinfo
from the Linux machine.

> Now, in simple memory access operations, I see the freebsd system being
> noticeably slower than the linux system. A simple C program that copies
> from one memory buffer to another, when executed in a loop executes
> between 10-30% slower on freebsd, as compared to linux. The assembly
> code of the program used for testing is identical in both the cases.

Please provide that simple C program. Below I assume that your
assembly doesn't ever call memcpy() or similar.

Please let us know which Linux kernel version you are running. I gave
up on FC and don't know the FC<x> to kernel<y> map.

Anyway...

This is most likely something I experienced myself: sometime between
Linux 2.6.17 and 2.6.20 they were teaching the kernel about Core2 and
about the shared cache in particular.

Memory task performance, such as piping around gzip output, used to be
horrible on Core2 systems where some cores share an L2 cache and
others don't, such as a dual Woodcrest system, which has 4 cores total
that share L2 cache in pairs. A socket 775 system with just a
Core2Duo (which means all cores in the system share the single L2
cache) used to be much better than the dual Woodcrest in 2.6.17, but in
2.6.20 it was fixed.

I assume this was simply a scheduler change: the scheduler now knows
which cores share L2 cache and sets affinity appropriately.

On a loaded system with mixed random stuff going on this is likely not
a factor anymore (because the scheduler has too many other constraints
to babysit one process), but benchmarking and single-tasking can
expose it.

> One observation is that freebsd system performance decreases as the
> size of the buffer increases. If the buffer is under 1k, both the
> systems give the same performance. freebsd performance is about 10%
> slower if the buffer size is around 4k, and about 30% slower if the
> buffer is around 1Mb. A benchmark like sysbench memory read operation
> performs miserably on the freebsd system, compared to linux.

"buffer" here means you first read <buffersize> bytes, then write
<buffersize> bytes elsewhere?

How do you allocate the buffer to hold this data? Alignment plays a
big role here. If you can, please give us the C program, otherwise
I'd like you to print the address of the buffer in both cases.

> As far as I can see, the BIOS settings are identical on both the
> machines. Any idea what could be going on?

Make double sure that the hardware readahead that some of the socket
771 chipsets have is set identically on both machines.

Also, the snoop filter in the 5000x chipset suc^Hffers from
underengineering and should be turned off for most applications.

Also, please run the stream.c benchmark on both, including the Linux
binary on FreeBSD using the Linuxulator as a third run. I put a copy
on http://www.cons.org/stream.c

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/
FreeBSD - where you want to go, today.      http://www.freebsd.org/