Date:      Mon, 17 Dec 2007 17:56:54 -0500
From:      Martin Cracauer <cracauer@cons.org>
To:        Shantanu Ghosh <shantanu_ghosh@yahoo.com>
Cc:        freebsd-performance@freebsd.org
Subject:   Re: freebsd vs linux: performance problem
Message-ID:  <20071217225654.GA97600@cons.org>
In-Reply-To: <963530.77689.qm@web54505.mail.re2.yahoo.com>
References:  <963530.77689.qm@web54505.mail.re2.yahoo.com>

Shantanu Ghosh wrote on Thu, Dec 13, 2007 at 04:07:50AM -0800: 
> Hi,
> 
> I am running FreeBSD 7.0 Beta1 and Linux FC6 on two identical pieces of
> hardware - Dell poweredge with intel core2 duo. Each system has 4 CPUs.

I assume that means two CPUs with two cores each, i.e. socket 771
Woodcrests? Please be more specific; the output of /proc/cpuinfo
would help.

> Now, in simple memory access operations, I see the freebsd system being
> noticeably slower than the linux system. A simple C program that copies
> from one memory buffer to another, when executed in a loop executes
> between 10-30% slower on freebsd, as compared to linux. The assembly
> code of the program used for testing is identical in both the cases.

Please provide that simple C program.  In what follows I assume that
your assembly never calls memcpy() or similar.

Please let us know which Linux kernel version you are running; I gave
up on FC and don't know which FC<x> maps to which kernel<y>.

Anyway...

This is most likely something I ran into myself: somewhere between
Linux 2.6.17 and 2.6.20 the kernel was taught about Core2, and about
its shared L2 cache in particular.  Memory-heavy workloads such as
piping gzip output around used to perform horribly on Core2 systems in
which some cores share an L2 cache and others don't, such as a
dual-Woodcrest system, which has 4 cores in total, with each pair of
cores sharing one L2 cache.  Under 2.6.17, a socket 775 system with a
single Core2 Duo (where all cores in the system share the one L2 cache)
was much faster than the dual Woodcrest, but in 2.6.20 this was fixed.
I assume this is simply a scheduler change: the scheduler now knows
which cores share an L2 cache and sets affinity accordingly.

On a loaded system with a random mix of other work going on, this is
likely no longer a factor (the scheduler has too many other
constraints to babysit one process), but benchmarking and
single-tasking can expose it.

> One observation is that freebsd system performance decreases as the
> size of the buffer increases. If the buffer is under 1k, both the
> systems give the same performance. freebsd performance is about 10%
> slower if the buffer size is around 4k, and about 30% slower if the
> buffer is around 1MB. A benchmark like sysbench memory read operation
> performs miserably on the freebsd system, compared to linux.

"buffer" here means you first read <buffersize> bytes, then write
<buffersize> bytes elsewhere?

How do you allocate the buffer to hold this data? Alignment plays a
big role here.

If you can, please give us the C program; otherwise, please print the
address of the buffer in both cases.

> As far as I can see, the BIOS settings are identical on both the
> machines. Any idea what could be going on?

Make double sure that the hardware readahead that some of the socket
771 chipsets have is set identically on both machines.  Also, the
snoop filter in the 5000X chipset suc^Hffers from underengineering and
should be turned off for most applications.

Also, please run the stream.c benchmark on both machines, plus a third
run of the Linux binary on FreeBSD under the Linuxulator.  I put a
copy at http://www.cons.org/stream.c

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/
FreeBSD - where you want to go, today.      http://www.freebsd.org/
