From owner-freebsd-threads@FreeBSD.ORG Sun May 23 22:08:36 2004
Date: Mon, 24 May 2004 01:07:48 -0400 (EDT)
From: Robert Watson <robert@fledge.watson.org>
To: Petri Helenius
cc: Julian Elischer
cc: freebsd-threads@freebsd.org
Subject: Re: Why is MySQL nearly twice as fast on Linux?
List-Id: Threading on FreeBSD

On Sun, 23 May 2004, Petri Helenius wrote:

> > There is obviously a bottleneck, but it's very hard to tell what it
> > is. My guess is that the scheduler(s) are not doing a very good job,
> > and the fact that GIANT is not removed from the kernel yet says that
> > generally syscalls will be a bottleneck.
>
> While watching the top output, I saw a "logjam" appear from time to
> time where all processes/threads were waiting for Giant.
> However, I don't feel that causes the large impact; it might
> contribute 10-20%, but it does not feel frequent enough to cause a 50%
> difference.

top is a little misleading because it has to acquire Giant in order to
check the status of the other processes, which increases the chance of
Giant contention.

There are at least a few things going on here. Among various results, I
saw that switching to a UP kernel improved performance, but not nearly
enough, which suggests that lock contention alone does not explain the
gap. If you want to investigate lock contention, there are a couple of
things you might try:

(1) Compile the kernel with MUTEX_PROFILING -- it has two contention
    measurement fields that can help track contention. Note that
    running with mutex profiling will dramatically hurt performance,
    but it might still be quite informative.

(2) It might be interesting to run with the netperf patches, as they
    should greatly reduce contention for local UNIX domain socket I/O.
    I haven't tried any benchmarking with MySQL, but it might be worth
    a try. You can find information on the ongoing work at:

	http://www.watson.org/~robert/freebsd/netperf/

    The work is moving fairly fast, as I'm working on tracking down
    additional socket nits, but it could help.

> > ULE should be able to do a better job at scheduling with multiple
> > CPUs, but it is a work in progress. If threads all hit a
> > Giant-based logjam, there is not a lot the scheduler can do about
> > it.
>
> I find it hard to believe that the threading stuff would be seriously
> broken, since we do large processing with libkse and don't have issues
> with the performance. However, I'm observing about 50000 context
> switches but only 5000 syscalls a second. (I know it's a different
> application, but even for 1500 queries a second, 70000 syscalls sounds
> excessive.)
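For what it's worth, a sketch of how the MUTEX_PROFILING data might be
gathered. The sysctl node names below are from memory for 5.x-era
kernels and may differ on your version -- treat them as assumptions:

```
# Kernel configuration fragment -- add to your config, rebuild, reboot:
options         MUTEX_PROFILING

# At runtime: enable profiling, run the workload, then dump the
# per-acquisition-point statistics (node names assumed):
sysctl debug.mutex.prof.enable=1
    <run the MySQL benchmark here>
sysctl debug.mutex.prof.stats
```

Sorting that output by the contention columns should point at the
hottest locks.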
ULE has some sort of known load balancing problem between multiple CPUs
-- I observed it in some local benchmarking with ubench, at least a
month or so ago. It seemed to prevent highly busy processes derived
from the same process tree from migrating properly. SCHED_4BSD did not
have this problem. But since we've seen results suggesting that
changing to SCHED_4BSD didn't help all that much either, it's still
likely not to be the cause.

A few months ago I did some work to optimize system call cost a bit --
we had some extra mutex operations. It might be interesting to use
ktrace or truss to generate a profile of the system call mix in use;
perhaps that would give some informative results about things to look
at.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research
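To make the ktrace/truss idea above concrete, here is a rough sketch of
tallying the system call mix from kdump output. The sample trace is
fabricated and the kdump field layout is assumed from memory; on a real
system you would run something like `ktrace -p <pid>` followed by
`kdump`, instead of the here-document:

```shell
# Fabricated sample of kdump-style output (pid, command, record type,
# syscall with arguments); real kdump output will differ in detail.
cat <<'EOF' > /tmp/kdump.sample
 12345 mysqld   CALL  read(0x5,0x809000,0x1000)
 12345 mysqld   RET   read 128/0x80
 12345 mysqld   CALL  write(0x6,0x809000,0x80)
 12345 mysqld   CALL  read(0x5,0x809000,0x1000)
EOF

# Count only CALL records, strip the argument list from the syscall
# name, and print the mix sorted by frequency.
awk '$3 == "CALL" { sub(/\(.*/, "", $4); n[$4]++ }
     END { for (s in n) print n[s], s }' /tmp/kdump.sample | sort -rn
```

On the sample above this prints `2 read` and `1 write`; run against a
real trace, it shows which syscalls dominate the 70000/second figure.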