From: Peter Edwards
Date: Sat, 12 Feb 2005 02:00:02 +0000
To: Maxim Sobolev, Peter Edwards, current@freebsd.org
Subject: Re: Pthreads performance
In-Reply-To: <20050211224126.GA43252@VARK.MIT.EDU>

On Fri, 11 Feb 2005 17:41:26 -0500, David Schultz wrote:
> On Fri, Feb 11, 2005, Maxim Sobolev wrote:
> > Thank you for the analysis! Looks like you have at least some valid
> > points. I've modified the code to count how many times producer calls
> > malloc() to allocate a new slot, and got the following numbers:
> >
> > -bash-2.05b$ ./aqueue_linuxthreads -n 10000000
> > pusher started
> > poper started
> > total 237482 slots used
> > -bash-2.05b$ ./aqueue_kse -n 10000000
> > pusher started
> > poper started
> > total 403966 slots used
> > -bash-2.05b$ ./aqueue_thr -n 10000000
> > pusher started
> > poper started
> > total 223634 slots used
> > -bash-2.05b$ ./aqueue_c_r -n 10000000
> > pusher started
> > poper started
> > total 55589 slots used
> >
> > This suggests that indeed, it is unfair to compare KSE times to LT
> > times, since KSE have done almost 2x more malloc()s than LT. However,
> > as you can see, libthr have done comparable number of allocations,
> > while c_r about 4 times less, so that only malloc() cost can't fully
> > explain the difference in results.
>
> The difference in the number of mallocs may be related to the way
> mutex unlocks work. Some systems do direct handoff to the next
> waiting thread. Suppose one thread does:
>
>         pthread_mutex_lock()
>         pthread_mutex_unlock()
>         pthread_mutex_lock()
>
> With direct handoff, the second lock operation would automatically
> cause an immediate context switch, since ownership of the mutex
> has already been transferred to the other thread.
> Without direct handoff, the thread may be able to get the lock back
> immediately; in fact, this is almost certainly what will happen on a
> uniprocessor. Since the example code has no mechanism to ensure
> fairness, without direct handoff, one of the threads could perform
> thousands of iterations before the other one wakes up, and this could
> explain all the calls to malloc().
>
> The part of this picture that doesn't fit is that I was under the
> impression that KSE uses direct handoff...

Direct handoff is probably fine for a directly contended mutex, but for
condition variables, IMHO, it makes more sense _not_ to do direct
handoff. In a standard producer/consumer model, it seems better to let
the producer work until it gets flow-controlled, and then let the
consumer start processing the available data: i.e., rather than paying
for 100 context switches with (produce->consume)x50, it's likely that
(produce)x50->(consume)x50 will reduce context switching and improve
caching behaviour. In other words, I'd rather not lose my quantum just
because I created some productive work for a consumer to process: that
loses many of the locality-of-reference benefits. I think that's a much
more realistic scenario for the use of condition variables than the
sample under discussion.

Disclaimer: this is based on instinct and limited experience rather
than rigorous research. :-)
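
To make that concrete, here's a rough sketch (not the aqueue test
itself; the queue depth and names are invented for illustration) of the
producer/consumer shape I have in mind. With a non-handoff mutex and
condvar, the producer tends to keep its quantum and push items until it
hits the flow-control limit, and the consumer then drains a batch in
one go:

#include <pthread.h>

#define QDEPTH 50               /* flow-control limit */
#define TOTAL  1000000

static int count;               /* items produced but not yet consumed */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t notfull = PTHREAD_COND_INITIALIZER;

static void *
producer(void *arg)
{
        for (int i = 0; i < TOTAL; i++) {
                pthread_mutex_lock(&lock);
                while (count == QDEPTH)         /* flow controlled */
                        pthread_cond_wait(&notfull, &lock);
                count++;
                pthread_cond_signal(&notempty); /* wake the consumer... */
                pthread_mutex_unlock(&lock);    /* ...but keep running */
        }
        return (NULL);
}

static void *
consumer(void *arg)
{
        for (int i = 0; i < TOTAL; i++) {
                pthread_mutex_lock(&lock);
                while (count == 0)
                        pthread_cond_wait(&notempty, &lock);
                count--;
                pthread_cond_signal(&notfull);
                pthread_mutex_unlock(&lock);
        }
        return (NULL);
}

int
main(void)
{
        pthread_t p, c;

        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return (0);
}

If the condvar/mutex did direct handoff, each pthread_cond_signal()
would tend to bounce control straight to the consumer, giving the
(produce->consume)x50 pattern; without it, the producer typically gets
to run up to QDEPTH items per consumer wakeup.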