From: Peter Edwards
Date: Sat, 12 Feb 2005 02:00:02 +0000
To: Maxim Sobolev, Peter Edwards, current@freebsd.org
Subject: Re: Pthreads performance
In-Reply-To: <20050211224126.GA43252@VARK.MIT.EDU>

On Fri, 11 Feb 2005 17:41:26 -0500, David Schultz wrote:
> On Fri, Feb 11, 2005, Maxim Sobolev wrote:
> > Thank you for the analysis! Looks like you have at least some valid
> > points. I've modified the code to count how many times producer calls
> > malloc() to allocate a new slot, and got the following numbers:
> >
> > -bash-2.05b$ ./aqueue_linuxthreads -n 10000000
> > pusher started
> > poper started
> > total 237482 slots used
> > -bash-2.05b$ ./aqueue_kse -n 10000000
> > pusher started
> > poper started
> > total 403966 slots used
> > -bash-2.05b$ ./aqueue_thr -n 10000000
> > pusher started
> > poper started
> > total 223634 slots used
> > -bash-2.05b$ ./aqueue_c_r -n 10000000
> > pusher started
> > poper started
> > total 55589 slots used
> >
> > This suggests that indeed, it is unfair to compare KSE times to LT
> > times, since KSE have done almost 2x more malloc()s than LT. However,
> > as you can see, libthr have done comparable number of allocations,
> > while c_r about 4 times less, so that only malloc() cost can't fully
> > explain the difference in results.
>
> The difference in the number of mallocs may be related to the way
> mutex unlocks work. Some systems do direct handoff to the next
> waiting thread. Suppose one thread does:
>
>         pthread_mutex_lock()
>         pthread_mutex_unlock()
>         pthread_mutex_lock()
>
> With direct handoff, the second lock operation would automatically
> cause an immediate context switch, since ownership of the mutex
> has already been transferred to the other thread.
> Without direct handoff, the thread may be able to get the lock back
> immediately; in fact, this is almost certainly what will happen on a
> uniprocessor. Since the example code has no mechanism to ensure
> fairness, without direct handoff, one of the threads could perform
> thousands of iterations before the other one wakes up, and this could
> explain all the calls to malloc().
>
> The part of this picture that doesn't fit is that I was under the
> impression that KSE uses direct handoff...

Direct handoff is probably fine for a directly contended mutex, but for
condition variables, IMHO, it makes more sense _not_ to do direct
handoff. In a standard producer/consumer model, it seems better to let
the producer work until it gets flow-controlled, and then let the
consumer start processing the available data: i.e., rather than paying
for 100 context switches with (produce->consume)x50, it's likely that
(produce)x50->(consume)x50 will reduce context switching and improve
caching behaviour. In other words, I'd rather not lose my quantum just
because I created some productive work for a consumer to process: that
loses many of the locality-of-reference benefits. I think that's a much
more realistic scenario for the use of condition variables than the
sample under discussion.

Disclaimer: this is based on instinct and limited experience rather
than rigorous research. :-)
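
To make that concrete, here's a rough sketch (not the aqueue test
itself; the queue depth and names are invented for illustration) of the
producer/consumer shape I have in mind. With a non-handoff mutex and
condvar, the producer tends to keep its quantum and push items until it
hits the flow-control limit, and the consumer then drains a batch in
one go:

#include <pthread.h>

#define QDEPTH 50               /* flow-control limit */
#define TOTAL  1000000

static int count;               /* items produced but not yet consumed */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t notfull = PTHREAD_COND_INITIALIZER;

static void *
producer(void *arg)
{
        for (int i = 0; i < TOTAL; i++) {
                pthread_mutex_lock(&lock);
                while (count == QDEPTH)         /* flow controlled */
                        pthread_cond_wait(&notfull, &lock);
                count++;
                pthread_cond_signal(&notempty); /* wake the consumer... */
                pthread_mutex_unlock(&lock);    /* ...but keep running */
        }
        return (NULL);
}

static void *
consumer(void *arg)
{
        for (int i = 0; i < TOTAL; i++) {
                pthread_mutex_lock(&lock);
                while (count == 0)
                        pthread_cond_wait(&notempty, &lock);
                count--;
                pthread_cond_signal(&notfull);
                pthread_mutex_unlock(&lock);
        }
        return (NULL);
}

int
main(void)
{
        pthread_t p, c;

        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return (0);
}

If the condvar/mutex did direct handoff, each pthread_cond_signal()
would tend to bounce control straight to the consumer, giving the
(produce->consume)x50 pattern; without it, the producer typically gets
to run up to QDEPTH items per consumer wakeup.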