From owner-freebsd-current@FreeBSD.ORG Fri Feb 11 22:41:46 2005
Date: Fri, 11 Feb 2005 17:41:26 -0500
From: David Schultz <das@FreeBSD.ORG>
To: Maxim Sobolev
Cc: current@FreeBSD.ORG, Peter Edwards
Subject: Re: Pthreads performance
Message-ID: <20050211224126.GA43252@VARK.MIT.EDU>
In-Reply-To: <420CEC42.2070504@portaone.com>
References: <420CC9F7.40802@portaone.com>
	<34cb7c840502110903356a5813@mail.gmail.com>
	<420CEC42.2070504@portaone.com>

On Fri, Feb 11, 2005, Maxim Sobolev wrote:
> Thank you for the analysis! Looks like you have at least some valid
> points. I've modified the code to count how many times the producer
> calls malloc() to allocate a new slot, and got the following numbers:
>
> -bash-2.05b$ ./aqueue_linuxthreads -n 10000000
> pusher started
> poper started
> total 237482 slots used
> -bash-2.05b$ ./aqueue_kse -n 10000000
> pusher started
> poper started
> total 403966 slots used
> -bash-2.05b$ ./aqueue_thr -n 10000000
> pusher started
> poper started
> total 223634 slots used
> -bash-2.05b$ ./aqueue_c_r -n 10000000
> pusher started
> poper started
> total 55589 slots used
>
> This suggests that it is indeed unfair to compare KSE times to LT
> times, since KSE has done almost 2x more malloc()s than LT. However,
> as you can see, libthr has done a comparable number of allocations,
> while libc_r did about 4 times fewer, so malloc() cost alone can't
> fully explain the difference in results.

The difference in the number of mallocs may be related to the way
mutex unlocks work.  Some systems do direct handoff to the next
waiting thread.  Suppose one thread does:

	pthread_mutex_lock()
	pthread_mutex_unlock()
	pthread_mutex_lock()

With direct handoff, the second lock operation would cause an
immediate context switch, since ownership of the mutex has already
been transferred to the other waiting thread.  Without direct
handoff, the thread may be able to get the lock back immediately; in
fact, this is almost certainly what will happen on a uniprocessor.
Since the example code has no mechanism to ensure fairness, without
direct handoff one of the threads could perform thousands of
iterations before the other one wakes up, and this could explain all
the calls to malloc().  (A small test program illustrating this
effect is sketched at the end of this message.)

The part of this picture that doesn't fit is that I was under the
impression that KSE uses direct handoff...

FWIW, there's a separate threads@ list for this sort of thing.
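
A minimal sketch of the handoff effect described above (hypothetical
test code, not the original aqueue benchmark; the names worker, runs,
and ITERS are invented for illustration): two threads contend for one
mutex in the lock/unlock/lock pattern, and each records the longest
run of consecutive acquisitions it managed before the other thread
got the lock.  Under a non-handoff implementation on a uniprocessor,
expect one thread to rack up runs in the thousands; under direct
handoff, the runs should stay near 1.

	#include <pthread.h>
	#include <stdio.h>

	#define ITERS	1000000

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static int last = -1;	/* id of the thread that last held the lock */
	static long cur;	/* length of current run (protected by lock) */
	static long runs[2];	/* longest run observed by each thread */

	static void *
	worker(void *arg)
	{
		int id = (int)(long)arg;
		long i;

		for (i = 0; i < ITERS; i++) {
			pthread_mutex_lock(&lock);
			if (last == id)
				cur++;	/* got the lock back, no switch */
			else
				cur = 1;
			if (cur > runs[id])
				runs[id] = cur;
			last = id;
			pthread_mutex_unlock(&lock);
			/* immediately contend again, as in the pattern */
		}
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t t[2];
		int i;

		for (i = 0; i < 2; i++)
			pthread_create(&t[i], NULL, worker,
			    (void *)(long)i);
		for (i = 0; i < 2; i++)
			pthread_join(t[i], NULL);
		printf("longest run: thread 0 = %ld, thread 1 = %ld\n",
		    runs[0], runs[1]);
		return (0);
	}

Build with cc -pthread.  In a producer/consumer queue like the one
being benchmarked, a long run by the producer is exactly the case
where it keeps finding the queue without a free slot and has to
malloc() a new one each iteration, which would account for the
differing slot counts across the thread libraries.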