Date: Sat, 12 Feb 2005 02:00:02 +0000
From: Peter Edwards <peadar.edwards@gmail.com>
To: Maxim Sobolev <sobomax@portaone.com>, Peter Edwards <peadar.edwards@gmail.com>, current@freebsd.org
Subject: Re: Pthreads performance
Message-ID: <34cb7c84050211180038575a59@mail.gmail.com>
In-Reply-To: <20050211224126.GA43252@VARK.MIT.EDU>
References: <420CC9F7.40802@portaone.com> <34cb7c840502110903356a5813@mail.gmail.com> <420CEC42.2070504@portaone.com> <20050211224126.GA43252@VARK.MIT.EDU>
On Fri, 11 Feb 2005 17:41:26 -0500, David Schultz <das@freebsd.org> wrote:
> On Fri, Feb 11, 2005, Maxim Sobolev wrote:
> > Thank you for the analysis! Looks like you have at least some valid
> > points. I've modified the code to count how many times the producer
> > calls malloc() to allocate a new slot, and got the following numbers:
> >
> > -bash-2.05b$ ./aqueue_linuxthreads -n 10000000
> > pusher started
> > poper started
> > total 237482 slots used
> > -bash-2.05b$ ./aqueue_kse -n 10000000
> > pusher started
> > poper started
> > total 403966 slots used
> > -bash-2.05b$ ./aqueue_thr -n 10000000
> > pusher started
> > poper started
> > total 223634 slots used
> > -bash-2.05b$ ./aqueue_c_r -n 10000000
> > pusher started
> > poper started
> > total 55589 slots used
> >
> > This suggests that indeed, it is unfair to compare KSE times to LT
> > times, since KSE has done almost 2x more malloc()s than LT. However,
> > as you can see, libthr has done a comparable number of allocations,
> > while c_r did about 4 times fewer, so malloc() cost alone can't fully
> > explain the difference in results.
>
> The difference in the number of mallocs may be related to the way
> mutex unlocks work. Some systems do direct handoff to the next
> waiting thread. Suppose one thread does:
>
>         pthread_mutex_lock()
>         pthread_mutex_unlock()
>         pthread_mutex_lock()
>
> With direct handoff, the second lock operation would automatically
> cause an immediate context switch, since ownership of the mutex
> has already been transferred to the other thread. Without direct
> handoff, the thread may be able to get the lock back immediately;
> in fact, this is almost certainly what will happen on a uniprocessor.
> Since the example code has no mechanism to ensure fairness, without
> direct handoff, one of the threads could perform thousands of
> iterations before the other one wakes up, and this could explain
> all the calls to malloc().
>
> The part of this picture that doesn't fit is that I was under the
> impression that KSE uses direct handoff...

Direct handoff is probably fine for a directly contended mutex, but for
condition variables, IMHO, it makes more sense _not_ to do direct
handoff. In a standard producer/consumer model, it seems better to let
the producer work up to the point where it gets flow-controlled, and
then let the consumer start processing the available data: i.e., rather
than pay for the 100 context switches of (produce->consume)x50, it's
likely that (produce)x50->(consume)x50 will reduce context switching
and improve caching behaviour. That is, I'd rather not lose my quantum
just because I created some productive work for a consumer to process:
it forfeits many locality-of-reference benefits. I think that's a much
more realistic scenario for the use of condition variables than the
sample under discussion.

Disclaimer: This is based on instinct and limited experience rather
than rigorous research. :-)
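For concreteness, here's a minimal sketch of the kind of loop I have in
mind, assuming a design like Maxim's aqueue test (none of these names
come from its actual source): the consumer recycles slots onto a free
list, and the producer only calls malloc() when that list is empty, so
the malloc count approximates how far the producer ran ahead of the
consumer, which is what the "slots used" numbers above measure.

    /*
     * Hypothetical producer/consumer queue sketch; not the real
     * aqueue benchmark code. Compile with -lpthread.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct slot {
            struct slot *next;
            long         data;
    };

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
    static struct slot    *queue;       /* slots awaiting the consumer */
    static struct slot    *freelist;    /* slots recycled by the consumer */
    static unsigned long   nmalloc;     /* slots we had to allocate */

    static void *
    producer(void *arg)
    {
            long i, n = (long)(intptr_t)arg;
            struct slot *s;

            for (i = 0; i < n; i++) {
                    pthread_mutex_lock(&lock);
                    if ((s = freelist) != NULL) {
                            freelist = s->next;     /* recycle a slot */
                    } else {
                            s = malloc(sizeof(*s)); /* consumer is behind */
                            nmalloc++;
                    }
                    s->data = i;
                    s->next = queue;
                    queue = s;
                    /*
                     * If the signal (or the unlock after it) did direct
                     * handoff to the waiting consumer, we'd block on the
                     * next iteration's lock(); without handoff we can
                     * usually re-acquire at once and keep producing until
                     * we're preempted.
                     */
                    pthread_cond_signal(&nonempty);
                    pthread_mutex_unlock(&lock);
            }
            return (NULL);
    }

    static void *
    consumer(void *arg)
    {
            long i, n = (long)(intptr_t)arg;
            struct slot *s;

            for (i = 0; i < n; i++) {
                    pthread_mutex_lock(&lock);
                    while (queue == NULL)
                            pthread_cond_wait(&nonempty, &lock);
                    s = queue;
                    queue = s->next;
                    s->next = freelist;     /* hand the slot back */
                    freelist = s;
                    pthread_mutex_unlock(&lock);
            }
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t pt, ct;
            long n = 1000000;

            pthread_create(&pt, NULL, producer, (void *)(intptr_t)n);
            pthread_create(&ct, NULL, consumer, (void *)(intptr_t)n);
            pthread_join(pt, NULL);
            pthread_join(ct, NULL);
            printf("total %lu slots used\n", nmalloc);
            return (0);
    }

With direct handoff one would expect the queue, and hence the malloc
count, to stay small; without it the producer can run on until
preempted and the count grows, which is the effect David describes.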