From: Peter Edwards
To: Maxim Sobolev
cc: "developers@freebsd.org"
Date: Fri, 11 Feb 2005 17:03:20 +0000
Subject: Re: Pthreads performance
Message-ID: <34cb7c840502110903356a5813@mail.gmail.com>
In-Reply-To: <420CC9F7.40802@portaone.com>
References: <420CC9F7.40802@portaone.com>
List-Id: Discussions about the use of FreeBSD-current

On Fri, 11 Feb 2005 17:06:31 +0200, Maxim Sobolev wrote:
> Hi,
>
> Note: I am purposely posting this to developers to avoid banging into
> "FreeBSD 5/6 is slow" drum.
> My pthreads knowledge is pretty basic, so
> that it's possible that this all is false alarm.
>

I haven't looked at this in detail yet: I'll tinker a bit with it
tonight, but here are some initial comments.

FWIW: my UP -current (Feb 3) box runs the KSE test faster than the
linuxthreads test. I'm getting ridiculously low numbers for sys/user
time in Linux, so either I have a kernel/userland mismatch or "time"
isn't reporting the sys/user time properly (possible with linuxthreads
using separate processes for children).

First off, when using kse/pthreads, you're probably not getting
PTHREAD_SCOPE_SYSTEM threads by default, which would be closer to what
linuxthreads (and libthr) is doing.

Looking at the source file:
(Disclaimer: just a quick look over it, and I'm stating opinion, not
fact)

You've two threads: the "pushing" thread pushes items on to a stack,
possibly waking the "popping" thread. The popping thread waits for the
stack to be not empty, and pops an element off.

There's a big problem in that the push thread doesn't put a bound on
how much memory it will allocate, or cooperate with its consumer.
Depending on how the scheduling works out, if the threads get longer
quanta, then the pushing thread may allocate much more memory before
the popping thread gets a chance. That is, a larger number of calls to
malloc may account for the time difference, too. For example, if the
pattern of push (>) to pop (<) looks like this:

    >>>>>>>>>><<<<<<<<<<

there are 10 allocations, and 10 frees of different blocks of memory.
While this:

    ><><><><><><><><><><

is more likely to cause a single block of memory to be recycled over
and over. (Of course, if it was switching for every push/pop, that'd be
a huge overhead in terms of context switching.)

It would be fairer to bound the work of the push thread, rather than
have it continue on blindly and rely on scheduling and chance to limit
the work the push thread does.
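
To make the "bound the push thread" suggestion concrete, here is a
rough sketch, not taken from the test program under discussion. The
names STACK_LIMIT, stack_push, stack_pop and run_bounded_demo are made
up for illustration, and the limit of 64 is arbitrary. The pusher blocks
on a second condition variable once the stack holds STACK_LIMIT items,
and both threads are requested with PTHREAD_SCOPE_SYSTEM, which some
systems may refuse.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define STACK_LIMIT 64  /* hypothetical cap on outstanding items */

struct node {
    struct node *next;
    int value;
};

static struct node *top;
static int depth;
int max_depth;          /* deepest the stack ever got, for checking */
static long popped_sum;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

/* Push one item, blocking while the stack already holds STACK_LIMIT. */
static void stack_push(int value)
{
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        abort();
    n->value = value;
    pthread_mutex_lock(&lock);
    while (depth >= STACK_LIMIT)
        pthread_cond_wait(&not_full, &lock);
    n->next = top;
    top = n;
    if (++depth > max_depth)
        max_depth = depth;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Pop one item, blocking while the stack is empty. */
static int stack_pop(void)
{
    struct node *n;
    int value;

    pthread_mutex_lock(&lock);
    while (top == NULL)
        pthread_cond_wait(&not_empty, &lock);
    n = top;
    top = n->next;
    depth--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    value = n->value;
    free(n);
    return value;
}

static void *pusher(void *arg)
{
    int count = *(int *)arg;

    for (int i = 0; i < count; i++)
        stack_push(i);
    return NULL;
}

static void *popper(void *arg)
{
    int count = *(int *)arg;

    for (int i = 0; i < count; i++)
        popped_sum += stack_pop();
    return NULL;
}

/* Run one pusher and one popper; returns the sum of the popped values. */
long run_bounded_demo(int count)
{
    pthread_attr_t attr;
    pthread_t push_thr, pop_thr;

    popped_sum = 0;
    max_depth = 0;
    pthread_attr_init(&attr);
    /* Result ignored: if the scope is unsupported, the default is used. */
    (void)pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (pthread_create(&push_thr, &attr, pusher, &count) != 0 ||
        pthread_create(&pop_thr, &attr, popper, &count) != 0)
        abort();
    pthread_join(push_thr, NULL);
    pthread_join(pop_thr, NULL);
    pthread_attr_destroy(&attr);
    return popped_sum;
}
```

Calling run_bounded_demo(1000) from main() pushes 0..999 and pops them
all, so the returned sum is 499500, and max_depth never exceeds
STACK_LIMIT however the scheduler interleaves the two threads. The
number of live allocations is then bounded by design rather than by
luck.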
Also, if the push thread can get more "operations" done per quantum
than the pop thread, the process will just keep growing, carving out
address space for itself.

Even fairer would be to pre-allocate the memory, to avoid measuring the
performance of the heap allocator as well (which will do its own
synchronisation operations), but you might be mildly interested in that
effect anyway. I don't think there's much work gone into FreeBSD's
malloc in terms of multithreading (but I'm open to correction), and
I've no idea about Linux in this regard, but there are plenty of
drop-in replacements around.
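
For what it's worth, pre-allocation could look something like the
sketch below. Again this is only an illustration under my own
assumptions: POOL_SIZE, pool_push, pool_pop and run_pool_demo are
invented names, and the pool size is arbitrary. All nodes live in a
static array and move between a free list and the stack, so the timed
loop never calls malloc or free, and the pool size doubles as the bound
on the pusher.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_SIZE 64    /* hypothetical number of pre-allocated nodes */

struct pnode {
    struct pnode *next;
    int value;
};

static struct pnode pool[POOL_SIZE];    /* static storage, no malloc */
static struct pnode *free_list, *stack_top;
static long pool_sum;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_changed = PTHREAD_COND_INITIALIZER;

/* Chain every node onto the free list and empty the stack. */
static void pool_init(void)
{
    free_list = NULL;
    stack_top = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Move a node from the free list to the stack; waits if none is free. */
static void pool_push(int value)
{
    struct pnode *n;

    pthread_mutex_lock(&pool_lock);
    while (free_list == NULL)
        pthread_cond_wait(&pool_changed, &pool_lock);
    n = free_list;
    free_list = n->next;
    n->value = value;
    n->next = stack_top;
    stack_top = n;
    pthread_cond_broadcast(&pool_changed);
    pthread_mutex_unlock(&pool_lock);
}

/* Move a node from the stack back to the free list; waits if empty. */
static int pool_pop(void)
{
    struct pnode *n;
    int value;

    pthread_mutex_lock(&pool_lock);
    while (stack_top == NULL)
        pthread_cond_wait(&pool_changed, &pool_lock);
    n = stack_top;
    stack_top = n->next;
    value = n->value;
    n->next = free_list;
    free_list = n;
    pthread_cond_broadcast(&pool_changed);
    pthread_mutex_unlock(&pool_lock);
    return value;
}

static void *pool_pusher(void *arg)
{
    int count = *(int *)arg;

    for (int i = 0; i < count; i++)
        pool_push(i);
    return NULL;
}

static void *pool_popper(void *arg)
{
    int count = *(int *)arg;

    for (int i = 0; i < count; i++)
        pool_sum += pool_pop();
    return NULL;
}

/* Run one pusher and one popper; returns the sum of the popped values. */
long run_pool_demo(int count)
{
    pthread_t push_thr, pop_thr;

    pool_sum = 0;
    pool_init();
    if (pthread_create(&push_thr, NULL, pool_pusher, &count) != 0 ||
        pthread_create(&pop_thr, NULL, pool_popper, &count) != 0)
        abort();
    pthread_join(push_thr, NULL);
    pthread_join(pop_thr, NULL);
    return pool_sum;
}
```

With this shape, run_pool_demo(1000) returns 499500 as before, but any
timing difference between thread libraries reflects the mutex and
condition variable paths rather than the allocator. Comparing it
against a malloc-per-push version would show how much of the original
gap the allocator accounts for.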