Date: Fri, 11 Feb 2005 17:03:20 +0000
From: Peter Edwards <peadar.edwards@gmail.com>
To: Maxim Sobolev <sobomax@portaone.com>
Cc: "developers@freebsd.org" <current@freebsd.org>
Subject: Re: Pthreads performance
Message-ID: <34cb7c840502110903356a5813@mail.gmail.com>
In-Reply-To: <420CC9F7.40802@portaone.com>
References: <420CC9F7.40802@portaone.com>
On Fri, 11 Feb 2005 17:06:31 +0200, Maxim Sobolev <sobomax@portaone.com> wrote:
> Hi,
>
> Note: I am purposely posting this to developers to avoid banging into
> "FreeBSD 5/6 is slow" drum. My pthreads knowledge is pretty basic, so
> that it's possible that this all is false alarm.
>

I haven't looked at this in detail yet: I'll tinker a bit with it tonight, but here are some initial comments.

FWIW: my UP -current (Feb 3) box runs the KSE test faster than the linuxthreads test. I'm getting a ridiculously low number for sys/user time in linux, so either I have a kernel/userland mismatch, or "time" isn't reporting the sys/user time properly (possible, with linuxthreads using separate processes for children).

First off, when using kse/pthreads, you're probably not getting SCOPE_SYSTEM threads by default, which would be closer to what linuxthreads (and libthr) is doing.

Looking at the source file: (Disclaimer: just a quick look over it, and I'm stating opinion, not fact.)

You've two threads: the "pushing" thread pushes items on to a stack, possibly waking the "popping" thread. The popping thread waits for the stack to be not empty, and pops an element off.

There's a big problem in that the push thread doesn't put a bound on how much memory it will allocate, or cooperate with its consumer. Depending on how the scheduling works out, if the threads get longer quanta, then the pushing thread may allocate much more memory before the popping thread gets a chance. I.e., a larger number of calls to malloc may account for the time difference, too.

E.g., if the pattern of push (>) to pop (<) looks like this:

>>>>>>>>>><<<<<<<<<<

there are 10 allocations, and 10 frees of different blocks of memory. While this:

><><><><><><><><><><

is more likely to cause a single block of memory to be recycled over and over.
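To make the two patterns above concrete, here is a toy model in C. It assumes an idealised allocator that always hands back a previously freed block before carving out a new one; real malloc implementations only approximate this, so treat the numbers as illustrative. The function name distinct_blocks() is made up for this sketch and is not from the test program.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only (assumption: freed blocks are always reused before a
 * fresh block is carved out). '>' is a push (allocate), '<' is a pop
 * (free). Returns the number of distinct blocks the toy allocator hands
 * out, or -1 if the pattern pops an empty stack or has a stray character.
 */
int distinct_blocks(const char *pattern)
{
    int live = 0;    /* blocks currently sitting on the stack */
    int carved = 0;  /* distinct blocks ever handed out */

    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == '>') {
            live++;
            if (live > carved)
                carved = live;  /* free list was empty: need a fresh block */
        } else if (*p == '<') {
            if (live == 0)
                return -1;      /* pop with nothing on the stack */
            live--;             /* block goes back on the free list */
        } else {
            return -1;
        }
    }
    assert(carved >= live);
    return carved;
}
```

Under that assumption the first pattern needs ten distinct blocks and the alternating one needs a single block, which is the difference in allocator work described above.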
(Of course, if it was switching for every push/pop, that'd be a huge overhead in terms of context switching.)

It would be fairer to bound the work of the push thread, rather than have it continue on blindly and rely on scheduling and chance to limit the work it does. Also, if the push thread could get more "operations" done per quantum than the pop thread, the process would just keep growing, carving out address space for itself.

Even fairer would be to pre-allocate the memory, to avoid measuring the performance of the heap allocator as well (which will do its own synchronisation operations), but you might be mildly interested in that effect anyway: I don't think much work has gone into FreeBSD's malloc in terms of multithreading (but I'm open to correction), and I've no idea about linux in this regard, but there are plenty of drop-in replacements around.
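A minimal sketch of what I mean by bounding the pusher and pre-allocating, in C with plain POSIX threads. Everything named here (struct bstack, BSTACK_CAP, run_push_pop() and friends) is hypothetical and not taken from the original test program; the capacity of 64 is an arbitrary choice. It also asks for PTHREAD_SCOPE_SYSTEM threads, and carries on if the threading library refuses.

```c
#include <pthread.h>
#include <stddef.h>

#define BSTACK_CAP 64  /* hypothetical bound on outstanding items */

struct bstack {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
    size_t          depth;
    int             slots[BSTACK_CAP];  /* preallocated: no malloc per push */
};

void bstack_init(struct bstack *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->not_empty, NULL);
    pthread_cond_init(&s->not_full, NULL);
    s->depth = 0;
}

/* Blocks while the stack is full, so the pusher cannot run ahead unbounded. */
void bstack_push(struct bstack *s, int v)
{
    pthread_mutex_lock(&s->lock);
    while (s->depth == BSTACK_CAP)
        pthread_cond_wait(&s->not_full, &s->lock);
    s->slots[s->depth++] = v;
    pthread_cond_signal(&s->not_empty);
    pthread_mutex_unlock(&s->lock);
}

/* Blocks while the stack is empty. */
int bstack_pop(struct bstack *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->depth == 0)
        pthread_cond_wait(&s->not_empty, &s->lock);
    int v = s->slots[--s->depth];
    pthread_cond_signal(&s->not_full);
    pthread_mutex_unlock(&s->lock);
    return v;
}

struct job { struct bstack *s; int count; long sum; };

static void *pusher(void *arg)
{
    struct job *j = arg;
    for (int i = 1; i <= j->count; i++)
        bstack_push(j->s, i);
    return NULL;
}

static void *popper(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->count; i++)
        j->sum += bstack_pop(j->s);
    return NULL;
}

/* Pushes 1..count through the bounded stack; returns the popped sum, or -1. */
long run_push_pop(int count)
{
    struct bstack s;
    struct job push_job = { &s, count, 0 }, pop_job = { &s, count, 0 };
    pthread_attr_t attr;
    pthread_t a, b;

    bstack_init(&s);
    pthread_attr_init(&attr);
    /* Request system scope; ignore the error where it is unsupported. */
    (void)pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    int rc_a = pthread_create(&a, &attr, pusher, &push_job);
    int rc_b = (rc_a == 0) ? pthread_create(&b, &attr, popper, &pop_job) : rc_a;
    pthread_attr_destroy(&attr);
    if (rc_a != 0)
        return -1;
    if (rc_b != 0) {
        pthread_cancel(a);  /* pusher may be blocked waiting on not_full */
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_cond_destroy(&s.not_empty);
    pthread_cond_destroy(&s.not_full);
    pthread_mutex_destroy(&s.lock);
    return pop_job.sum;
}
```

With this shape the pusher can never be more than BSTACK_CAP items ahead of the popper regardless of how long the quanta are, and the timed loop contains no malloc/free calls, so the numbers reflect the mutex and condition variable paths rather than the heap allocator.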