Date: Fri, 3 Jan 2014 14:06:58 +0800
From: Erich Dollansky <erichsfreebsdlist@alogt.com>
To: chump1@hushmail.com
Cc: freebsd-arm@freebsd.org
Subject: Re: Beagle recommendations
Message-ID: <20140103140658.071f970d@X220.alogt.com>
In-Reply-To: <20140103052201.E9397200F5@smtp.hushmail.com>
References: <20140103052201.E9397200F5@smtp.hushmail.com>

Hi,

On Fri, 03 Jan 2014 00:22:01 -0500
chump1@hushmail.com wrote:

I have to say that my experience is not with ARM CPUs but with
PA-RISC, SPARC and x86 CPUs.

> I have a fairly simple task that involves processing something in a
> 2D array, MxN times. I took a naive approach, 1x process 1x thread,
> and it took a little longer than desired. Well now, I could do better
> with some multi processing, especially on a multi core box, right?

One process and one thread? As I understand your description, you
should not gain much.

> Well, I have not had much luck. At first I spawned M threads and had
> each iterate over each N in turn, with M between 25-35. It took much,
> much longer than the single thread. I figured contention and overhead
> were costing me big, and gave it a shot with a scaled down version of
> the problem, M=10. Still, much slower than the single thread. A
> little confused, I went back to the big problem set (25-35), and made
> a new program that spawned only two threads, and each is limited to
> processing only even or only odd data sets. Even that still takes
> twice as long as the single thread version! What is up with that?

Did you try one process per row, with one thread per column? Do the
processes and threads have to interact, or can each element be
processed independently of the other elements?

> More important asides, I am barely doing any real processing at all.
> It is basically a no-op, barely doing more than incrementing the
> counter. Should I expect to see performance gains once I am doing
> real work in the processing portion of my program? Should I expect to

You will not see the performance drop if you do more processing,
because at the moment the context switches cost more time than
anything else.

> see much different behavior on a different OS? Also I have one

With a real-time OS it might be possible, but I think it unlikely, as
your problem has nothing to do with reaction time.

> physical processor, two cores. Would I see better gains with more
> cores? How do you find processes and threads scale against hardware
> overall?

Your main problem seems to be that you keep the OS busy with context
switches. Use more loops. You could try one process with one thread
per row and then loop through the columns. Again, only if your problem
allows this. And never forget: if this is all in a single array, you
could use a single process and then try to find the proper mix between
the number of threads and the number of loops. This would also take
some load off the CPU cache.

Erich
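
The "fewer threads, more loops" suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions: the elements are taken to be independent of each other, the summing loop is a stand-in for the real per-element work, and the names process_rows and run_chunked are made up for this example. Each thread gets one contiguous block of rows and loops over every column itself, instead of one thread being spawned per row or per element. Note that CPython threads do not run Python bytecode in parallel, so this shows the partitioning only; the same split would carry over to pthreads in C.

```python
import threading


def process_rows(grid, rows, out):
    """Worker: handle a whole block of rows, looping over all columns."""
    for r in rows:
        total = 0
        for value in grid[r]:
            total += value  # stand-in for the real per-element work
        out[r] = total  # each worker writes only to its own rows


def run_chunked(grid, num_threads=2):
    """Split the rows into num_threads contiguous blocks, one thread each."""
    m = len(grid)
    chunk = -(-m // num_threads)  # ceiling division: rows per thread
    out = [0] * m
    threads = []
    for t in range(num_threads):
        start = t * chunk
        rows = range(start, min(start + chunk, m))  # may be empty
        th = threading.Thread(target=process_rows, args=(grid, rows, out))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    return out


# Hypothetical 6x4 array; two threads each process three rows.
grid = [[r * 10 + c for c in range(4)] for r in range(6)]
result = run_chunked(grid, num_threads=2)
```

The number of threads is the knob to experiment with: on a two-core machine, two or perhaps a few threads with long loops is the kind of mix worth measuring against the single-threaded run.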