From owner-freebsd-threads@FreeBSD.ORG Thu Mar 15 13:34:11 2007 Return-Path: X-Original-To: freebsd-threads@freebsd.org Delivered-To: freebsd-threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1E2E416A403; Thu, 15 Mar 2007 13:34:11 +0000 (UTC) (envelope-from mezz7@cox.net) Received: from eastrmmtai104.cox.net (eastrmmtai104.cox.net [68.230.240.11]) by mx1.freebsd.org (Postfix) with ESMTP id AFC5613C45A; Thu, 15 Mar 2007 13:34:10 +0000 (UTC) (envelope-from mezz7@cox.net) Received: from eastrmimpo02.cox.net ([68.1.16.120]) by eastrmmtao105.cox.net (InterMail vM.7.05.02.00 201-2174-114-20060621) with ESMTP id <20070315124215.FFRR16421.eastrmmtao105.cox.net@eastrmimpo02.cox.net>; Thu, 15 Mar 2007 08:42:15 -0400 Received: from mezz.mezzweb.com ([24.255.149.218]) by eastrmimpo02.cox.net with bizsmtp id b0iD1W0044iy4EG0000000; Thu, 15 Mar 2007 08:42:14 -0400 Date: Thu, 15 Mar 2007 07:44:03 -0500 To: "Diomidis Spinellis" From: "Jeremy Messenger" Content-Type: text/plain; format=flowed; delsp=yes; charset=us-ascii MIME-Version: 1.0 References: <45F906ED.8070100@aueb.gr> Content-Transfer-Encoding: 7bit Message-ID: In-Reply-To: <45F906ED.8070100@aueb.gr> User-Agent: Opera Mail/9.10 (Linux) Cc: freebsd-threads@freebsd.org, freebsd-arch@freebsd.org Subject: Re: Multithreaded qsort(3) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Mar 2007 13:34:11 -0000 On Thu, 15 Mar 2007 03:42:21 -0500, Diomidis Spinellis wrote: > Greetings, > > I've added multhread support to our qsort implementation. You can see > the code at and some > performance figures at the end of this email. I propose to add it to > our libc. I need your opinions on the interface, and also any comments > you may have on the code. I see the following design dimensions: That remind me other days when I have readed in one of blog that have some good info about sorts. I will posting a few of links in here in case if anyone is insteresting to read. ;-) Quick Sort (qsort()): http://jeffreystedfast.blogspot.com/2007/03/quick-sort.html 25 Byte Sort Routine: http://jeffreystedfast.blogspot.com/2007/03/25-byte-sort-routine.html Merge Sort: http://jeffreystedfast.blogspot.com/2007/02/merge-sort.html A lot of more sorts: Shell Sort Binary Insertion Sort Insertion Sort Bubble Sort http://jeffreystedfast.blogspot.com/search/label/sorting Cheers, Mezz > 1. Naming and availability > -------------------------- > - Option 1.1: Replace qsort with this implementation > * Pros: Programs gain performance without any source code change; > abstraction (number of processors/cores should be invisible, just as the > CPU architecture or disk drive interface) > * Cons: May confuse multithreaded programs; programs require linking > with pthreads library; programs whose compare function is not thread > safe will break > > - Option 1.2: Name it qsort_mt > * Pros: POLA; programs can tune the call according to their need > * Cons: Programs require source code changes; we violate abstraction; > namespace polution > > My proposal: option 1.2: name it qsort_mt and make it available in a > separate library (libc_mt). > > > 2. Interface > ------------ > - Option 2.1: qsort compatible > * Pros: Drop in replacement; doesn't need programmer understanding; > compatible with option 1.1 > * Cons: Can't be tuned for a specific program or workload > > - Option 2.2: Add nthreads parameter, specifying the maximum number of > threads > * Pros: Multithreaded programs can tune load balancing; all programs can > optimise nthreads according to the workload characteristics. > * Cons: Libc routines are generaly expected to Do he Right Thing, > without handholding; must agree on mechanism for determining the number > of threads; the benefits of tuning are unlikely to be dramatic. > > - Option 2.3: Add forkelem, specifying minimum number of elements for a > new thread (below forkelem, I use single-threaded qsort). > * Pros: programs can optimise nthreads according to the workload > characteristics. > * Cons: Libc routines are generaly expected to Do he Right Thing, > without handholding; the benefits of tuning are unlikely to be dramatic. > > - Option 2.4: Have the library tune itsself for a given system or > individual programs by measuring different values > * Pros: higher performance when the tuned values are reused > * Cons: Lack of prior art; performance drop when determining tuning > values; security considerations if values are cached; complexity > > (Options 2.2 and 2.3 are orthogonal; options 2.1 and 2.4 are orthogonal) > > My proposal: go for 2.1, based on the way C library routines work. > > 3. Determining the number of threads > ------------------------------------ > - Option 3.1: NPROCS environment variable > * Pros: prior art: CrayOS, BSD/OS; users can tune it. > * Cons: May be too blunt; requires setting by user or administrator > > - Option 3.[234]: hw.ncpu or kern.smp.cpus sysctl, or pmc_ncpu() > * Pros: automatic, right for the underlying hardware > * Cons: even blunter > > - Option 3.5: Add library interface for the above as int nthreads_mt() > * Pros: we abstract policy; programs can replace it; reusability; > extensibility. > * Cons: namespace polution; it is not clear that a single number is > correct for all uses of a program. > > (All the above options are orthogonal) > > My proposal: Add library interface using NPROCS and falling back to a > sysctl variable. This interface can be extended in the future. > > Performance figures > ------------------- > Reported times are elapsed, user, and system seconds. > > $ uname -a > FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #924: Wed Mar > 14 14:25:07 UTC 2007 > root@sledge.freebsd.org:/h/src/sys/amd64/compile/SLEDGE amd64 > > $ qsort_mt -? > qsort_mt: option requires an argument -- h > usage: qsort_mt [-stv] [-f forkelements] [-h threads] [-n elements] > -l Run the libc version of qsort > -s Test with 20-byte strings, instead of integers > -t Print timing results > -v Verify the integer results > Defaults are 1e7 elements, 2 threads, 100 fork elements > > > $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lthr -lpmc > > $ qsort_mt -t -h 2 -n 100000000 -l # Integers; qsort(3) > 46.5 46.2 0.314 > $ ./qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt > 27.5 46.3 0.301 > $ qsort_mt -t -h 2 -n 10000000 -s -l # Strings; qsort(3) > 41.3 40.1 1.2 > $ ./qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt > 26.7 40.1 1.16 > > $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lpthread -lpmc > $ qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt > 28 47.1 0.368 > $ qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt > 27.1 40.6 1.06 > > > Thanks, > > Diomidis - dds@ > http://www.spinellis.gr -- mezz7@cox.net - mezz@FreeBSD.org FreeBSD GNOME Team - FreeBSD Multimedia Hat (ports, not src) http://www.FreeBSD.org/gnome/ - gnome@FreeBSD.org http://wiki.freebsd.org/multimedia - multimedia@FreeBSD.org