Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Mar 2007 10:42:21 +0200
From:      Diomidis Spinellis <dds@aueb.gr>
To:        freebsd-arch@freebsd.org, freebsd-threads@freebsd.org
Subject:   Multithreaded qsort(3)
Message-ID:  <45F906ED.8070100@aueb.gr>

next in thread | raw e-mail | index | archive | help
Greetings,

I've added multhread support to our qsort implementation.  You can see 
the code at <http://people.freebsd.org/~dds/qsort_mt.c>; and some 
performance figures at the end of this email.  I propose to add it to 
our libc.  I need your opinions on the interface, and also any comments 
you may have on the code.  I see the following design dimensions:

1. Naming and availability
--------------------------
- Option 1.1: Replace qsort with this implementation
* Pros: Programs gain performance without any source code change; 
abstraction (number of processors/cores should be invisible, just as the 
CPU architecture or disk drive interface)
* Cons: May confuse multithreaded programs; programs require linking 
with pthreads library; programs whose compare function is not thread 
safe will break

- Option 1.2: Name it qsort_mt
* Pros: POLA; programs can tune the call according to their need
* Cons: Programs require source code changes; we violate abstraction; 
namespace polution

My proposal: option 1.2: name it qsort_mt and make it available in a 
separate library (libc_mt).


2. Interface
------------
- Option 2.1: qsort compatible
* Pros: Drop in replacement; doesn't need programmer understanding; 
compatible with option 1.1
* Cons: Can't be tuned for a specific program or workload

- Option 2.2: Add nthreads parameter, specifying the maximum number of 
threads
* Pros: Multithreaded programs can tune load balancing; all programs can 
optimise nthreads according to the workload characteristics.
* Cons: Libc routines are generaly expected to Do he Right Thing, 
without handholding; must agree on mechanism for determining the number 
of threads; the benefits of tuning are unlikely to be dramatic.

- Option 2.3: Add forkelem, specifying minimum number of elements for a 
new thread (below forkelem, I use single-threaded qsort).
* Pros: programs can optimise nthreads according to the workload 
characteristics.
* Cons: Libc routines are generaly expected to Do he Right Thing, 
without handholding; the benefits of tuning are unlikely to be dramatic.

- Option 2.4: Have the library tune itsself for a given system or 
individual programs by measuring different values
* Pros: higher performance when the tuned values are reused
* Cons: Lack of prior art; performance drop when determining tuning 
values; security considerations if values are cached; complexity

(Options 2.2 and 2.3 are orthogonal; options 2.1 and 2.4 are orthogonal)

My proposal: go for 2.1, based on the way C library routines work.

3. Determining the number of threads
------------------------------------
- Option 3.1: NPROCS environment variable
* Pros: prior art: CrayOS, BSD/OS; users can tune it.
* Cons: May be too blunt; requires setting by user or administrator

- Option 3.[234]: hw.ncpu or kern.smp.cpus sysctl, or pmc_ncpu()
* Pros: automatic, right for the underlying hardware
* Cons: even blunter

- Option 3.5: Add library interface for the above as int nthreads_mt()
* Pros: we abstract policy; programs can replace it; reusability; 
extensibility.
* Cons: namespace polution; it is not clear that a single number is 
correct for all uses of a program.

(All the above options are orthogonal)

My proposal: Add library interface using NPROCS and falling back to a 
sysctl variable.  This interface can be extended in the future.

Performance figures
-------------------
Reported times are elapsed, user, and system seconds.

$ uname -a
FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #924: Wed Mar 
14 14:25:07 UTC 2007 
root@sledge.freebsd.org:/h/src/sys/amd64/compile/SLEDGE  amd64

$ qsort_mt -?
qsort_mt: option requires an argument -- h
usage: qsort_mt [-stv] [-f forkelements] [-h threads] [-n elements]
         -l      Run the libc version of qsort
         -s      Test with 20-byte strings, instead of integers
         -t      Print timing results
         -v      Verify the integer results
Defaults are 1e7 elements, 2 threads, 100 fork elements


$ gcc -DTEST -O qsort_mt.c -o qsort_mt -lthr -lpmc

$ qsort_mt -t -h 2 -n 100000000 -l	# Integers; qsort(3)
46.5 46.2 0.314
$ ./qsort_mt  -t -h 2 -n 100000000	# Integers; qsort_mt
27.5 46.3 0.301
$ qsort_mt -t -h 2 -n 10000000 -s -l	# Strings; qsort(3)
41.3 40.1 1.2
$ ./qsort_mt  -t -h 2 -n 10000000 -s	# Strings; qsort_mt
26.7 40.1 1.16

$ gcc -DTEST -O qsort_mt.c -o qsort_mt -lpthread -lpmc
$ qsort_mt -t -h 2 -n 100000000		# Integers; qsort_mt
28 47.1 0.368
$ qsort_mt -t -h 2 -n 10000000 -s	# Strings; qsort_mt
27.1 40.6 1.06


Thanks,

Diomidis - dds@
http://www.spinellis.gr



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?45F906ED.8070100>