Date: Thu, 15 Mar 2007 10:42:21 +0200 From: Diomidis Spinellis <dds@aueb.gr> To: freebsd-arch@freebsd.org, freebsd-threads@freebsd.org Subject: Multithreaded qsort(3) Message-ID: <45F906ED.8070100@aueb.gr>
next in thread | raw e-mail | index | archive | help
Greetings, I've added multhread support to our qsort implementation. You can see the code at <http://people.freebsd.org/~dds/qsort_mt.c> and some performance figures at the end of this email. I propose to add it to our libc. I need your opinions on the interface, and also any comments you may have on the code. I see the following design dimensions: 1. Naming and availability -------------------------- - Option 1.1: Replace qsort with this implementation * Pros: Programs gain performance without any source code change; abstraction (number of processors/cores should be invisible, just as the CPU architecture or disk drive interface) * Cons: May confuse multithreaded programs; programs require linking with pthreads library; programs whose compare function is not thread safe will break - Option 1.2: Name it qsort_mt * Pros: POLA; programs can tune the call according to their need * Cons: Programs require source code changes; we violate abstraction; namespace polution My proposal: option 1.2: name it qsort_mt and make it available in a separate library (libc_mt). 2. Interface ------------ - Option 2.1: qsort compatible * Pros: Drop in replacement; doesn't need programmer understanding; compatible with option 1.1 * Cons: Can't be tuned for a specific program or workload - Option 2.2: Add nthreads parameter, specifying the maximum number of threads * Pros: Multithreaded programs can tune load balancing; all programs can optimise nthreads according to the workload characteristics. * Cons: Libc routines are generaly expected to Do he Right Thing, without handholding; must agree on mechanism for determining the number of threads; the benefits of tuning are unlikely to be dramatic. - Option 2.3: Add forkelem, specifying minimum number of elements for a new thread (below forkelem, I use single-threaded qsort). * Pros: programs can optimise nthreads according to the workload characteristics. * Cons: Libc routines are generaly expected to Do he Right Thing, without handholding; the benefits of tuning are unlikely to be dramatic. - Option 2.4: Have the library tune itsself for a given system or individual programs by measuring different values * Pros: higher performance when the tuned values are reused * Cons: Lack of prior art; performance drop when determining tuning values; security considerations if values are cached; complexity (Options 2.2 and 2.3 are orthogonal; options 2.1 and 2.4 are orthogonal) My proposal: go for 2.1, based on the way C library routines work. 3. Determining the number of threads ------------------------------------ - Option 3.1: NPROCS environment variable * Pros: prior art: CrayOS, BSD/OS; users can tune it. * Cons: May be too blunt; requires setting by user or administrator - Option 3.[234]: hw.ncpu or kern.smp.cpus sysctl, or pmc_ncpu() * Pros: automatic, right for the underlying hardware * Cons: even blunter - Option 3.5: Add library interface for the above as int nthreads_mt() * Pros: we abstract policy; programs can replace it; reusability; extensibility. * Cons: namespace polution; it is not clear that a single number is correct for all uses of a program. (All the above options are orthogonal) My proposal: Add library interface using NPROCS and falling back to a sysctl variable. This interface can be extended in the future. Performance figures ------------------- Reported times are elapsed, user, and system seconds. $ uname -a FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #924: Wed Mar 14 14:25:07 UTC 2007 root@sledge.freebsd.org:/h/src/sys/amd64/compile/SLEDGE amd64 $ qsort_mt -? qsort_mt: option requires an argument -- h usage: qsort_mt [-stv] [-f forkelements] [-h threads] [-n elements] -l Run the libc version of qsort -s Test with 20-byte strings, instead of integers -t Print timing results -v Verify the integer results Defaults are 1e7 elements, 2 threads, 100 fork elements $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lthr -lpmc $ qsort_mt -t -h 2 -n 100000000 -l # Integers; qsort(3) 46.5 46.2 0.314 $ ./qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt 27.5 46.3 0.301 $ qsort_mt -t -h 2 -n 10000000 -s -l # Strings; qsort(3) 41.3 40.1 1.2 $ ./qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt 26.7 40.1 1.16 $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lpthread -lpmc $ qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt 28 47.1 0.368 $ qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt 27.1 40.6 1.06 Thanks, Diomidis - dds@ http://www.spinellis.gr
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?45F906ED.8070100>