From owner-freebsd-threads@FreeBSD.ORG Thu Mar 15 12:12:13 2007 Return-Path: X-Original-To: freebsd-threads@freebsd.org Delivered-To: freebsd-threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 76E6416A402; Thu, 15 Mar 2007 12:12:13 +0000 (UTC) (envelope-from dds@aueb.gr) Received: from mx-out.forthnet.gr (mx-out.forthnet.gr [193.92.150.103]) by mx1.freebsd.org (Postfix) with ESMTP id E6F5213C44B; Thu, 15 Mar 2007 12:12:12 +0000 (UTC) (envelope-from dds@aueb.gr) Received: from mx-av-03.forthnet.gr (mx-av.forthnet.gr [193.92.150.27]) by mx-out-02.forthnet.gr (8.14.0/8.14.0) with ESMTP id l2F8gYKC012269; Thu, 15 Mar 2007 10:42:34 +0200 Received: from MX-IN-01.forthnet.gr (mx-in-01.forthnet.gr [193.92.150.23]) by mx-av-03.forthnet.gr (8.13.7/8.13.7) with ESMTP id l2F8gUEr031596; Thu, 15 Mar 2007 10:42:34 +0200 Received: from [192.168.136.22] (ppp97-184.adsl.forthnet.gr [212.54.207.184]) by MX-IN-01.forthnet.gr (8.14.0/8.14.0) with ESMTP id l2F8gPi8003629; Thu, 15 Mar 2007 10:42:25 +0200 Authentication-Results: MX-IN-01.forthnet.gr from=dds@aueb.gr; sender-id=neutral; spf=neutral Message-ID: <45F906ED.8070100@aueb.gr> Date: Thu, 15 Mar 2007 10:42:21 +0200 From: Diomidis Spinellis User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.9) Gecko/20061211 SeaMonkey/1.0.7 MIME-Version: 1.0 To: freebsd-arch@freebsd.org, freebsd-threads@freebsd.org Content-Type: text/plain; charset=ISO-8859-7; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Multithreaded qsort(3) X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Mar 2007 12:12:13 -0000 Greetings, I've added multhread support to our qsort implementation. You can see the code at and some performance figures at the end of this email. I propose to add it to our libc. I need your opinions on the interface, and also any comments you may have on the code. I see the following design dimensions: 1. Naming and availability -------------------------- - Option 1.1: Replace qsort with this implementation * Pros: Programs gain performance without any source code change; abstraction (number of processors/cores should be invisible, just as the CPU architecture or disk drive interface) * Cons: May confuse multithreaded programs; programs require linking with pthreads library; programs whose compare function is not thread safe will break - Option 1.2: Name it qsort_mt * Pros: POLA; programs can tune the call according to their need * Cons: Programs require source code changes; we violate abstraction; namespace polution My proposal: option 1.2: name it qsort_mt and make it available in a separate library (libc_mt). 2. Interface ------------ - Option 2.1: qsort compatible * Pros: Drop in replacement; doesn't need programmer understanding; compatible with option 1.1 * Cons: Can't be tuned for a specific program or workload - Option 2.2: Add nthreads parameter, specifying the maximum number of threads * Pros: Multithreaded programs can tune load balancing; all programs can optimise nthreads according to the workload characteristics. * Cons: Libc routines are generaly expected to Do he Right Thing, without handholding; must agree on mechanism for determining the number of threads; the benefits of tuning are unlikely to be dramatic. - Option 2.3: Add forkelem, specifying minimum number of elements for a new thread (below forkelem, I use single-threaded qsort). * Pros: programs can optimise nthreads according to the workload characteristics. * Cons: Libc routines are generaly expected to Do he Right Thing, without handholding; the benefits of tuning are unlikely to be dramatic. - Option 2.4: Have the library tune itsself for a given system or individual programs by measuring different values * Pros: higher performance when the tuned values are reused * Cons: Lack of prior art; performance drop when determining tuning values; security considerations if values are cached; complexity (Options 2.2 and 2.3 are orthogonal; options 2.1 and 2.4 are orthogonal) My proposal: go for 2.1, based on the way C library routines work. 3. Determining the number of threads ------------------------------------ - Option 3.1: NPROCS environment variable * Pros: prior art: CrayOS, BSD/OS; users can tune it. * Cons: May be too blunt; requires setting by user or administrator - Option 3.[234]: hw.ncpu or kern.smp.cpus sysctl, or pmc_ncpu() * Pros: automatic, right for the underlying hardware * Cons: even blunter - Option 3.5: Add library interface for the above as int nthreads_mt() * Pros: we abstract policy; programs can replace it; reusability; extensibility. * Cons: namespace polution; it is not clear that a single number is correct for all uses of a program. (All the above options are orthogonal) My proposal: Add library interface using NPROCS and falling back to a sysctl variable. This interface can be extended in the future. Performance figures ------------------- Reported times are elapsed, user, and system seconds. $ uname -a FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #924: Wed Mar 14 14:25:07 UTC 2007 root@sledge.freebsd.org:/h/src/sys/amd64/compile/SLEDGE amd64 $ qsort_mt -? qsort_mt: option requires an argument -- h usage: qsort_mt [-stv] [-f forkelements] [-h threads] [-n elements] -l Run the libc version of qsort -s Test with 20-byte strings, instead of integers -t Print timing results -v Verify the integer results Defaults are 1e7 elements, 2 threads, 100 fork elements $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lthr -lpmc $ qsort_mt -t -h 2 -n 100000000 -l # Integers; qsort(3) 46.5 46.2 0.314 $ ./qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt 27.5 46.3 0.301 $ qsort_mt -t -h 2 -n 10000000 -s -l # Strings; qsort(3) 41.3 40.1 1.2 $ ./qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt 26.7 40.1 1.16 $ gcc -DTEST -O qsort_mt.c -o qsort_mt -lpthread -lpmc $ qsort_mt -t -h 2 -n 100000000 # Integers; qsort_mt 28 47.1 0.368 $ qsort_mt -t -h 2 -n 10000000 -s # Strings; qsort_mt 27.1 40.6 1.06 Thanks, Diomidis - dds@ http://www.spinellis.gr