Skip site navigation (1)Skip section navigation (2)
Date:      3 Nov 2003 08:03:37 -0000
From:      Markus Bjartveit Kruger <markusk@pvv.ntnu.no>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   misc/58860: Improvement of radixsort() implementation
Message-ID:  <20031103080337.71890.qmail@proto.pvv.ntnu.no>
Resent-Message-ID: <200311030810.hA38ACwv067207@freefall.freebsd.org>

index | next in thread | raw e-mail


>Number:         58860
>Category:       misc
>Synopsis:       Improvement of radixsort() implementation
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Nov 03 00:10:12 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Markus Bjartveit Kruger
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
Programvareverkstedet
>Environment:
System: FreeBSD proto.pvv.ntnu.no 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Tue Sep 16 21:03:28 CEST 2003 root@bacchus.pvv.ntnu.no:/local/obj/usr/src/sys/PROTO i386


	
>Description:
	
I've found a possible improvement to the implementation of radixsort()
located in /usr/src/lib/libc/stdlib/radixsort.c, which will make the
implementation more efficient when sorting data sets containing many
strings that share a common prefix.

The improvement consists in checking for bins where all strings have
the same character at the current position of the radix sort, and
moving on to the next character instead of needlessly shuffling the
strings in the bin around.  When I used the modified radixsort() to
sort a web log file, which typically contains many strings with common
prefixes (IP addresses), I registered a speedup of approximately 15%.
On a constructed dataset where all strings had a long common prefix,
the speedup was 50%.
>How-To-Repeat:
	
Compile a program with the old and new version of radixsort() and compare
processing times.  Remember to set appropriate optimization options, as
the efficiency gain can be lost otherwise.
>Fix:

	

--- radixsort-patch begins here ---
--- radixsort.c	Tue Mar 11 12:39:57 1997
+++ radixsort.c.patch	Fri Oct 31 12:55:10 2003
@@ -175,6 +175,17 @@
 		}
 
 		/*
+		 * Special case: if all strings have the same
+		 * character at position i, move on to the next
+		 * character.
+		 */
+		if (nc == 1 && count[bmin] == n) {
+			push(a, n, i+1);
+			nc = count[bmin] = 0;
+			continue;
+		}
+
+		/*
 		 * Set top[]; push incompletely sorted bins onto stack.
 		 * top[] = pointers to last out-of-place element in bins.
 		 * count[] = counts of elements in bins.
--- radixsort-patch ends here ---


>Release-Note:
>Audit-Trail:
>Unformatted:


help

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20031103080337.71890.qmail>