From owner-freebsd-bugs@FreeBSD.ORG Mon Nov 3 00:10:14 2003 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5F73016A4CE for ; Mon, 3 Nov 2003 00:10:14 -0800 (PST) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 08CF143FE3 for ; Mon, 3 Nov 2003 00:10:13 -0800 (PST) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.12.9/8.12.9) with ESMTP id hA38ACFY067208 for ; Mon, 3 Nov 2003 00:10:12 -0800 (PST) (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.12.9/8.12.9/Submit) id hA38ACwv067207; Mon, 3 Nov 2003 00:10:12 -0800 (PST) (envelope-from gnats) Resent-Date: Mon, 3 Nov 2003 00:10:12 -0800 (PST) Resent-Message-Id: <200311030810.hA38ACwv067207@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Markus Bjartveit Kruger Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2C2D416A4CE for ; Mon, 3 Nov 2003 00:03:41 -0800 (PST) Received: from proto.pvv.ntnu.no (proto.pvv.ntnu.no [129.241.210.168]) by mx1.FreeBSD.org (Postfix) with SMTP id 5E93C43FCB for ; Mon, 3 Nov 2003 00:03:39 -0800 (PST) (envelope-from markusk@pvv.ntnu.no) Received: (qmail 71891 invoked by uid 28724); 3 Nov 2003 08:03:37 -0000 Message-Id: <20031103080337.71890.qmail@proto.pvv.ntnu.no> Date: 3 Nov 2003 08:03:37 -0000 From: Markus Bjartveit Kruger To: FreeBSD-gnats-submit@FreeBSD.org X-Send-Pr-Version: 3.113 Subject: misc/58860: Improvement of radixsort() implementation X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Markus Bjartveit Kruger List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 Nov 2003 08:10:14 -0000 >Number: 58860 >Category: misc >Synopsis: Improvement of radixsort() implementation >Confidential: no >Severity: non-critical >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: change-request >Submitter-Id: current-users >Arrival-Date: Mon Nov 03 00:10:12 PST 2003 >Closed-Date: >Last-Modified: >Originator: Markus Bjartveit Kruger >Release: FreeBSD 4.8-RELEASE i386 >Organization: Programvareverkstedet >Environment: System: FreeBSD proto.pvv.ntnu.no 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Tue Sep 16 21:03:28 CEST 2003 root@bacchus.pvv.ntnu.no:/local/obj/usr/src/sys/PROTO i386 >Description: I've found a possible improvement to the implementation of radixsort() located in /usr/src/lib/libc/stdlib/radixsort.c, which will make the implementation more efficient when sorting data sets containing many strings that share a common prefix. The improvement consists in checking for bins where all strings have the same character at the current position of the radix sort, and moving on to the next character instead of needlessly shuffling the strings in the bin around. When I used the modified radixsort() to sort a web log file, which typically contains many strings with common prefixes (IP addresses), I registered a speedup of approximately 15%. On a constructed dataset where all strings had a long common prefix, the speedup was 50%. >How-To-Repeat: Compile a program with the old and new version of radixsort() and compare processing times. Remember to set appropriate optimization options, as the efficiency gain can be lost otherwise. >Fix: --- radixsort-patch begins here --- --- radixsort.c Tue Mar 11 12:39:57 1997 +++ radixsort.c.patch Fri Oct 31 12:55:10 2003 @@ -175,6 +175,17 @@ } /* + * Special case: if all strings have the same + * character at position i, move on to the next + * character. + */ + if (nc == 1 && count[bmin] == n) { + push(a, n, i+1); + nc = count[bmin] = 0; + continue; + } + + /* * Set top[]; push incompletely sorted bins onto stack. * top[] = pointers to last out-of-place element in bins. * count[] = counts of elements in bins. --- radixsort-patch ends here --- >Release-Note: >Audit-Trail: >Unformatted: