From owner-freebsd-bugs@FreeBSD.ORG Thu Jun 18 20:30:04 2009 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EEF721065679 for ; Thu, 18 Jun 2009 20:30:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C81FD8FC26 for ; Thu, 18 Jun 2009 20:30:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n5IKU4pV065188 for ; Thu, 18 Jun 2009 20:30:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n5IKU4HE065187; Thu, 18 Jun 2009 20:30:04 GMT (envelope-from gnats) Resent-Date: Thu, 18 Jun 2009 20:30:04 GMT Resent-Message-Id: <200906182030.n5IKU4HE065187@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Jay Patrick Howard Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B913A10656CC for ; Thu, 18 Jun 2009 20:29:31 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id A52C28FC1B for ; Thu, 18 Jun 2009 20:29:31 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n5IKTVdV007121 for ; Thu, 18 Jun 2009 20:29:31 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id n5IKTVef007120; Thu, 18 Jun 2009 20:29:31 GMT (envelope-from nobody) Message-Id: <200906182029.n5IKTVef007120@www.freebsd.org> Date: Thu, 18 Jun 2009 20:29:31 GMT From: Jay Patrick Howard To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Cc: Subject: kern/135718: [PATCH] enhance qsort to properly handle 32-bit aligned data on 64-bit systems X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jun 2009 20:30:06 -0000 >Number: 135718 >Category: kern >Synopsis: [PATCH] enhance qsort to properly handle 32-bit aligned data on 64-bit systems >Confidential: no >Severity: serious >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Thu Jun 18 20:30:04 UTC 2009 >Closed-Date: >Last-Modified: >Originator: Jay Patrick Howard >Release: n/a >Organization: >Environment: >Description: The stdlib qsort() code in FreeBSD is largely taken from the paper "Engineering a Sort Function" by Bentley & McIlroy (1993). In that paper, the authors employ a clever a scheme for selecting a method of swapping elements. Basically it goes like this: 1. If the base pointer or element size is not aligned to sizeof(long) then swap byte-by-byte, else 2. If the element size is exactly sizeof(long) then perform a single inline swap, else 3. perform a long-by-long swap. The implicit assumption here is that if the base pointer or element size isn't aligned to sizeof(long) then one can't do any better than a char-by-char swap. While this seems to be true on most 32-bit systems, it is not the case on at least some 64-bit systems. In particular, x86-64. Consequently, sorting data that is 32-bit aligned (but not 64-bit aligned) is much slower on 64-bit systems compared to 32-bit systems. This is because in 32-bit mode the qsort() logic uses a long-by-long swap (since the data is aligned) while in 64-bit mode qsort() drops down to a char-by-char swap. It is true that most workloads on 64-bit systems will be 64-bit aligned. However, it is fairly common for 64-bit code to need to process binary data that wasn't generated on a 64-bit system and hence may not be 64-bit aligned. Because this is fairly common it could be a decent "win" for qsort() to support fast swapping for such 32-bit aligned workloads. In my testing, sorting 64 MB worth of 100-byte records (each with a 12-byte key) took 1.8x as long on a 64-bit system as it did on a 32-bit system, with identical hardware. With a patched qsort the performance was identical on both 32-bit and 64-bit versions of the code. The patch is written such that if sizeof(long) == sizeof(int) then it acts exactly like the current version. The additional swap behavior is only enabled when sizeof(long) > sizeof(int). The extra overhead from the sizeof(int) alignment check was negligible. Given the way SWAPINIT is structured, there is no additional overhead whatsoever when the data is 64-bit aligned. The only time additional overhead is incurred is when data is NOT 64-bit aligned, in which case the extra alignment check quite is likely to provide a significant speedup. >How-To-Repeat: Sort records of size 8*n+4 bytes from within 64-bit code. Then sort the same data from within 32-bit code. The 64-bit version should take approximately 1.8x as long to execute. >Fix: See attached patch modifying /src/lib/libc/stdlib/qsort.c Patch attached with submission follows: --- qsort.c 2009-06-18 13:32:58.000000000 -0500 +++ qsort.c.patched 2009-06-18 14:22:02.000000000 -0500 @@ -34,6 +34,7 @@ __FBSDID("$FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.15 2008/01/14 09:21:34 das Exp $"); #include +#include #ifdef I_AM_QSORT_R typedef int cmp_t(void *, const void *, const void *); @@ -59,8 +60,15 @@ } while (--i > 0); \ } -#define SWAPINIT(a, es) swaptype = ((char *)a - (char *)0) % sizeof(long) || \ +#if LONG_BIT > WORD_BIT +#define SWAPINIT(a, es) swaptype = ((char *)a - (char *)0) % sizeof(long) || \ + es % sizeof(long) ? ((char *)a - (char *)0) % sizeof(int) || es % \ + sizeof(int) ? 3 : 2 : es == sizeof(long)? 0 : 1; +#else +#define SWAPINIT(a, es) swaptype = ((char *)a - (char *)0) % sizeof(long) || \ es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1; +#endif + static inline void swapfunc(a, b, n, swaptype) @@ -69,6 +77,10 @@ { if(swaptype <= 1) swapcode(long, a, b, n) +#if LONG_BIT > WORD_BIT + else if(swaptype <= 2) + swapcode(int, a, b, n) +#endif else swapcode(char, a, b, n) } >Release-Note: >Audit-Trail: >Unformatted: