Date: Wed, 10 Jan 2007 00:00:34 GMT
Message-Id: <200701100000.l0A00Y9S097998@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Mikhail Teterin
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2

The following reply was made to PR bin/106734; it has been noted by GNATS.

From: Mikhail Teterin
To: Julian Seward
Cc: bug-followup@freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Tue, 9 Jan 2007 18:34:36 -0500

On Sunday 07 January 2007 00:08, Julian Seward wrote:
> >    /* Load the bytes: */
> >    n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
> >    n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
>
> read beyond the end of the defined area of block. block is
> defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
> you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
> hence loading 15 bytes of garbage.

I don't think that's quite right... Instead of processing 8 bytes at a
time, as the non-SSE code does, I'm comparing 16 at a time. Thus it is
possible for me to be over by exactly 8 sometimes...

Anyway, the problem stemmed from my bumping i1 and i2 by 16 instead of 8
after the _initial check_ (which, in the quadrant-less case, should not
need to be separate at all, actually). Sometimes _that_ would bring them
over...

I think the solution is either to bump up BZ_N_OVERSHOOT even further or
to check and adjust i1 and i2:

    if (i1 >= nblock) i1 -= nblock;
    if (i2 >= nblock) i2 -= nblock;

at the beginning, rather than the end, of the loop. Having done that, I no
longer peek beyond the end of the block (according to gdb's conditional
breakpoints, at least).

Please check the new

    http://aldan.algebra.com/~mi/bz/blocksort-SSE2-patch-2

Yours,

    -mi

P.S. The following gdb script is what I used. Run it as:

    gdb -x x.txt bzip2

x.txt:

    break blocksort.c:516
    cond 1 (i1 > nblock) || (i2 > nblock)
    run -9 < /tmp/PLIST > /dev/null

Adjust the compression level and the input's location, and be sure to have
blocksort.o compiled with debug information, of course...
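For illustration, here is a minimal sketch of the loop shape described above, assuming a byte block padded with a cyclic copy of its first bytes (the role BZ_N_OVERSHOOT plays in bzip2's sorting code). It is not the code from blocksort-SSE2-patch-2: SKETCH_OVERSHOOT, fill_overshoot() and cmp_rotations() are hypothetical names, and it uses the integer load _mm_loadu_si128 rather than the casted _mm_loadu_pd quoted above. What it tries to show is that, with i1 and i2 wrapped at the top of the loop, every 16-byte load starts below nblock, so the last byte touched is at most nblock + 14 and a 16-byte pad is enough.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical pad size for this sketch: a 16-byte load starting at any
   index < nblock touches at most index nblock + 14. */
#define SKETCH_OVERSHOOT 16

/* Copy the start of the block (cyclically) into the pad after it, so that
   block[nblock + k] == block[k % nblock]. block must hold
   nblock + SKETCH_OVERSHOOT bytes. */
static void fill_overshoot(uint8_t *block, int32_t nblock)
{
    for (int32_t k = 0; k < SKETCH_OVERSHOOT; k++)
        block[nblock + k] = block[k % nblock];
}

/* Compare the rotations of block starting at i1 and i2 over len bytes,
   16 bytes per iteration. Returns <0, 0 or >0 (unsigned byte order). */
static int cmp_rotations(const uint8_t *block, int32_t nblock,
                         int32_t i1, int32_t i2, int32_t len)
{
    /* A single subtraction below is only enough if nblock >= 16. */
    assert(nblock >= SKETCH_OVERSHOOT);
    assert(0 <= i1 && i1 < nblock && 0 <= i2 && i2 < nblock);

    while (len > 0) {
        /* Wrap at the top of the loop, so every load starts below nblock. */
        if (i1 >= nblock) i1 -= nblock;
        if (i2 >= nblock) i2 -= nblock;

        int32_t chunk = len < 16 ? len : 16;
#ifdef __SSE2__
        __m128i a = _mm_loadu_si128((const __m128i *)(block + i1));
        __m128i b = _mm_loadu_si128((const __m128i *)(block + i2));
        /* Bit k set => byte k differs. */
        unsigned diff = ~(unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) & 0xFFFFu;
        if (chunk < 16)
            diff &= (1u << chunk) - 1u;   /* ignore bytes past len */
        if (diff != 0) {
            int k = __builtin_ctz(diff);  /* first differing byte */
            return (int)block[i1 + k] - (int)block[i2 + k];
        }
#else
        /* Portable fallback when SSE2 is not available. */
        for (int32_t k = 0; k < chunk; k++)
            if (block[i1 + k] != block[i2 + k])
                return (int)block[i1 + k] - (int)block[i2 + k];
#endif
        i1 += chunk;
        i2 += chunk;
        len -= chunk;
    }
    return 0;
}
```

If the adjustment were done only at the bottom of the loop, any extra bump applied before the next load (like the 16-instead-of-8 step after the initial check) could start a load past nblock, which is the kind of overrun the gdb condition above was hunting for.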