Date: Sun, 7 Jan 2007 05:00:38 GMT
Message-Id: <200701070500.l0750cTs018266@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Julian Seward
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2

The following reply was made to PR bin/106734; it has been noted by GNATS.

From: Julian Seward
To: Mikhail Teterin
Cc: bug-followup@freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Sun, 7 Jan 2007 05:08:43 +0000

 I believe this analysis is correct:

 > /* Load the bytes: */
 > n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
 > n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
 >
 > read beyond the end of the defined area of block.  block is
 > defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
 > you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
 > hence loading 15 bytes of garbage.

 Valgrind doesn't complain about the out-of-range access, because
 you are still accessing inside a valid malloc-allocated block.  But
 it does know that the read data is uninitialised, hence it complains
 when you do a comparison with that data followed by a conditional
 branch (or move) based on the result of the comparison.

 > This is possible... You think, the loop should exit earlier and test
 > the last (up to) 15 bytes one-by-one?

 Certainly the loop-end stuff needs to be fixed up somehow to reflect
 the 16 byte loads, but without further investigation I'm not sure how.
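
 For illustration only, here is a minimal sketch of the "exit earlier and
 test the last (up to) 15 bytes one-by-one" idea from the quoted question.
 It is not the code from the patch and not part of libbz2: first_diff() and
 its parameters are hypothetical names, and the function only finds the
 first differing byte of two buffers rather than reproducing the libbz2
 comparison. It does a 16-byte load only when all 16 bytes lie inside
 [0, n), and falls back to plain C when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not part of libbz2: returns the index of the first
   byte at which a[] and b[] differ, or n if the first n bytes are equal.
   Only bytes in [0, n) are ever read. */
static size_t first_diff(const unsigned char *a, const unsigned char *b,
                         size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL);

#if defined(__SSE2__)
    /* Vector part: do a 16-byte load only while all 16 bytes are in range. */
    while (n - i >= 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Bit k of mask is set when byte k of the two vectors is equal. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));

        if (mask != 0xFFFFu) {
            /* The lowest clear bit marks the first mismatching byte. */
            unsigned diff = ~mask & 0xFFFFu;
            size_t k = 0;
            while ((diff & 1u) == 0) {
                diff >>= 1;
                k++;
            }
            return i + k;
        }
        i += 16;
    }
#endif

    /* Scalar tail: the last (up to) 15 bytes, one by one. */
    for (; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return n;
}
```

 With n = 40, for example, the vector loop covers bytes 0..31 and the
 scalar loop covers bytes 32..39, so no load touches memory past the
 defined area. Whether this fits the loop structure in the patch would
 need the further investigation mentioned above.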