Date: Sun, 7 Jan 2007 05:00:38 GMT
From: Julian Seward <jseward@acm.org>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Message-ID: <200701070500.l0750cTs018266@freefall.freebsd.org>
The following reply was made to PR bin/106734; it has been noted by GNATS.

From: Julian Seward <jseward@acm.org>
To: Mikhail Teterin <mi@corbulon.video-collage.com>
Cc: bug-followup@freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Sun, 7 Jan 2007 05:08:43 +0000

I believe this analysis is correct:

> /* Load the bytes: */
> n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
> n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
>
> read beyond the end of the defined area of block. block is
> defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
> you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
> hence loading 15 bytes of garbage.

Valgrind doesn't complain about the out-of-range access, because you
are still accessing inside a valid malloc-allocated block. But it does
know that the read data is uninitialised, hence it complains when you
do a comparison with that data followed by a conditional branch (or
move) based on the result of the comparison.

> This is possible... You think, the loop should exit earlier and test
> the last (up to) 15 bytes one-by-one?

Certainly the loop-end stuff needs to be fixed up somehow to reflect
the 16-byte loads, but without further investigation I'm not sure how.
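One way to do the "exit earlier and test the tail one-by-one" idea can be sketched as below. This is a minimal illustration under stated assumptions, not the patch's code and not bzip2's real comparison routine (it ignores the rest of bzip2's block-sorting logic). The helper name `common_prefix_sse2` and its parameters are hypothetical. It assumes only `buf[0 .. len)` is initialised and takes a 16-byte load only while all 16 bytes fall inside the range being compared, so no uninitialised byte should feed a comparison that Memcheck would flag. The last (up to) 15 bytes are compared one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper (not part of bzip2 or the patch): returns how many
   leading bytes of buf[i1 .. i1+n) and buf[i2 .. i2+n) are equal.
   Assumption: only buf[0 .. len) is initialised, and both ranges must lie
   inside that area. */
size_t common_prefix_sse2(const unsigned char *buf, size_t len,
                          size_t i1, size_t i2, size_t n)
{
    size_t k = 0;

    /* Preconditions, written to avoid size_t overflow. */
    assert(i1 <= len && n <= len - i1);
    assert(i2 <= len && n <= len - i2);

    /* Vector loop: only load 16 bytes while at least 16 remain in range,
       so every loaded byte is inside the defined area. */
    while (n - k >= 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(buf + i1 + k));
        __m128i b = _mm_loadu_si128((const __m128i *)(buf + i2 + k));
        /* Bit j of 'eq' is set when byte j of a equals byte j of b. */
        unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(a, b));
        if (eq != 0xFFFFu) {
            unsigned diff = ~eq & 0xFFFFu;
            /* Advance k to the first mismatching byte in this chunk. */
            while ((diff & 1u) == 0) {
                diff >>= 1;
                k++;
            }
            return k;
        }
        k += 16;
    }

    /* Scalar tail: compare the remaining (up to) 15 bytes one by one. */
    while (k < n && buf[i1 + k] == buf[i2 + k])
        k++;
    return k;
}
```

The design choice is simply to move the loop-exit test so that it accounts for the full width of each load; with overshoot padding like `BZ_N_OVERSHOOT`, a real fix might instead guarantee enough initialised padding, which this sketch does not attempt.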