Date: Sun, 7 Jan 2007 05:00:38 GMT
From: Julian Seward <jseward@acm.org>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Message-ID: <200701070500.l0750cTs018266@freefall.freebsd.org>
The following reply was made to PR bin/106734; it has been noted by GNATS.

From: Julian Seward <jseward@acm.org>
To: Mikhail Teterin <mi@corbulon.video-collage.com>
Cc: bug-followup@freebsd.org
Subject: Re: bin/106734: [patch] SSE2 optimization for bzip2/libbz2
Date: Sun, 7 Jan 2007 05:08:43 +0000

I believe this analysis is correct:

> /* Load the bytes: */
> n1 = (__m128i)_mm_loadu_pd((double *)(block + i1));
> n2 = (__m128i)_mm_loadu_pd((double *)(block + i2));
>
> read beyond the end of the defined area of block. block is
> defined for [0 .. nblock + BZ_N_OVERSHOOT - 1], but I think
> you are doing a SSE load at &block[nblock + BZ_N_OVERSHOOT - 2],
> hence loading 15 bytes of garbage.

Valgrind doesn't complain about the out-of-range access, because you are still accessing inside a valid malloc-allocated block. But it does know that the read data is uninitialised, hence it complains when you do a comparison with that data followed by a conditional branch (or move) based on the result of the comparison.

> This is possible... You think, the loop should exit earlier and test
> the last (up to) 15 bytes one-by-one?

Certainly the loop-end stuff needs to be fixed up somehow to reflect the 16-byte loads, but without further investigation I'm not sure how.
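A minimal sketch of the "exit the loop earlier and test the last (up to) 15 bytes one-by-one" idea, in C. The helper name first_mismatch and its interface are hypothetical and are not bzip2's mainGtU. It assumes the caller passes len as the number of defined bytes available from both start pointers (for starts i1 and i2 in block, the smaller of nblock + BZ_N_OVERSHOOT - i1 and nblock + BZ_N_OVERSHOOT - i2). A 16-byte load at offset j touches bytes j .. j+15, so the vector loop runs only while j + 16 <= len, and the tail is compared byte by byte; no uninitialised byte ever reaches a compare-and-branch. It uses _mm_loadu_si128 rather than the patch's cast of _mm_loadu_pd (both are unaligned 16-byte loads), and __builtin_ctz, which is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: return the first index j in [0, len) where
   a[j] != b[j], or len if all len bytes match. Never reads a[len],
   b[len], or anything past them. */
size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                      size_t len)
{
    size_t j = 0;

    assert(a != NULL && b != NULL);
#if defined(__SSE2__)
    /* A 16-byte load at offset j touches j .. j+15, so only take it
       while all 16 bytes are defined, i.e. j + 16 <= len.
       j never exceeds len, so len - j cannot wrap around. */
    while (len - j >= 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + j));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + j));
        /* Bit k of mask is set iff byte k of va equals byte k of vb. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (mask != 0xFFFF) {
            /* Lowest clear bit of mask = first differing byte. */
            return j + (size_t)__builtin_ctz((unsigned)(~mask & 0xFFFF));
        }
        j += 16;
    }
#endif
    /* Tail: the last (up to) 15 bytes, or every byte when SSE2 is
       unavailable, compared one by one. */
    for (; j < len; j++) {
        if (a[j] != b[j])
            return j;
    }
    return len;
}
```

Keeping the scalar tail as the only code path that can touch the final partial chunk means the vector loop's bound is the single place where the 16-byte load width has to be accounted for.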
