Date:      Tue, 22 Feb 2011 10:34:17 +0530
From:      "Jayachandran C." <c.jayachandran@gmail.com>
To:        Artem Belevich <fbsdlist@src.cx>
Cc:        freebsd-mips@freebsd.org
Subject:   Re: lib/libc/mips/string/bzero.S -- problem in 64-bit mode.
Message-ID:  <AANLkTim83G00D_xw1tyK8qyVwOWL6-_ivpt-zDOoe3-U@mail.gmail.com>
In-Reply-To: <AANLkTik2evgf4-k85P%2Bsm953ofa0=UNd7o2uWhQw7qiB@mail.gmail.com>
References:  <AANLkTik2evgf4-k85P%2Bsm953ofa0=UNd7o2uWhQw7qiB@mail.gmail.com>

On Tue, Feb 22, 2011 at 6:37 AM, Artem Belevich <fbsdlist@src.cx> wrote:
> Hi,
>
> I think there's a problem with the bzero implementation for 64-bit MIPS
> (SZREG==8).
>
> http://svn.freebsd.org/viewvc/base/head/lib/libc/mips/string/bzero.S?revision=209231&view=markup
>
> LEAF(bzero)
>         .set    noreorder
>         blt             a1, 3*SZREG, smallclr # small amount to clear?
>         PTR_SUBU        a3, zero, a0    # compute # bytes to word align address
>         and             a3, a3, SZREG-1
>         beq             a3, zero, 1f    # skip if word aligned
> #if SZREG == 4
>         PTR_SUBU        a1, a1, a3      # subtract from remaining count
>         SWHI            zero, 0(a0)     # clear 1, 2, or 3 bytes to align
>         PTR_ADDU        a0, a0, a3
> #endif
>
> #if SZREG == 8
>         PTR_SUBU        a1, a1, a3      # subtract from remaining count
>         PTR_ADDU        a0, a0, a3      # align dst to next word
>         sll             a3, a3, 3       # bits to bytes
>         li              a2, -1          # make a mask
> #if _BYTE_ORDER == _BIG_ENDIAN
> (a)     REG_SRLV        a2, a2, a3      # we want to keep the MSB bytes
> #endif
> #if _BYTE_ORDER == _LITTLE_ENDIAN
> (b)     REG_SLLV        a2, a2, a3      # we want to keep the LSB bytes
> #endif
> (c)     nor             a2, zero, a2    # complement the mask
>         REG_L           v0, -SZREG(a0)  # load the word to partially clear
>         and             v0, v0, a2      # clear the bytes
>         REG_S           v0, -SZREG(a0)  # store it back
> #endif
>
> Let's suppose we're trying to bzero something at 0x1234567.  A3 will
> contain the number of bytes *remaining* until the register-aligned
> address, i.e. 1 in this case.
> When we make it to (c), A2=0x00FFFFFF_FFFFFFFF on big-endian platforms
> and A2=0xFFFFFFFF_FFFFFF00 on little-endian.
> After (c) it's 0xFF000000_00000000 and 0x00000000_000000FF
> respectively, unless I've got NOR semantics wrong.
>
> Now we load the register, AND it with a2 and write the result back. It
> does not look right -- we're clearing *7* bytes instead of only one
> and clobbering the data that precedes the start address.
>
> I believe correct code should look like this:
>
> #if SZREG == 8
>         PTR_SUBU        a1, a1, a3      # subtract from remaining count
>         PTR_ADDU        a0, a0, a3      # align dst to next word
>         sll             a3, a3, 3       # bits to bytes
>         li              a2, -1          # make a mask
> #if _BYTE_ORDER == _BIG_ENDIAN
>         REG_SLLV        a2, a2, a3      # we want to keep the MSB bytes
> #endif
> #if _BYTE_ORDER == _LITTLE_ENDIAN
>         REG_SRLV        a2, a2, a3      # we want to keep the LSB bytes
> #endif
>         REG_L           v0, -SZREG(a0)  # load the word to partially clear
>         and             v0, v0, a2      # clear the bytes
>         REG_S           v0, -SZREG(a0)  # store it back
> #endif
>
> One thing I don't quite understand is -- why do we bother with all
> this manual masking at all? Why not just use REG_SHI for both SZREG==4
> and SZREG==8 cases? If I read the code I quoted above correctly, it
> attempts to emulate SDL/SDR instructions. Using REG_SHI macro would
> pick correct swl/swr/sdl/sdr variant based on register size and
> endianness and would clear the unaligned bytes.
>
>         PTR_SUBU        a1, a1, a3      # subtract from remaining count
>         REG_SHI         zero, 0(a0)     # clear 1..SZREG-1 bytes to align
>         PTR_ADDU        a0, a0, a3

I just tested this with a simple program - and there is certainly an
issue here.  If you can send me a patch, I can check that in after
testing.

The kernel version of bzero() does not seem to have the SZREG==8 case,
or this bug.

Thanks,
JC.


