Date:      Mon, 22 Oct 2001 09:44:23 +0100 (BST)
From:      Doug Rabson <dfr@nlsystems.com>
To:        Marcel Moolenaar <marcel@xcllnt.net>
Cc:        Peter Wemm <peter@wemm.org>, <ia64@FreeBSD.ORG>
Subject:   Re: Hazards [was: Re: cvs commit: src/sys/ia64/ia64 sal.c]
Message-ID:  <20011022094201.L549-100000@salmon.nlsystems.com>
In-Reply-To: <20011021212935.C28459@dhcp01.pn.xcllnt.net>


On Sun, 21 Oct 2001, Marcel Moolenaar wrote:

> On Sun, Oct 21, 2001 at 02:34:35PM -0700, Peter Wemm wrote:
> >
> > 52: 3:      tbit.nz p6,p0=in0,0 ;;
> > 53: (p6)    st1     [in0]=r0,1
> > 54: (p6)    add     in1=-1,in1
> > 55:
> > 56:         tbit.nz p6,p0=in0,1 ;;
> > 57: (p6)    st2     [in0]=r0,2
> > 58: (p6)    add     in1=-2,in1
> > 59:
> > 60:         tbit.nz p6,p0=in0,2 ;;
> > 61: (p6)    st4     [in0]=r0,4
> > 62: (p6)    add     in1=-4,in1
> > 63:
> > 64:        ;;
>
> [snip]
>
> > but that hardly seems efficient.  could we copy in0 to somewhere else in
> > order to avoid the RAW?  the bits we're interested in are not going to change
> > by the st1/2/4 adds.
>
> The code is inherently sequential in that the result of the
> postinc is used by subsequent tbit instructions. One way to
> increase ILP is to do an aligned ld8, zero-out the bytes
> that need to be zeroed in the temporary register and write
> the result back. in0 (ptr) and in1 (size) can be updated
> without there being an immediate use for them. The code
> will be endianness sensitive though. Something like:
>
> 	and	t0 = 0xf8, in0;;	// sign-extension
> 	ld8	t1 = [t0];;
> 	// Zero-out the bytes in t1 that need zeroed
> 	st8	[t0] = t1
>
> in0 can be updated by a simple add:
>
> 	add	in0 = 8, t0
>
> in1 can be updated by the following sequence:
>
> 	or	t2 = 7, in0
> 	mov	t3 = in1 ;;
> 	sub	in1 = t3, t2
>
> Both updates can be performed concurrently with the zeroing
> of t1. The zeroing of t1 can be a sequence of predicated dep
> instructions.
>
> Just a thought,

I'm not too worried about performance here - this is just cleaning up the
pointer so that we can do an aligned store in the main loop. I'm just
going to add the stops as Peter suggested. We can revisit this (and all
the other string code) and work on performance later. The whole lot
probably needs rewriting. Perhaps Intel has some sample code...
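
For anyone following along, the alignment fixup quoted at the top
corresponds roughly to the C below. This is only a sketch:
zero_to_align8 is a made-up name, not something in the tree, and it
assumes the caller has already checked that at least 7 bytes remain,
since the fragment itself does no length check. Each test reads the
pointer updated by the previous store, which is the RAW dependency
the stops are there for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): zero bytes at *pp until the
 * pointer is 8-byte aligned, mirroring the tbit/st1/st2/st4 sequence.
 * Assumption: *lenp >= 7 on entry, so the subtractions cannot underflow.
 */
static void
zero_to_align8(unsigned char **pp, size_t *lenp)
{
	unsigned char *p = *pp;
	size_t len = *lenp;

	if ((uintptr_t)p & 1) {		/* tbit.nz p6,p0=in0,0 */
		memset(p, 0, 1);	/* st1 [in0]=r0,1 */
		p += 1;
		len -= 1;		/* add in1=-1,in1 */
	}
	/* This test depends on the p updated above. */
	if ((uintptr_t)p & 2) {		/* tbit.nz p6,p0=in0,1 */
		memset(p, 0, 2);	/* st2 [in0]=r0,2 */
		p += 2;
		len -= 2;		/* add in1=-2,in1 */
	}
	if ((uintptr_t)p & 4) {		/* tbit.nz p6,p0=in0,2 */
		memset(p, 0, 4);	/* st4 [in0]=r0,4 */
		p += 4;
		len -= 4;		/* add in1=-4,in1 */
	}

	*pp = p;
	*lenp = len;
}
```

After the call the pointer is 8-byte aligned and the main loop can use
aligned 8-byte stores on whatever length is left.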

-- 
Doug Rabson				Mail:  dfr@nlsystems.com
					Phone: +44 20 8348 6160





