Date:      Sat, 4 Jun 2005 00:07:25 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-hackers@freebsd.org, John-Mark Gurney <gurney_j@resnet.uoregon.edu>
Subject:   (no subject)
Message-ID:  <200506040707.j5477Pr1064192@apollo.backplane.com>

Subject: Re: Possible instruction pipelining problem between HT's on the same die ?
References: <200506032057.j53KvOFw062012@apollo.backplane.com>	<20050604021812.GG594@funkthat.com> <200506040257.j542veCm063487@apollo.backplane.com> <42A11F4A.40502@samsco.org>


:I would expect that putting the fence on the write side will solve the 
:problem.  As Stephen discussed, the writes will land in a store buffer
:for a period of time, during which a fence on the write CPU will flush 
:it out and make it visible to the other CPUs.  Doing a fence on the read
:CPU will have no effect on the store buffers of the write CPU and will
:be a waste of time.

    As a way to reduce latency... but a fence on the write side does
    not solve the reordering problem on the read side.  When the write
    side writes the FIFO entry and then updates the FIFO index, the
    read side must be able to guarantee that the FIFO data it reads
    is valid when it sees that the FIFO index has been updated.  This
    means that the read side cannot afford to allow the reads to be
    reordered and thus must use some sort of fence.
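
    A minimal sketch of the ordering requirement described above, written
    with C11 atomics rather than the raw *FENCE or locked instructions
    discussed in this thread.  The names msg_fifo, fifo_put, fifo_get and
    the capacity FIFO_SIZE are hypothetical and are not taken from any
    kernel source.  The writer stores the entry and then publishes the
    index with release ordering; the reader loads the index with acquire
    ordering before it touches the entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_SIZE 16u  /* hypothetical capacity; must be a power of two */

struct msg_fifo {
    uint32_t    slots[FIFO_SIZE]; /* message payloads */
    atomic_uint windex;           /* advanced only by the writer */
    atomic_uint rindex;           /* advanced only by the reader */
};

/* Writer side: store the entry first, then publish it via windex. */
bool fifo_put(struct msg_fifo *f, uint32_t msg)
{
    assert(f != NULL);
    unsigned w = atomic_load_explicit(&f->windex, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&f->rindex, memory_order_acquire);

    if (w - r == FIFO_SIZE)
        return false;             /* FIFO is full */

    f->slots[w & (FIFO_SIZE - 1)] = msg;
    /* Release: the slot store cannot be ordered after the index store. */
    atomic_store_explicit(&f->windex, w + 1, memory_order_release);
    return true;
}

/* Reader side: observe windex first, only then read the entry. */
bool fifo_get(struct msg_fifo *f, uint32_t *out)
{
    assert(f != NULL && out != NULL);
    unsigned r = atomic_load_explicit(&f->rindex, memory_order_relaxed);
    /* Acquire: the slot load below cannot be ordered before this load. */
    unsigned w = atomic_load_explicit(&f->windex, memory_order_acquire);

    if (r == w)
        return false;             /* FIFO is empty */

    *out = f->slots[r & (FIFO_SIZE - 1)];
    /* Release: hand the slot back to the writer after it has been read. */
    atomic_store_explicit(&f->rindex, r + 1, memory_order_release);
    return true;
}
```

    Both halves are needed: the release on the write side orders the two
    stores, and the acquire on the read side orders the two loads.  Which
    instruction, if any, the compiler emits for each depends on the target.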

:Another thing to keep in mind is that there is no difference here 
:between HT and non HT SMP protocol.  While HT cores share execution 
:units, they DO NOT share registers, store buffers, or cache (at least,
:not in a way that is visible outside of the low-level implementation of
:the chip).
:
:Scott

    They do share the cache, but I see your point.  I'm not sure about
    store buffers but from the behavior I've observed I suspect that
    store buffers either are not shared, or a logical cpu's store buffer
    sniffing does not extend to the other logical cpu's entries.

    In our case latency is not a big issue.  These are almost universally
    asynchronous messages flying between the cpus.  What matters is the cycle
    overhead on each side to send and process the message.  Hence I was
    trying to avoid the use of locked bus cycle instructions.  I'll have
    to run tests to check the relative expense of the *FENCE instructions
    (when supported) versus doing a lock; addl $0,0(%esp) to fence the read.
    At least I don't have to put the fence in the body of the processing
    loop... I just have to put it after the read of the FIFO's write index
    before the loop is entered.
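
    The placement described above, one fence after the read of the write
    index and none inside the loop, might look like the following sketch.
    It uses the C11 atomic_thread_fence() call as a stand-in for whichever
    instruction ends up being chosen; ipi_ring, ring_post, ring_drain and
    RING_SIZE are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical capacity; must be a power of two */

struct ipi_ring {
    uint32_t    slots[RING_SIZE]; /* queued messages */
    atomic_uint windex;           /* advanced only by the sender */
    atomic_uint rindex;           /* advanced only by the receiver */
};

/* Sender: write the entry, then publish the new write index. */
bool ring_post(struct ipi_ring *q, uint32_t msg)
{
    assert(q != NULL);
    unsigned w = atomic_load_explicit(&q->windex, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&q->rindex, memory_order_acquire);

    if (w - r == RING_SIZE)
        return false;             /* ring is full */
    q->slots[w & (RING_SIZE - 1)] = msg;
    atomic_store_explicit(&q->windex, w + 1, memory_order_release);
    return true;
}

/* Receiver: copy up to max pending messages into out; returns the count. */
size_t ring_drain(struct ipi_ring *q, uint32_t *out, size_t max)
{
    assert(q != NULL && out != NULL);
    unsigned r = atomic_load_explicit(&q->rindex, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&q->windex, memory_order_relaxed);
    size_t n = 0;

    /* Single read fence: after the index read, before the loop. */
    atomic_thread_fence(memory_order_acquire);

    /* Every entry below w was published before w was observed, so the
     * loop body needs no further fencing. */
    while (r != w && n < max) {
        out[n++] = q->slots[r & (RING_SIZE - 1)];
        r++;
    }

    /* Return the consumed slots to the sender. */
    atomic_store_explicit(&q->rindex, r, memory_order_release);
    return n;
}
```

    The fence cost is paid once per batch rather than once per message,
    which is the point of keeping it out of the processing loop.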

    It's a real shame that special instructions are required at all.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


