Date: Fri, 3 Jun 2005 15:47:26 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200506032247.j53MlQBg062506@apollo.backplane.com>
To: Stephan Uphoff
References: <200506032057.j53KvOFw062012@apollo.backplane.com> <1117835598.27369.12036.camel@palm>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Possible instruction pipelining problem between HT's on the same die ?

:This is normal behaviour.
:Take a look at IA-32 Intel Developers ... Vol 3,
:Section: 7.2.2 for details + solutions.
:
:Stephan

Ok.. that section seems to indicate that speculative reads can pass writes, but it also says that the pipeline sniffs the address within the processor and ensures proper ordering.
The latter part makes sense within the context of a single cpu, but the big question is: is that supposed to hold true for interactions with HT cpus (which share the pipeline) as well? Or not? It seems not. Speculative reads creating out-of-order situations seem to be the biggest issue.

The AMD manual (Programmer's Manual, Volume 3, page 186, MFENCE instruction) says this:

"The MFENCE instruction is weakly-ordered with respect to data and instruction prefetches. Speculative loads initiated by the processor, or specified explicitly using cache-prefetch instructions, can be reordered around an MFENCE".

This seems to be different than what the Intel manual says, and doesn't make much sense. What's the point of having a fence instruction if it can't guarantee read/write ordering? Is the AMD manual simply wrong?

Other than that, the Intel manual does indicate that speculative reads will not pass locked bus cycle instructions (the AMD manual says nothing about that, as far as I can see). So, presumably, doing a dummy locked bus cycle operation on e.g. the top of the stack, such as Linux does, would be sufficient to ensure read ordering. Would you concur with that assessment?

What's really horrible here is that the 'old' value of the data being used is modified at location A something like 30 instructions prior to the instruction that updates the index (B). I think this is a situation that can only occur in an HT configuration, and then only if the speculative read issued by the HT cpu is being held across 30 instructions executed by the primary cpu before the HT cpu issues the read of B.

    cpu #0                  cpu #1 (HT cpu on same die as cpu #0)

                            speculatively read A
    write A                 (stalled)
    [30 instructions]       (stalled x 30)
    write B                 (stalled)
                            read B
                            see that B has been updated
                            read A (get old value for A instead of new)

Is that even possible?
Not only the 30-instruction latency, but also the fact that even with the shared pipeline you have a speculative read on the HT cpu surviving 30 instructions running on cpu #0 (but only one or two on the HT cpu)... even though they share the same pipeline.

                                        -Matt
                                        Matthew Dillon
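
To make the "dummy locked bus cycle" idea from the message above concrete, here is a minimal C sketch, assuming GCC-style inline assembly. The names locked_fence, slot_a, index_b, producer_publish and consumer_poll are hypothetical and are not taken from the FreeBSD, DragonFly or Linux sources. The non-x86 fallback is likewise only an assumption so that the sketch compiles elsewhere. It shows a writer updating A and then B, and a reader checking B before reading A, with a locked add of zero to the top of the stack placed between the two accesses on each side. Whether that is actually sufficient between HT siblings is the open question in this thread, so treat this as an illustration rather than a verified fix.

```c
#include <stdint.h>

/*
 * Dummy locked read-modify-write on the top of the stack.  Adding zero
 * leaves the stack contents unchanged; the LOCK prefix is what is being
 * relied on to order the surrounding loads and stores.
 */
static inline void
locked_fence(void)
{
#if defined(__i386__)
	__asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory", "cc");
#elif defined(__x86_64__)
	__asm__ __volatile__("lock; addl $0,0(%%rsp)" : : : "memory", "cc");
#else
	__sync_synchronize();	/* assumption: generic full barrier elsewhere */
#endif
}

static volatile uint32_t slot_a;	/* the data, "A" in the diagram */
static volatile uint32_t index_b;	/* the index, "B" in the diagram */

/* Writer side (cpu #0): store A first, then publish by bumping B. */
static void
producer_publish(uint32_t value)
{
	slot_a = value;
	locked_fence();
	index_b = index_b + 1;
}

/*
 * Reader side (cpu #1): look at B first and only then read A.  The locked
 * operation sits between the two loads so that the load of A is not
 * satisfied by an earlier speculative read.
 * Returns 1 and fills *out if B moved past last_seen, 0 otherwise.
 */
static int
consumer_poll(uint32_t last_seen, uint32_t *out)
{
	if (index_b == last_seen)
		return 0;
	locked_fence();
	*out = slot_a;
	return 1;
}
```

A reader would call consumer_poll() with the last index value it saw; a return of 1 means B changed and *out holds the value of A read after the locked operation.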