From owner-freebsd-hackers@FreeBSD.ORG  Fri Jun  3 20:57:25 2005
Date: Fri, 3 Jun 2005 13:57:24 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200506032057.j53KvOFw062012@apollo.backplane.com>
To: freebsd-hackers@freebsd.org
Subject: Possible instruction pipelining problem between HT's on the same
	die ?

    I've been tracking down a crash one of our users gets occasionally.
    He has a quad Intel(R) XEON(TM) CPU 2.00GHz (1996.61-MHz 686-class CPU)
    system.

    After getting a few of these crashes he pulled three of the four cpus
    out.   But with just one physical cpu, with HTT turned on (so two
    logical cpus), he is still getting these crashes.

    This is the sequence that causes the bad data:

    cpu #0	write A
		write B

    (HT)cpu #1	read B
		if (B) 
		    read A	<----  gets OLD data in A, not new data

    Now I was depending on the presumed write ordering, so if a foreign
    cpu sees that B is updated it can assume that A has also been updated.
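
    For illustration only, here is the same A/B pattern written as a minimal
    sketch with C11 atomics.  This is not the code that is crashing.  The
    names A, B, publish() and try_consume() are hypothetical, and the sketch
    assumes a compiler with <stdatomic.h>.  The release store and acquire
    load state the ordering the sequence above depends on explicitly instead
    of leaving it to presumed hardware behavior.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: A is the payload, B is the "A is valid" flag. */
static int A;
static atomic_int B;

/* Writer side (cpu #0 in the sequence above): write A, then write B. */
void publish(int value)
{
    A = value;
    /* Release store: the write to A may not be reordered after this store. */
    atomic_store_explicit(&B, 1, memory_order_release);
}

/* Reader side (cpu #1): read B, and only if it is set, read A. */
bool try_consume(int *out)
{
    /* Acquire load: the read of A below may not be hoisted above this load. */
    if (atomic_load_explicit(&B, memory_order_acquire) == 0)
        return false;
    *out = A;
    return true;
}
```

    One cpu would call publish() while the other polls try_consume().  If
    try_consume() returns true, the C11 memory model says *out holds the
    published value, never a stale one.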

    But I'm beginning to think that it isn't working as advertised.  I've
    read the manuals over and over again and they seem to only guarantee
    write ordering between physical cpus, not between logical HT cpus, and
    even then it appears that a cpu can do a speculative read and
    thus get an old value for A even after getting a new value for B.

    I looked at the various SFENCE/LFENCE/MFENCE instructions and they
    do not seem to guarantee ordering for speculative accesses at all.
    They all say that they do not protect against speculative reads.
    Bus-locked instructions don't seem to avoid speculative reads either.
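
    As a rough sketch of where explicit fences would sit in the A/B
    sequence, here is a variant using standalone C11 fences.  The names
    fa, fb, fenced_write() and fenced_read() are hypothetical.  Which
    instruction, if any, a compiler emits for each fence depends on the
    target, so this only shows fence placement and is not a claim about
    what SFENCE/LFENCE/MFENCE do on this particular cpu.

```c
#include <stdatomic.h>

static atomic_int fa;   /* plays the role of A */
static atomic_int fb;   /* plays the role of B */

/* Writer: store A, fence, then store B. */
void fenced_write(int value)
{
    atomic_store_explicit(&fa, value, memory_order_relaxed);
    /* Release fence: keeps the store to fa ahead of the store to fb. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&fb, 1, memory_order_relaxed);
}

/* Reader: load B, fence, then load A.  Returns fallback if B is not set. */
int fenced_read(int fallback)
{
    if (atomic_load_explicit(&fb, memory_order_relaxed) == 0)
        return fallback;
    /* Acquire fence: keeps the load of fa behind the load of fb. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&fa, memory_order_relaxed);
}
```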

    I'm even more confused because this bug is occurring between two logical
    cpus on the same physical die.  Is write ordering not guaranteed with
    respect to the other logical cpu?  Can one logical cpu prefetch data
    early that then becomes obsolete by the time the instruction is actually
    run?  Or perhaps it's a pipeline bug... I just don't know.  But it's
    damn annoying.

    The only solution I see is to use an actual serializing instruction
    like cpuid.  I really do not want to have to use cpuid :-(.
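
    A minimal sketch of what that would look like on the reader side,
    assuming GCC-style inline assembly on x86.  The names serialize_cpu(),
    ser_a, ser_b and read_a_if_b() are hypothetical, and the non-x86 branch
    is only there so the sketch compiles elsewhere.  cpuid clobbers four
    registers and drains the pipeline, which is why it is unattractive in a
    hot path.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: executes CPUID leaf 0 as a serializing instruction.
 * Returns the EAX result on x86, or 0 on other targets.
 */
unsigned int serialize_cpu(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx, ecx = 0, edx;

    /* "memory" clobber also stops the compiler moving accesses across it. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
    return eax;
#else
    /* Assumption: a full fence stands in for CPUID on non-x86 targets. */
    atomic_thread_fence(memory_order_seq_cst);
    return 0;
#endif
}

static volatile int ser_a;   /* plays the role of A */
static volatile int ser_b;   /* plays the role of B */

/* Reader: read B, serialize, then read A.  Returns fallback if B is clear. */
int read_a_if_b(int fallback)
{
    if (ser_b == 0)
        return fallback;
    serialize_cpu();
    return ser_a;
}
```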

    So, has anyone seen anything similar?

							-Matt