From owner-freebsd-smp  Sat Dec 14 09:26:12 1996
Return-Path: <owner-smp>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.4/8.8.4) id JAA17392
          for smp-outgoing; Sat, 14 Dec 1996 09:26:12 -0800 (PST)
Received: from ormail.intel.com (ormail.intel.com [134.134.248.3])
          by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id JAA17384
          for <smp@freebsd.org>; Sat, 14 Dec 1996 09:26:08 -0800 (PST)
From: haertel@ichips.intel.com
Received: from ichips.intel.com (ichips.intel.com [134.134.50.200]) by ormail.intel.com (8.8.4/8.7.3) with ESMTP id JAA12513; Sat, 14 Dec 1996 09:25:48 -0800 (PST)
Received: from pdxcs078.intel.com by ichips.intel.com (8.7.4/jIII)
	id JAA28180; Sat, 14 Dec 1996 09:23:01 -0800 (PST)
Received: by pdxcs078.intel.com (AIX 3.2/UCB 5.64/SW1.11) 
	id AA57406; Sat, 14 Dec 1996 09:25:51 -0800
Date: Sat, 14 Dec 1996 09:25:51 -0800
Message-Id: <9612141725.AA57406@pdxcs078.intel.com>
To: peter@spinner.dialix.com
Subject: Re: some questions concerning TLB shootdowns in FreeBSD
Cc: dg@root.com, smp@freebsd.org
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>I'm still digesting it,  I am almost worried that we might (shudder!)
>be forced into doing an IPI to stop all the cpu's *before* the
>current cpu changes the page tables, then letting them do the tlb
>flush and letting them proceed.  If this actually is a real problem
>this means a much bigger code impact.

You must do precisely this.

The x86 architecture includes some complex instructions that
reference the same memory locations more than once--read-modify-write
sequences are the most obvious example.  For various reasons,
there is no guarantee that the TLB entries associated with those
memory locations stay locked in the TLB, and so they might be
thrashed out due to other activity while those complex instructions
are executing.  If, in the meantime, some other processor
has manipulated the associated PTE in any way that lowers privilege
or changes the mapping, this processor could get a page fault
in a *non-restartable* way, since it would see the mapping and/or
privilege changing underfoot, but have already committed to
finishing the instruction (since the privilege checks are
normally done only at the beginning of the instruction).

As for your other question: speculative execution does not
continue past an interrupt.  An interrupt is a totally
serializing event.  However, once you're in the interrupt
handler, speculative execution could go down a different path
than the one the handler actually ends up taking.  Basically
every time the processor fetches something from the Icache that
it thinks *might* contain a branch, it is an opportunity for the
processor to go off into la-la land, since it will simply ask
the branch predictor what it thinks and go that way.

The effect of this is speculative pollution of the non-renamed
state of the processor, such as the caches and the TLB.  So,
for example, in the uniprocessor case, doing this:

	1.  flush TLB
	2.  manipulate PTE

is not safe, since after (1), the processor may waltz
speculatively off to some code whose memory access reloads the
old PTE into the TLB before you manipulate it.  Instead you must always:

	1.  Manipulate PTE
	2.  flush TLB
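
To make the ordering argument concrete, here is a minimal toy model in C.
It is only a sketch: the struct and function names (toy_mmu,
speculative_touch, and so on) are hypothetical, the "TLB" is a single
cached copy of one PTE, and a speculative reference is modeled as a
plain function call.  It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical names): one PTE plus one cached TLB entry. */
struct toy_mmu {
    uint32_t pte;        /* the in-memory page table entry        */
    uint32_t tlb;        /* the translation cached by the CPU     */
    bool     tlb_valid;  /* false after a flush, true after refill */
};

/* Models a speculative access: on a TLB miss it refills from the PTE. */
static void speculative_touch(struct toy_mmu *m)
{
    if (!m->tlb_valid) {
        m->tlb = m->pte;
        m->tlb_valid = true;
    }
}

static void flush_tlb(struct toy_mmu *m)
{
    m->tlb_valid = false;
}

/* Returns true when the TLB holds no entry that disagrees with the PTE. */
static bool tlb_coherent(const struct toy_mmu *m)
{
    return !m->tlb_valid || m->tlb == m->pte;
}

/* Unsafe order: flush first, then change the PTE. */
static bool update_flush_first(struct toy_mmu *m, uint32_t new_pte,
                               bool speculate)
{
    flush_tlb(m);
    if (speculate)
        speculative_touch(m);   /* refills the OLD translation */
    m->pte = new_pte;
    return tlb_coherent(m);
}

/* Safe order: change the PTE first, then flush. */
static bool update_pte_first(struct toy_mmu *m, uint32_t new_pte,
                             bool speculate)
{
    m->pte = new_pte;
    if (speculate)
        speculative_touch(m);   /* hits the old entry; flushed below */
    flush_tlb(m);
    if (speculate)
        speculative_touch(m);   /* any refill now sees the new PTE */
    return tlb_coherent(m);
}
```

With speculate set, update_flush_first() leaves a stale entry behind
while update_pte_first() does not, which is the whole point of the
ordering rule.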

On multiprocessors, there is the additional concern of corrupting
state which must remain invariant during instruction execution on
other processors.  So then you need the fully bulletproof code:

	1.  IPI to everyone sharing these specific PTEs
	2.  wait at barrier until everyone arrives
	3.  manipulate PTE
	4.  release barrier
	5.  everyone (including us) flushes TLBs
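
The five steps can be sketched in user space with threads standing in
for CPUs.  Everything here is an assumption made for illustration: the
names (struct shootdown, remote_cpu, run_shootdown) are hypothetical,
the "IPI" is just an atomic flag, and "flushing the TLB" is modeled as
reloading a private copy of a shared PTE.  A real kernel would use a
real IPI and a real TLB invalidation instead.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_CPUS 8

struct shootdown {
    atomic_int  ipi_pending;  /* stands in for the IPI            */
    atomic_int  arrived;      /* CPUs parked at the barrier       */
    atomic_int  release;      /* set once the PTE has been changed */
    atomic_uint pte;          /* the shared page table entry      */
};

struct cpu {
    struct shootdown *sd;
    unsigned tlb;             /* this CPU's private cached copy   */
};

static void *remote_cpu(void *arg)
{
    struct cpu *c = arg;
    struct shootdown *sd = c->sd;

    c->tlb = atomic_load(&sd->pte);          /* caches old mapping   */
    while (!atomic_load(&sd->ipi_pending))   /* "normal work"        */
        sched_yield();
    atomic_fetch_add(&sd->arrived, 1);       /* step 2: at barrier   */
    while (!atomic_load(&sd->release))       /* parked, touching nothing */
        sched_yield();
    c->tlb = atomic_load(&sd->pte);          /* step 5: flush+refill */
    return NULL;
}

/* Returns the number of CPUs left with a stale translation, or -1. */
static int run_shootdown(int nremote, unsigned old_pte, unsigned new_pte)
{
    struct shootdown sd;
    struct cpu cpus[MAX_CPUS];
    pthread_t tids[MAX_CPUS];
    int started = 0, stale = 0, failed = 0;

    if (nremote < 0 || nremote > MAX_CPUS)
        return -1;
    atomic_init(&sd.ipi_pending, 0);
    atomic_init(&sd.arrived, 0);
    atomic_init(&sd.release, 0);
    atomic_init(&sd.pte, old_pte);

    for (; started < nremote; started++) {
        cpus[started].sd = &sd;
        if (pthread_create(&tids[started], NULL, remote_cpu,
                           &cpus[started]) != 0) {
            failed = 1;
            break;
        }
    }

    atomic_store(&sd.ipi_pending, 1);            /* step 1: send IPI  */
    while (atomic_load(&sd.arrived) != started)  /* step 2: wait      */
        sched_yield();
    if (!failed)
        atomic_store(&sd.pte, new_pte);          /* step 3: change PTE */
    atomic_store(&sd.release, 1);                /* step 4: release   */
    unsigned my_tlb = atomic_load(&sd.pte);      /* step 5: we flush too */

    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        stale += (cpus[i].tlb != new_pte);
    }
    if (failed)
        return -1;
    return stale + (my_tlb != new_pte);
}
```

Calling run_shootdown(3, 0x1000, 0x2000) models one initiator and three
remote CPUs.  Because nobody runs between steps 2 and 4, no CPU can be
in the middle of an instruction while the PTE changes.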

Bleah, I know.