From owner-freebsd-smp  Mon Oct 25 19:53: 8 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 0BA9414E09
	for <freebsd-smp@FreeBSD.ORG>; Mon, 25 Oct 1999 19:53:04 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id TAA11014;
	Mon, 25 Oct 1999 19:52:54 -0700 (PDT)
	(envelope-from dillon)
Date: Mon, 25 Oct 1999 19:52:54 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199910260252.TAA11014@apollo.backplane.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: freebsd@gndrsh.dnsmgr.net (Rodney W. Grimes),
	tlambert@primenet.com, twinkle.star@263.net, freebsd-smp@FreeBSD.ORG
Subject: Re: inquire(second time)
References:  <199910260226.TAA26348@usr06.primenet.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:> CPU demand vs memory bandwidth is only aggravated, and thus
:> requiring a faster memory subsystem, by SMP.
:
:It was my understanding that this would apply to memory mapped I/O,
:as well, but of course I haven't investigated RAMBus closely enough
:(obviously) to be able to say for sure.

    Like all memory subsystems, RAMBus depends heavily on pipelined burst
    memory ops -- i.e. cache line fills and drains -- to obtain its
    performance.  The two main features of RAMBus are a MIPS-like command
    queueing interface between the cpu and the controller, which allows
    memory ops to be pipelined, and a special very high speed serial-like
    interface between the controller and memory.

    But this does diddly for I/O operations.  If you read from an I/O location,
    whether it is memory-mapped or not, the cpu will stall badly.  The same
    goes for writing to an I/O location, though in the case of writing there
    is usually a small write pipeline that allows some decoupling to occur.
    RAMBus doesn't help here.
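
    As a rough illustration of the read-stall versus posted-write point
    above, here is a toy cycle-count model in C.  Everything in it is an
    assumption made up for the sketch: the buffer depth (WBUF_DEPTH), the
    per-transaction latency (IO_LATENCY), and the names io_model, io_write
    and io_read.  It does not describe any particular cpu or chipset.

```c
#include <assert.h>

#define WBUF_DEPTH   4      /* hypothetical posted-write buffer depth */
#define IO_LATENCY 100      /* hypothetical cpu cycles per I/O bus transaction */

struct io_model {
    unsigned long now;              /* current cpu cycle */
    unsigned long stalled;          /* total cycles the cpu sat stalled */
    unsigned long bus_free;         /* cycle at which the I/O bus goes idle */
    unsigned long done[WBUF_DEPTH]; /* completion time of each posted write */
    unsigned head, count;           /* circular queue state for done[] */
};

/* Drop posted writes that have already reached the device. */
static void
retire(struct io_model *m)
{
    while (m->count > 0 && m->done[m->head] <= m->now) {
        m->head = (m->head + 1) % WBUF_DEPTH;
        m->count--;
    }
}

/* An I/O write: free while the buffer has room, a stall once it is full. */
static void
io_write(struct io_model *m)
{
    unsigned long start;
    unsigned tail;

    retire(m);
    if (m->count == WBUF_DEPTH) {
        /* Buffer full: wait for the oldest write to reach the device. */
        m->stalled += m->done[m->head] - m->now;
        m->now = m->done[m->head];
        retire(m);
    }
    start = m->now > m->bus_free ? m->now : m->bus_free;
    tail = (m->head + m->count) % WBUF_DEPTH;
    m->done[tail] = start + IO_LATENCY;
    m->bus_free = m->done[tail];
    m->count++;
    m->now += 1;                    /* the store itself issues in one cycle */
}

/* An I/O read: posted writes drain first, then the cpu waits for the data. */
static void
io_read(struct io_model *m)
{
    unsigned long start = m->now > m->bus_free ? m->now : m->bus_free;
    unsigned long finish = start + IO_LATENCY;

    m->stalled += finish - m->now;
    m->now = finish;
    m->bus_free = finish;
    m->head = 0;
    m->count = 0;
}
```

    In this model the first WBUF_DEPTH back-to-back writes cost nothing in
    stall time, the next one waits for a slot, and every read pays at least
    the full bus latency no matter how fast main memory is.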

    The only way to decouple the I/O bus from the processor completely is to 
    use a bus-master-DMA messaging interface whereby the cpu stores an I/O
    request in main memory and the I/O board DMAs the message in and then
    DMAs the response back out.  A DMA transaction involves no stalls (or,
    more specifically, the I/O board can release the bus when its pipeline
    is full to avoid stalling other memory or I/O transactions on the bus).
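
    The messaging scheme described above can be sketched as a request ring
    in main memory.  This is only a minimal user-space simulation under
    stated assumptions: the layout (struct io_msg, struct io_ring), the ring
    size and the function names are hypothetical, the board is played by an
    ordinary function, and its DMA transfers are stood in for by memcpy().
    A real driver would also need memory barriers and some doorbell or
    interrupt mechanism, which are left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8            /* hypothetical ring size (power of two) */

/* One I/O request/response message living in main memory. */
struct io_msg {
    uint32_t opcode;            /* hypothetical: what the board should do */
    uint32_t arg;               /* request argument */
    uint32_t result;            /* filled in by the board */
    uint32_t done;              /* 0 = pending, 1 = response written back */
};

/* Shared ring: the cpu is the only producer, the board the only consumer. */
struct io_ring {
    struct io_msg slot[RING_SLOTS];
    uint32_t head;              /* next slot the cpu will fill */
    uint32_t tail;              /* next slot the board will process */
};

/*
 * cpu side: queue a request using ordinary stores to main memory.
 * Returns the slot index used, or -1 if the ring is full.
 */
static int
ring_post(struct io_ring *r, uint32_t opcode, uint32_t arg)
{
    uint32_t i;

    if (r->head - r->tail == RING_SLOTS)
        return -1;
    i = r->head % RING_SLOTS;
    r->slot[i].opcode = opcode;
    r->slot[i].arg = arg;
    r->slot[i].result = 0;
    r->slot[i].done = 0;
    r->head++;
    return (int)i;
}

/*
 * Board side (simulated): "DMA" one request in, act on it, and "DMA" the
 * response back out.  Returns 1 if a message was processed, 0 if idle.
 */
static int
board_service(struct io_ring *r)
{
    struct io_msg local;
    uint32_t i;

    if (r->tail == r->head)
        return 0;
    i = r->tail % RING_SLOTS;
    memcpy(&local, &r->slot[i], sizeof(local));   /* DMA read of request */
    local.result = local.arg * 2;                 /* stand-in for device work */
    local.done = 1;
    memcpy(&r->slot[i], &local, sizeof(local));   /* DMA write of response */
    r->tail++;
    return 1;
}
```

    The cpu never touches an I/O location in this arrangement: it posts a
    message with ring_post() and later checks the slot's done field, both
    of which are plain memory accesses.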

    Alternatively it is possible for the I/O device to have a certain amount
    of cacheable local memory which is mapped into the cpu's address space,
    allowing the cpu to access I/O requests and responses through its L1 and
    L2 caches directly (and to burst-cache-line-fill reading back the I/O
    request).  This alternative offers double the bandwidth (you avoid a
    copy) but requires a bus-snooping cache to allow the I/O device to
    cache-invalidate the area the message response is stored in.
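
    To show why the snooping requirement matters, here is a toy model of a
    single cache line backed by device-local memory.  The line size and the
    names dev_mem, cpu_cache, cpu_read_line and dev_write_response are all
    hypothetical, and the snoop is reduced to clearing a valid bit; it is a
    sketch of the idea, not of any real cache protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32           /* hypothetical cache line size */

/* Cacheable memory on the I/O device, mapped into the cpu address space. */
struct dev_mem {
    uint8_t line[LINE_BYTES];
};

/* The cpu's cached copy of that one line. */
struct cpu_cache {
    uint8_t line[LINE_BYTES];
    int valid;                  /* 1 if the cached copy may be used */
    unsigned fills;             /* number of burst line fills performed */
};

/* cpu read: hit in the cache if valid, otherwise burst-fill from the device. */
static const uint8_t *
cpu_read_line(struct cpu_cache *c, const struct dev_mem *d)
{
    if (!c->valid) {
        memcpy(c->line, d->line, LINE_BYTES);
        c->valid = 1;
        c->fills++;
    }
    return c->line;
}

/*
 * Device writes a response into its local memory.  With snoop != 0 the
 * cpu's cached copy is invalidated; with snoop == 0 it is left stale.
 */
static void
dev_write_response(struct dev_mem *d, struct cpu_cache *c, uint8_t v, int snoop)
{
    memset(d->line, v, LINE_BYTES);
    if (snoop)
        c->valid = 0;
}
```

    Without the invalidate, the cpu keeps hitting its old cached copy and
    never sees the response; with it, the next read misses and refills the
    line with the new data.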

    The former mechanism (bus-master DMA messaging) can also theoretically
    use the L2 cache and not even bother to flush the I/O requests to main
    memory, depending on the sophistication of the cpu/cache/memory
    subsystem.  This allows the I/O board to be implemented without
    dual-port memory and maintains the fiction of a main-memory rendezvous
    without actually incurring the overhead of main memory.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

:The idea I was trying to communicate is that memory mapped I/O is
:a hell of a lot more friendly to SMP than inb/outb based device
:communication (cf. my keyboard LED example later in the posting).
:
:Anyway, if the RAMBus stuff is useless for direct memory mapped
:regions on devices (e.g. they can't come in except via slower
:memory elsewhere), then consider me thwacked.  8-).
:
:					Terry Lambert
:					terry@lambert.org

