From: Terry Lambert <tlambert@primenet.com>
Subject: Re: hw platform Q - what's a good smp choice these days?
To: dan@math.berkeley.edu (Dan Strick)
Date: Thu, 8 Oct 1998 18:19:55 +0000 (GMT)
Cc: tlambert@primenet.com, dan@math.berkeley.edu, freebsd-smp@FreeBSD.ORG
In-Reply-To: <199810080155.SAA16965@math.berkeley.edu> from "Dan Strick" at Oct 7, 98 06:55:08 pm

> > N times the driver-controller-drive + drive-controller-drive latency.
> >
> > TCP/IP sliding windows work the same way: instead of N latencies
> > for N packets, you get 1 latency amortized across N packets.
>
> Unless the i/o commands are for contiguous sectors, per command
> latencies are usually much less than head/disk motion latencies,
> so you normally don't get to effectively eliminate per command
> latencies by overlapping them.  I don't believe a drive supporting
> 64 simultaneous tagged commands will execute normal I/O patterns
> anywhere near 64 times as fast as a drive that only executes
> one command at a time.  I am prepared to believe perhaps twice
> the performance on a relatively large I/O load (and not even
> that if the disk driver is "smart" about I/O reordering and
> scatter/gather dma).  Have you got actual real-life performance
> measurements?

By "64 times faster" I meant 64 times less latency.  I thought I had
clarified that, but apparently not.  Yes, I realize that I didn't
choose my words with sufficient care.  I am striving to do so in
this message, so of course it's taking me about 8 times longer to
compose it.

Yes, I expect a *correctly functioning drive* to reorder tagged
command requests which have been enqueued to it but not yet
completed.  This was the basis of the discussion of the SCSI write
caching facility: write caches tend to reorder requests without
notifying the controller, whereas tagged requests are identified on
completion by their associated tags.  As a result, a system that
enforces ordering in software in the host OS (such as soft updates)
can *know* that the information has been committed to stable storage
before it attempts a transaction that depends on a previous
transaction having been committed to stable storage.

The idea behind write caching is that the drive lies, and says it
has done something that it has not.  This is not necessarily a bad
thing, so long as the drive does not reorder requests independently
of the tags used to enqueue them.  Justin claims that there exist
drives that obey these semantics.
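To make the ordering point concrete, here is a minimal userland
sketch (an illustration only, not the soft updates code itself) of
enforcing order in software: the dependent index update is not
issued until the record write is known to be on stable storage.  The
file names and record contents are hypothetical.

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	static void die(const char *msg) { perror(msg); exit(1); }

	int main(void)
	{
		/* Hypothetical record and index files. */
		int dfd = open("record.db", O_WRONLY|O_CREAT|O_APPEND, 0644);
		if (dfd == -1) die("open record.db");

		const char rec[] = "id=42 payload=...\n";
		if (write(dfd, rec, sizeof(rec) - 1) == -1)
			die("write record.db");

		/*
		 * Wait until the record is committed to stable storage.
		 * A write-caching drive that reorders behind the
		 * controller's back can break this guarantee; tagged
		 * commands, identified on completion by their tags,
		 * preserve it.
		 */
		if (fsync(dfd) == -1) die("fsync record.db");

		/* Only now is it safe to issue the dependent update. */
		int ifd = open("record.idx", O_WRONLY|O_CREAT|O_APPEND, 0644);
		if (ifd == -1) die("open record.idx");

		const char idx[] = "id=42 -> offset 0\n";
		if (write(ifd, idx, sizeof(idx) - 1) == -1)
			die("write record.idx");
		if (fsync(ifd) == -1) die("fsync record.idx");

		close(ifd);
		close(dfd);
		return 0;
	}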
In any case, it should be obvious that commands sent to a SCSI
drive as:

	[====]
	  [====]
	    [====]
	      [====]
	        [====]
	          [====]

will complete before commands sent to an IDE drive as:

	[====][====][====][====][====][====]

because:

	[==============]

is smaller than:

	[========================]

The only remaining issue is how much the latency through a vastly
slower (than the CPU clock) memory and I/O bus reduces the overlap;
I am willing to give you that the overlap may be smaller, i.e.:

	[====]
	    [====]

But on a loaded system (i.e., more than one process -- hence the
useless nature of microbenchmarks), *any* overlap is amplified (see
the P.S. below for the arithmetic).

You should also note that the latency may, in fact, need to be
propagated all the way up to the application layer(*), and with a
10ms quantum this effect can be large (if this were not true, then
programs such as "team" and "ddd", as well as the entirety of the
POSIX async I/O subsystem, would be rather fruitless pursuits).

(*) Because an application may need to make transactional guarantees
    about state shared between, for example, a database record file
    and a database index file.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
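P.S.: To put numbers on the amortization claim, a back-of-the-envelope
sketch in C.  The 1ms per-command latency, 8ms service time, and
queue depth of 64 are invented for illustration, not measurements of
any particular drive.

	#include <stdio.h>

	int main(void)
	{
		double latency_ms = 1.0;  /* per-command latency (assumed) */
		double service_ms = 8.0;  /* head motion + transfer (assumed) */
		int n = 64;               /* queued commands */

		/* IDE-style: every command waits out its own latency. */
		double serial = n * (latency_ms + service_ms);

		/*
		 * Tagged-queue style: each command's latency overlaps the
		 * previous command's service time, so only one latency is
		 * paid up front -- 64 times less latency, not 64 times
		 * the throughput.
		 */
		double overlapped = latency_ms + n * service_ms;

		printf("serial:     %.1f ms\n", serial);
		printf("overlapped: %.1f ms\n", overlapped);
		printf("latency term: %.1f ms -> %.1f ms\n",
		    n * latency_ms, latency_ms);
		return 0;
	}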