Date:      Wed, 17 Feb 1999 07:55:33 -0500 (EST)
From:      "Robert G. Brown" <rgb@phy.duke.edu>
To:        Harvey Fishman <fishman@panix.com>
Cc:        aic7xxx Mailing List <AIC7xxx@freebsd.org>, Doug Benjamin <dbenjamin@fnal.gov>
Subject:   Re: Two controllers or a dual...
Message-ID:  <Pine.LNX.3.96.990217071345.15158F-100000@ganesh.phy.duke.edu>
In-Reply-To: <Pine.GSU.4.05.9902161911410.985-100000@panix3.panix.com>

On Tue, 16 Feb 1999, Harvey Fishman wrote:

> It seems to me that you should be considering just where your bottlenecks
> are going to be.  The classical one is the head-disk bandwidth on the
> drives.  If you have enough discrete requests occurring in parallel to
> different drives, only then does the SCSI channel bandwidth become of
> consequence.  And remember that you also have DMA bandwidth in the
> processor box to worry about.  I suspect that two U2W channels running at
> the same time are going to more than fill that.  The processor will be
> accessing that same memory over the same bus so there is more contention to
> worry about.  If your data is in L2 cache, then you don't need the disks to
> start with.
> 
> So I think that you need a LOT more understanding and detail about the nits
> and grits of the application BEFORE you start thinking about what hardware
> will best serve your needs.

How to disagree?  Of course you are right.  The problem is, as usual,
that the application doesn't exist in situ yet because the prototyping
situ doesn't exist and the application hasn't been ported.  So we're
doing paper engineering.  All I'm trying to do is get better numbers
(and some insight) from knowledgeable persons like yourself before
proceeding, and this strategy appears to be working on the insight but
is a bit short, still, on any useful numbers.  One benchmark measurement
or even an anecdotal report from somebody running two U2W controllers
with fast disk attached is worth a whole lot of theoretical
discussion...
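
(To be concrete about the sort of measurement I mean: even something as
crude as the little Python sketch below would do.  The device path and
sizes in it are placeholders, and it assumes you read far more data than
fits in RAM so the page cache doesn't flatter the number.)

    # Rough streaming-read timer -- a sketch, not a real benchmark.
    import os, time

    DEVICE     = "/dev/sda1"        # placeholder: raw partition or big file
    BLOCK_SIZE = 1024 * 1024        # read in 1 MB chunks
    TOTAL      = 512 * 1024 * 1024  # read 512 MB in all

    fd = os.open(DEVICE, os.O_RDONLY)
    start = time.time()
    nread = 0
    while nread < TOTAL:
        buf = os.read(fd, BLOCK_SIZE)
        if not buf:
            break                   # hit the end of the device/file early
        nread += len(buf)
    elapsed = time.time() - start
    os.close(fd)
    mb = nread / (1024.0 * 1024.0)
    print("%.0f MB in %.2f sec = %.1f MB/sec" % (mb, elapsed, mb / elapsed))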

However, regarding the bottlenecks:

I think that if we build virtual partitions that stripe across the
installed disks and use fast disks (e.g. Cheetahs) we can achieve more
than enough streaming bandwidth to saturate a U2W controller.  On paper,
quite a few fast U2W drives can deliver 20 MB/sec or more on streaming
data, and SCSI controllers can handle parallel reads on appropriately
striped streaming data.  The PCI bus itself has a bandwidth of 132
MB/sec (peak, of course).  The memory bus has a bandwidth maybe three
times that (33 MHz times 32 bits vs 100 MHz times 32 bits), so I don't
expect DMA to be a bottleneck -- it will be slow relative to the CPU but
much faster than the PCI bottleneck (ignoring latency, but I >>think<<
it is safe to ignore latency for streaming reads and writes).  By the
same token, ONE 80 MB/sec U2W controller won't saturate the PCI bus, but
two would oversubscribe it slightly IF they were both running all the
time.

Then it comes down to just how CPU- vs I/O-intensive the job really is
and how much disk we want.  If we use 18 GB Cheetahs, we can get around
70 GB per controller with four disks, and four disks already probably
slightly oversubscribe the controller on streaming parallel reads or
writes.  Two controllers give us about 140 GB, a respectable amount,
but a bit oversubscribed relative to two bottlenecks, as they saturate
the PCI bus as well as each SCSI controller when running full out.  If
we use slower IBM 36 GB drives, we can get 100 GB per controller or
even more with three disks and still not oversubscribe the controller.
Two controllers with three disks each also just miss oversubscribing
the PCI bus if we assume that the drives deliver streaming data at
20 MB/sec peak each.
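
(To make the paper arithmetic explicit, here is the back-of-envelope
calculation above spelled out as a little Python sketch.  Every input is
one of the assumed peak figures -- 20 MB/sec streaming per drive,
80 MB/sec per U2W controller, 132 MB/sec for the PCI bus -- not a
measurement.)

    # Back-of-envelope numbers for the configurations discussed above.
    PCI_BW       = 132.0   # MB/sec peak, 33 MHz x 32 bits
    U2W_BW       = 80.0    # MB/sec peak per U2W controller
    DRIVE_STREAM = 20.0    # MB/sec assumed streaming rate per drive

    def config(name, controllers, disks_per_ctrl, gb_per_disk):
        per_ctrl = disks_per_ctrl * DRIVE_STREAM        # offered per controller
        total    = controllers * min(per_ctrl, U2W_BW)  # capped by each U2W
        capacity = controllers * disks_per_ctrl * gb_per_disk
        print("%-22s %4d GB  %3.0f MB/sec/ctrl (limit %.0f)  "
              "%3.0f MB/sec total (PCI limit %.0f)"
              % (name, capacity, per_ctrl, U2W_BW, total, PCI_BW))

    config("2 x 4 x 18 GB Cheetah", 2, 4, 18)  # 144 GB, 80/ctrl, 160 vs 132 PCI
    config("2 x 3 x 36 GB IBM",     2, 3, 36)  # 216 GB, 60/ctrl, 120 vs 132 PCI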

So this stuff I can work out the paper numbers on, although I'm sure the
real-world numbers are all lower and less optimal.  What I don't know
even on paper is how two controllers work relative to the kernel, SMP,
and interrupts.  If the two controllers sit on one interrupt line, they
presumably share that interrupt and its interrupt lock, and the hardware
itself handles some of the interrupt contention problem.  If they sit on
two interrupt lines, they still share the spinlock, but the interrupts
themselves are less correlated (I think.  Comment?).  Which is more
efficient?  Is either one likely to shift the bottleneck so that the
kernel itself becomes the bottleneck regardless of the controller/disk
combination used?  (That is, does the inefficiency associated with the
asynchronous handling of the possibly Poissonian distribution of
interrupt requests from multiple devices slow things down so that the
PCI and memory bus bottlenecks are irrelevant even if I have 4xCheetahs?
Do I gain anything with 4xCheetahs relative to 3xIBMs?)
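
(On the Linux side, at least, whether the two controllers end up on one
interrupt line or two is easy enough to check from /proc/interrupts once
the hardware exists.  The little sketch below just prints the IRQ lines
that mention an aic7xxx instance and flags any that list more than one
device; the exact column layout of /proc/interrupts varies a bit between
kernels, so treat it as a sketch.)

    # Sketch: see whether the aic7xxx controllers share an interrupt line.
    # Shared IRQs normally list several device names on one line,
    # separated by commas, so that is all this looks for.
    for line in open("/proc/interrupts"):
        if "aic7xxx" in line:
            shared = "," in line     # several devices on this IRQ line?
            print(line.rstrip() + ("   <-- shared" if shared else ""))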

I'd love to hear an explanation of why I should or shouldn't worry about
this from somebody who understands the SCSI I/O subsystem, the kernel,
and how dual or multiple controllers work (like Doug Ledford.  Doug?).
Even more, I'd love the measured numbers, since they would make the
explanation and the analysis above entertaining and useful a posteriori
stuff instead of possibly mistaken but critical a priori stuff.

Then there is the question of whether a hardware array makes most of
these issues moot.  Can I just order a 200 GB array and treat the whole
thing as a single physical device and get data from it at some
theoretical maximum rate (like 80 MB/sec, or even 132 MB/sec with a
suitable interface)?  Is this likely to be faster or more reliable or...

We may end up having to prototype several ways and make our own
measurements, but some of the prototype configurations aren't cheap
even by themselves.  I'd love to get enough real-world-experience data
on any facets of the above to land in a reasonable ballpark with some
prototype configuration or other before we start playing with, and
measuring the performance of, the actual application on that prototype.

  Thanks, 

       rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu




