Date: Wed, 17 Feb 1999 07:55:33 -0500 (EST)
From: "Robert G. Brown" <rgb@phy.duke.edu>
To: Harvey Fishman <fishman@panix.com>
Cc: aic7xxx Mailing List <AIC7xxx@freebsd.org>,
    Doug Benjamin <dbenjamin@fnal.gov>
Subject: Re: Two controllers or a dual...
Message-ID: <Pine.LNX.3.96.990217071345.15158F-100000@ganesh.phy.duke.edu>
In-Reply-To: <Pine.GSU.4.05.9902161911410.985-100000@panix3.panix.com>
On Tue, 16 Feb 1999, Harvey Fishman wrote:

> It seems to me that you should be considering just where your
> bottlenecks are going to be. The classical one is the head-disk
> bandwidth on the drives. If you have enough discrete requests
> occurring in parallel to different drives, only then does the SCSI
> channel bandwidth become of consequence. And remember that you also
> have DMA bandwidth in the processor box to worry about. I suspect
> that two U2W channels running at the same time are going to more
> than fill that. The processor will be accessing that same memory
> over the same bus so there is more contention to worry about. If
> your data is in L2 cache, then you don't need the disks to start
> with.
>
> So I think that you need a LOT more understanding and detail about
> the nits and grits of the application BEFORE you start thinking
> about what hardware will best serve your needs.

How to disagree? Of course you are right. The problem is, as usual,
that the application doesn't exist in situ yet because the prototyping
situ doesn't exist and the application hasn't been ported. So we're
doing paper engineering. All I'm trying to do is get better numbers
(and some insight) from knowledgeable persons like yourself before
proceeding, and this strategy appears to be working on the insight but
is still a bit short on any useful numbers. One benchmark measurement,
or even an anecdotal report from somebody running two U2W controllers
with fast disk attached, is worth a whole lot of theoretical
discussion...

However, regarding the bottlenecks: I think that if we build virtual
partitions that stripe across the installed disks and use fast disks
(e.g. Cheetahs), we can achieve more than enough streaming bandwidth
to saturate a U2W controller. On paper, quite a few fast U2W drives
can deliver 20 MB/sec or more on streaming data, and SCSI controllers
can handle parallel reads on appropriately striped streaming data. The
PCI bus itself has a bandwidth of 132 MB/sec (peak, of course). The
memory bus has a bandwidth maybe three times that (33 MHz times 32
bits vs. 100 MHz times 32 bits), so I don't expect DMA to be a
bottleneck -- it will be slow relative to the CPU but much faster than
the PCI bottleneck (ignoring latency, but I >>think<< it is safe to
ignore latency for streaming reads and writes). By the same token, ONE
80 MB/sec U2W controller won't saturate the PCI bus, but two would
oversubscribe it slightly IF they were both running all the time.

Then it comes down to just how CPU vs. I/O intensive the job really is
and how much disk we want. If we use 18 GB Cheetahs, we can get around
70 GB per controller with four disks, and four disks already probably
slightly oversubscribe the controller on streaming parallel reads or
writes. Two controllers give us 140 GB, a respectable amount, but a
bit oversubscribed relative to two bottlenecks, as they saturate the
PCI bus as well as each SCSI controller when running full out. If we
use the slower IBM 36 GB drives, we can get 100 GB per controller or
even more and still not oversubscribe the controller. Two controllers
with three disks each also just miss oversubscribing the PCI bus if we
assume that the drives deliver streaming data at 20 MB/sec peak each.

So this stuff I can work out the paper numbers on, although I'm sure
the real-world numbers are all lower and less optimal.
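(In case it helps anyone check me, here is the paper arithmetic above
as a tiny C program. The 20 MB/sec per drive, 80 MB/sec per U2W
channel, and 132 MB/sec PCI figures are just the assumed peak numbers
used in the discussion, not measurements, and the drive counts and
sizes are whatever you care to plug in.)

/* paper.c -- back-of-the-envelope capacity/bandwidth estimate for the
 * striped multi-disk, multi-controller configurations discussed above.
 * All rates are ASSUMED peak streaming numbers, not measurements.
 * Compile with:  cc -o paper paper.c
 */
#include <stdio.h>

#define DRIVE_MB_S   20.0   /* assumed peak streaming rate per drive */
#define U2W_MB_S     80.0   /* Ultra2 Wide SCSI channel limit        */
#define PCI_MB_S    132.0   /* 33 MHz x 32 bit PCI peak              */

static void estimate(const char *name, int drives, int ctrls, double gb)
{
    double per_ctrl = drives * DRIVE_MB_S;     /* demand per channel    */
    double deliver  = per_ctrl > U2W_MB_S ? U2W_MB_S : per_ctrl;
    double total    = ctrls * deliver;         /* demand on the PCI bus */
    double capacity = ctrls * drives * gb;

    printf("%s\n", name);
    printf("  capacity        %6.0f GB\n", capacity);
    printf("  per controller  %6.0f MB/s (channel limit %.0f)%s\n",
           per_ctrl, U2W_MB_S,
           per_ctrl > U2W_MB_S ? "  <- oversubscribed" : "");
    printf("  aggregate       %6.0f MB/s (PCI limit %.0f)%s\n\n",
           total, PCI_MB_S,
           total > PCI_MB_S ? "  <- oversubscribed" : "");
}

int main(void)
{
    estimate("2 controllers x 4 x 18 GB Cheetah", 4, 2, 18.0);
    estimate("2 controllers x 3 x 36 GB IBM",     3, 2, 36.0);
    return 0;
}

Plugging measured per-drive streaming rates in place of the assumed
20 MB/sec would obviously move the conclusions around, which is
exactly why I'd rather have real numbers than paper ones.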
What I don't know even on paper is how two controllers work relative
to the kernel, SMP, and interrupts. If there are two controllers on
one interrupt, they presumably share an interrupt and an interrupt
lock, and the hardware itself handles some of the interrupt contention
problem. If there are two controllers on two interrupts, they still
share the spinlock, but the interrupts themselves are less correlated
(I think. Comment?). Which is more efficient? Is either one likely to
shift the bottleneck so that the kernel itself becomes the bottleneck
regardless of the controller/disk combination used? (That is, does the
inefficiency associated with the asynchronous handling of the possibly
Poissonian distribution of interrupt requests on multiple devices slow
things down so that the PCI and memory bus bottlenecks are irrelevant
even if I have 4x Cheetahs? Do I gain anything with 4x Cheetahs
relative to 3x IBMs?)

I'd love to hear an explanation of why I should or shouldn't worry
about this from somebody who understands the SCSI I/O subsystem, the
kernel, and how dual or multiple controllers work (like Doug Ledford.
Doug?). Even more, I'd love the measured numbers, since they make the
explanation and the above analysis entertaining and useful a
posteriori stuff instead of possibly mistaken but critical a priori
stuff.

Then there is the question of whether a hardware array makes most of
these issues moot. Can I just order a 200 GB array, treat the whole
thing as a single physical device, and get data from it at some
theoretical maximum rate (like 80 MB/sec, or even 133 MB/sec with a
suitable interface)? Is this likely to be faster or more reliable
or...

We may end up having to prototype several ways and make our own
measurements, but some of the prototype configurations aren't cheap
even by themselves. I'd love to get enough real-world-experience data
on any facets of the above to get into a reasonable ballpark with some
prototype configuration or other before playing with and measuring
performance on the actual application on the prototype.

   Thanks,

      rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe aic7xxx" in the body of the message