From: Terry Lambert
Message-Id: <199912140001.RAA25603@usr08.primenet.com>
Subject: Re: dual 400 -> dual 600 worth it?
To: noslenj@swbell.net (Jay Nelson)
Date: Tue, 14 Dec 1999 00:01:11 +0000 (GMT)
Cc: tlambert@primenet.com, chat@FreeBSD.ORG
In-Reply-To: from "Jay Nelson" at Dec 10, 99 09:26:08 pm

> On Sat, 11 Dec 1999, Terry Lambert wrote:
>
> [snip]
>
> >Soft updates speak to the ability to stall dependent writes until
> >they either _must_ be done, or until they are no longer relevant.
> >It does this as a strategy for ordering the metadata updates (other
> >methods are DOW - Delayed Ordered Writes - and synchronous writing
> >of metadata... in decreasing order of performance).
>
> It's this ability to delay and gather writes that prompted the
> question.  If a SCSI bus can handle 8-12MB/sec with tagged queuing
> and UltraDMA can do 30MB/sec while blocking, where do the
> performance lines cross -- or do they?  As the number of spindles
> goes up, I would expect SCSI to outperform IDE -- but on a single
> drive system, do writes to an IDE at UDMA speeds block less than
> gathered writes to a drive on a nominally slower SCSI bus?

The question you have to ask is whether or not your I/O requests are
going to be interleaved (and therefore concurrently outstanding) or
not.

You only need four outstanding concurrent requests for your 8MB/sec
SCSI bus to beat a 30MB/sec UltraDMA (a back-of-the-envelope model of
this follows the questions below).

I'll note for the record here that your PCI bus is capable of a
133MB/sec burst, and considerably less when doing continuous duty, so
bus limits will be hit before that (I rather suspect your 30MB/sec
number, if it's real and not a "for instance", is a burst rate, not a
continuous transfer rate, and depends either on pre-reading or a disk
read cache hit).

Effectively, what you are asking is "if I have a really fast network
interface, can I set my TCP/IP window size down to one frame per
window, and get better performance than a not quite as fast interface
using a large window?".

The answer depends on your latency.  If you have zero latency, then
you don't need the ability to interleave I/O.  If you have non-zero
latency, then interleaving I/O will help you move more data.

For tagged command queues, the questions are:

o   Are your seek times so fast that elevator sorting the requests
    (only possible if the drive knows about two or more requests)
    will not yield better performance?

o   Is your transfer latency so low that interleaving your I/O will
    not yield better performance?

o   Is all of your I/O so serialized, because you are only actively
    running a single application at a time, that you won't benefit
    from increased I/O concurrency?
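To put rough numbers on that, here is a toy model of serialized
vs. interleaved I/O.  The 8MB/sec and 30MB/sec figures are the ones
from the question above; the 64K request size and 8ms average
positioning latency are assumptions picked purely for illustration,
and the only claim is that overlapping requests hides positioning
latency up to the bus ceiling:

/*
 * Toy model of serialized vs. interleaved disk I/O.  The bus rates
 * are the ones from the question; the 64K request size and 8ms
 * average positioning latency are assumptions for illustration only.
 */
#include <stdio.h>

static double
throughput(double bus_mb, double req_kb, double lat_ms, int depth)
{
        double req_mb = req_kb / 1024.0;
        double xfer_ms = req_mb / bus_mb * 1000.0;
        double pipelined = depth * req_mb / ((lat_ms + xfer_ms) / 1000.0);

        /* Overlap hides positioning latency; the bus is still the ceiling. */
        return (pipelined < bus_mb ? pipelined : bus_mb);
}

int
main(void)
{
        printf("30MB/sec UltraDMA, one request at a time: %.1f MB/sec\n",
            throughput(30.0, 64.0, 8.0, 1));
        printf(" 8MB/sec SCSI, four tagged commands:      %.1f MB/sec\n",
            throughput(8.0, 64.0, 8.0, 4));
        return (0);
}

With those assumed numbers it prints about 6.2MB/sec for the
serialized 30MB/sec channel and 8.0MB/sec for the slower bus with
four commands outstanding -- which is the crossover described above.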
> [snip]
>
> >So the very short answer to the question is "on a multi-application
> >server, using soft updates doesn't mean that you wouldn't benefit
> >from interleaving your I/O".
>
> Hmm... that suggests you also might not.  On a single drive system
> with soft updates, would an Ultra IDE perform worse, on par or
> better than SCSI with a light to moderate IO load?

Worse, if the I/O load is at all significant, so long as you have
non-zero latency.

Unfortunately, with drives which lie about their true geometry, and
which claim to be "perfect" by automatically redirecting bad sectors,
it's not possible to elevator sort in the OS (this used to be common,
and variable/unknown drive geometry is why this tuning is currently
turned off by default in newfs).

> >To speak to Brett's issue of RAID 5, parity is striped across
> >all disks, and doing parity updates on one stripe on one disk
> >will block all non-interleaved I/O to all other areas of the
> >disk; likewise, doing a data write will prevent a parity write
> >from completing for the other four disks in the array.
>
> I have seen relatively few benefits of RAID 5.  Performance sucks
> relative to mirroring across separate controllers, and until you
> reach 10-12 drives, the cost is about the same.  I never thought
> about the parity, though.  Now that I have, I like RAID 5 even less.

If your I/O wasn't being serialized by your drives/controller, the
parity would be much less of a performance issue.

> >This effectively means that, unless you can interleave I/O
> >requests, as tagged command queues do, you are much worse off.
>
> Applying that to the single drive IDE vs. SCSI question suggests
> that, even with higher bus burst speeds, I'm still likely to end up
> worse off, depending on load, than I would with SCSI -- soft updates
> notwithstanding.  Is that correct?

Yes, depending on load.

For a single user desktop connected to a human, generally you only
run one application at a time, and so serialized I/O is OK, even if
you are doing streaming media to or from the disk.

The master/slave bottleneck and multimedia are the reasons that
modern systems with IDE tend to have two controllers, with one used
for the main disk, and the other used for the CDROM/DVD.

> >As to the issue of spindle sync, which Brett alluded to, I
> >don't think that it is supported for IDE, so you will be
> >eating a full rotational latency on average, instead of one
> >half a rotational latency, on average: (0 + 1)/2 vs. (1 + 1)/2.
>
> I think that just answered my earlier question.

This is a RAID-specific issue.

Most modern drives in a one drive system record sectors in descending
order on a track.  As soon as the seek completes, the drive begins
caching data, and returns the data you asked for as soon as it has
cached back far enough.  For single application sequential read
behaviour, this pre-caches the data that you are likely to ask for
next.

A different issue, which probably applies more to what you meant: on
a multi-program machine with two or more readers, the value of the
cache is greatly reduced, unless the drive supports multiple track
caches, one per reader, in order to preserve locality of reference
for each reader.  This one is not RAID specific, but load specific.

In general, good SCSI drives will support a track cache per tagged
command queue.
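To make the multiple-reader point concrete, here is a toy simulation
of a drive track cache with two interleaved sequential readers.  The
track numbers, request counts, and LRU replacement are invented for
illustration and don't describe any particular drive:

/*
 * Toy model of drive track caches: two sequential readers whose
 * requests arrive interleaved at the drive.  With a single track
 * cache the readers evict each other on every request; with one
 * cache per reader, most requests are served from cache.  The track
 * numbers, request counts, and LRU policy are invented for
 * illustration only.
 */
#include <stdio.h>

#define MAXCACHES       2
#define NREQUESTS       16

static int cache[MAXCACHES];    /* cached track numbers, MRU first */
static int ncaches;

static int
read_track(int track)
{
        int i, j;

        for (i = 0; i < ncaches; i++) {
                if (cache[i] == track) {
                        for (j = i; j > 0; j--) /* move to the MRU slot */
                                cache[j] = cache[j - 1];
                        cache[0] = track;
                        return (1);             /* served from the cache */
                }
        }
        for (j = ncaches - 1; j > 0; j--)       /* miss: evict the LRU track */
                cache[j] = cache[j - 1];
        cache[0] = track;
        return (0);                             /* had to go to the media */
}

int
main(void)
{
        int hits, i, track;

        for (ncaches = 1; ncaches <= MAXCACHES; ncaches++) {
                for (i = 0; i < MAXCACHES; i++)
                        cache[i] = -1;
                hits = 0;
                for (i = 0; i < NREQUESTS; i++) {
                        /* Reader 0 walks tracks 100..., reader 1 walks 500... */
                        track = ((i & 1) ? 500 : 100) + i / 8;
                        hits += read_track(track);
                }
                printf("%d track cache(s): %2d of %d requests from cache\n",
                    ncaches, hits, NREQUESTS);
        }
        return (0);
}

With one cache, the alternating readers evict each other on every
request (0 of 16 served from cache); with one cache per reader, 12 of
16 requests are served from cache.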
It would be interesting to test algorithms for increasing I/O
locality, to keep the number of files with outstanding I/O under the
number of track caches (probably by delaying I/O for files that were
not among the last "track cache count minus one" I/Os, in favor of
requests for files that were); a rough sketch of one such policy
follows at the end of this message.

> >Rod Grimes did some experimentation with CCD and spindle sync
> >on SCSI devices back when CCD first became capable of mirroring,
> >and has some nice hard data that you should ask him for (or dig
> >it out of DejaNews on the FreeBSD newsgroup).
>
> Thanks -- I'll look them up.  And -- I appreciate your answer.  I
> learned quite a bit from it.  It did raise the question of
> differences between soft updates and lfs -- but I'll save that for
> another time.

8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
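Here is a rough sketch of that delay-by-locality idea.  The request
structure, the recent-file window, and the pick policy are all made
up for illustration -- it isn't code from any kernel, just one way
the heuristic could look:

/*
 * Rough sketch of the "keep the number of files with outstanding I/O
 * under the number of track caches" policy described above.  The
 * request structure, the recent-file window, and the pick policy are
 * invented for illustration; this is not code from any kernel.
 */
#include <stdio.h>

#define NCACHES         2               /* assumed track caches on the drive */
#define NRECENT         (NCACHES - 1)   /* distinct files favored for issue */

struct request {
        int fileid;
        int block;
};

static int recent[NRECENT];     /* last NRECENT distinct files issued */
static int nrecent;

static int
is_recent(int fileid)
{
        int i;

        for (i = 0; i < nrecent; i++)
                if (recent[i] == fileid)
                        return (1);
        return (0);
}

static void
note_issued(int fileid)
{
        int i;

        if (is_recent(fileid))
                return;
        if (nrecent < NRECENT)
                nrecent++;
        else
                for (i = 0; i < NRECENT - 1; i++)       /* oldest falls off */
                        recent[i] = recent[i + 1];
        recent[nrecent - 1] = fileid;
}

/*
 * Pick the next request to issue: prefer a request for a recently
 * issued file; otherwise fall back to the oldest queued request, so
 * deferred files still make progress.
 */
static int
pick_next(struct request *q, int nq)
{
        int i;

        for (i = 0; i < nq; i++)
                if (is_recent(q[i].fileid))
                        return (i);
        return (0);
}

int
main(void)
{
        struct request queue[] = {
                { 1, 10 }, { 2, 50 }, { 1, 11 },
                { 3, 90 }, { 1, 12 }, { 2, 51 },
        };
        struct request tmp;
        int i, n, pick;

        n = sizeof(queue) / sizeof(queue[0]);
        for (i = 0; i < n; i++) {
                pick = pick_next(&queue[i], n - i);
                printf("issue file %d, block %d\n",
                    queue[i + pick].fileid, queue[i + pick].block);
                note_issued(queue[i + pick].fileid);
                tmp = queue[i];                 /* consume the chosen request */
                queue[i] = queue[i + pick];
                queue[i + pick] = tmp;
        }
        return (0);
}

Fed an interleaved queue of requests for three files, it issues all
of file 1's requests first, then file 3's, then file 2's, which is
the locality that per-reader track caches would reward.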