From: Terry Lambert <tlambert@primenet.com>
Subject: Re: hw platform Q - what's a good smp choice these days?
To: dan@math.berkeley.edu (Dan Strick)
Date: Thu, 8 Oct 1998 18:19:55 +0000 (GMT)
Cc: tlambert@primenet.com, dan@math.berkeley.edu, freebsd-smp@FreeBSD.ORG
In-Reply-To: <199810080155.SAA16965@math.berkeley.edu> from "Dan Strick" at Oct 7, 98 06:55:08 pm

> > N times the driver-controller-drive + drive-controller-drive latency.
> >
> > TCP/IP sliding windows work the same way: instead of N latencies
> > for N packets, you get 1 latency amortized across N packets.
>
> Unless the i/o commands are for contiguous sectors, per command
> latencies are usually much less than head/disk motion latencies,
> so you normally don't get to effectively eliminate per command
> latencies by overlapping them.  I don't believe a drive supporting
> 64 simultaneous tagged commands will execute normal I/O patterns
> anywhere near 64 times as fast as a drive that only executes
> one command at a time.  I am prepared to believe perhaps twice
> the performance on a relatively large I/O load (and not even
> that if the disk driver is "smart" about I/O reordering and
> scatter/gather dma).  Have you got actual real-life performance
> measurements?

By "64 times faster" I meant 64 times less latency.  I thought I had
clarified that, but apparently not.  Yes, I realize that I didn't
choose my words with sufficient care.  I am striving to do so in
this message, so of course it's taking me about 8 times longer to
compose it.

Yes, I expect a *correctly functioning drive* to reorder tagged
command requests which have been enqueued to it but not yet
completed.  This was the basis of the discussion of the SCSI write
caching facility: write caches tend to reorder requests without
notifying the controller, whereas tagged requests are identified on
completion by their associated tags.  As a result, a system that
enforces ordering in software in the host OS (such as soft updates)
can *know* that the information has been committed to stable storage
before it attempts a transaction that depends on a previous
transaction having been committed to stable storage.

The idea behind write caching is that the drive lies, and says it
has done something that it has not.  This is not necessarily a bad
thing, so long as the drive does not reorder requests independently
of the tags used to enqueue them.  Justin claims that there exist
drives that obey these semantics.
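To make the ordering point concrete, here is a minimal userland
sketch (an illustration only, not the soft updates code itself) of
enforcing order in software: the dependent index update is not
issued until the record write is known to be on stable storage.  The
file names and record contents are hypothetical.

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	static void die(const char *msg) { perror(msg); exit(1); }

	int main(void)
	{
		/* Hypothetical record and index files. */
		int dfd = open("record.db", O_WRONLY|O_CREAT|O_APPEND, 0644);
		if (dfd == -1) die("open record.db");

		const char rec[] = "id=42 payload=...\n";
		if (write(dfd, rec, sizeof(rec) - 1) == -1)
			die("write record.db");

		/*
		 * Wait until the record is committed to stable storage.
		 * A write-caching drive that reorders behind the
		 * controller's back can break this guarantee; tagged
		 * commands, identified on completion by their tags,
		 * preserve it.
		 */
		if (fsync(dfd) == -1) die("fsync record.db");

		/* Only now is it safe to issue the dependent update. */
		int ifd = open("record.idx", O_WRONLY|O_CREAT|O_APPEND, 0644);
		if (ifd == -1) die("open record.idx");

		const char idx[] = "id=42 -> offset 0\n";
		if (write(ifd, idx, sizeof(idx) - 1) == -1)
			die("write record.idx");
		if (fsync(ifd) == -1) die("fsync record.idx");

		close(ifd);
		close(dfd);
		return 0;
	}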
In any case, it should be obvious that commands sent to a SCSI
drive as:

	[====]
	  [====]
	    [====]
	      [====]
	        [====]
	          [====]

will complete before commands sent to an IDE drive as:

	[====][====][====][====][====][====]

because:

	[==============]

is smaller than:

	[========================]

The only remaining issue is how much the latency through a vastly
slower (than the CPU clock) memory and I/O bus reduces the overlap;
I am willing to give you that the overlap may be smaller, i.e.:

	[====]
	    [====]

But on a loaded system (i.e., more than one process -- hence the
useless nature of microbenchmarks), *any* overlap is amplified (see
the P.S. below for the arithmetic).

You should also note that the latency may, in fact, need to be
propagated all the way up to the application layer(*), and with a
10ms quantum this effect can be large (if this were not true, then
programs such as "team" and "ddd", as well as the entirety of the
POSIX async I/O subsystem, would be rather fruitless pursuits).

(*) Because an application may need to make transactional guarantees
    about state shared between, for example, a database record file
    and a database index file.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
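P.S.: To put numbers on the amortization claim, a back-of-the-envelope
sketch in C.  The 1ms per-command latency, 8ms service time, and
queue depth of 64 are invented for illustration, not measurements of
any particular drive.

	#include <stdio.h>

	int main(void)
	{
		double latency_ms = 1.0;  /* per-command latency (assumed) */
		double service_ms = 8.0;  /* head motion + transfer (assumed) */
		int n = 64;               /* queued commands */

		/* IDE-style: every command waits out its own latency. */
		double serial = n * (latency_ms + service_ms);

		/*
		 * Tagged-queue style: each command's latency overlaps the
		 * previous command's service time, so only one latency is
		 * paid up front -- 64 times less latency, not 64 times
		 * the throughput.
		 */
		double overlapped = latency_ms + n * service_ms;

		printf("serial:     %.1f ms\n", serial);
		printf("overlapped: %.1f ms\n", overlapped);
		printf("latency term: %.1f ms -> %.1f ms\n",
		    n * latency_ms, latency_ms);
		return 0;
	}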