From owner-freebsd-hackers Fri Nov 13 01:06:19 1998
From: Don Lewis
Message-Id: <199811130905.BAA07536@salsa.gv.tsc.tdk.com>
Date: Fri, 13 Nov 1998 01:05:26 -0800
In-Reply-To: Greg Lehey "Re: [Vinum] Stupid benchmark: newfsstone" (Nov 12, 6:45pm)
To: Greg Lehey, Bernd Walter, Mike Smith, hackers@FreeBSD.ORG
Subject: Re: [Vinum] Stupid benchmark: newfsstone

On Nov 12, 6:45pm, Greg Lehey wrote:
} Subject: Re: [Vinum] Stupid benchmark: newfsstone
}
}                  rotational    transfer time    total
}                  latency
} 1 disk/60 kB     4.2 ms        6 ms             10.2 ms
} 4 disks/15 kB    7.8 ms        1.5 ms           9.3 ms
}
} Huh?  Why the difference in rotational latency?  If you're reading
} from one disk, on average you'll have a half track latency.  For two,
} on average one is half a track off from the other, so you'll have a
} latency of .75 of a track.  With three drives, it's .875, and with
} four drives, it's .9375 of a track.

Things should not be quite so bleak in practice, since the drives that
complete their part of any given transaction the fastest don't have to
wait for the slower drives and can get started on the next transaction.
Assuming a 9 ms seek time, if you have N independent transactions, the
total time to complete them in the small-stripe parallel case will be

        (N-1)*4.2ms + 7.8ms + N*9ms + N*1.5ms

or 238.8 ms for 16 transactions.

If the transactions are small enough that the transfer time is a small
part of the total, you are better off using larger stripes so that each
transaction involves only one spindle and four transactions can be
serviced in parallel:

        ceiling(N/4) * (4.2ms + 9ms + 6ms)

or 76.8 ms for 16 transactions.  While your calculations show that a
15 kB stripe size wins slightly in terms of the latency of a single
60 kB transaction, it looks to me like it actually loses once you have
multiple transactions and add in the seek time.

For large transactions, the average transfer rate drops because of the
time wasted by head switching and track-to-track seeks.  At some point
the transfer time will dominate, and if you don't have many transactions
that can be done in parallel it makes sense to choose a stripe size
small enough that all the drives work in parallel on a given
transaction.  If you're doing 1 MB transactions, it will take more than
100 ms just for the transfer time in the single-drive case, but about
25 ms when four drives process the transaction in parallel.
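To make the arithmetic above easy to check, here is a quick throwaway
sketch (mine, not anything out of vinum) that just plugs in the figures
quoted in this thread: the 4.2 ms and 7.8 ms rotational latencies, a
9 ms average seek, and the 6 ms / 1.5 ms transfer times for 60 kB and
15 kB.

    /*
     * Throwaway sketch of the two cases above.  All constants are the
     * figures quoted in this thread, not measurements of any real drive.
     */
    #include <stdio.h>

    #define NDRIVES        4

    /* small stripes: every transaction touches all four spindles */
    static double
    small_stripe_ms(int ntrans)
    {
            return ((ntrans - 1) * 4.2 + 7.8 + ntrans * 9.0 + ntrans * 1.5);
    }

    /* large stripes: each transaction hits one spindle, four run in parallel */
    static double
    large_stripe_ms(int ntrans)
    {
            int rounds = (ntrans + NDRIVES - 1) / NDRIVES;  /* ceiling(N/4) */

            return (rounds * (4.2 + 9.0 + 6.0));
    }

    int
    main(void)
    {
            int n = 16;
            double kb_per_ms = 60.0 / 6.0;  /* 10 kB/ms, from the table above */

            printf("%d transactions: small stripes %.1f ms, large stripes %.1f ms\n",
                n, small_stripe_ms(n), large_stripe_ms(n));
            printf("1 MB transfer time: %.0f ms on one drive, %.0f ms across four\n",
                1024.0 / kb_per_ms, 1024.0 / kb_per_ms / NDRIVES);
            return (0);
    }

It prints 238.8 ms and 76.8 ms for the 16-transaction case, and 102 ms
versus 26 ms of transfer time for the 1 MB case, which are the figures
used above.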
Since the difference in transfer time is much greater than the
difference in rotational latency between these two cases, using a
stripe size of less than one quarter of the transaction size is the
winning strategy.  I suspect it would be optimal if the stripes
corresponded to disk tracks, but since modern drives have track lengths
that vary between cylinders, this is pretty impractical.

I suspect that most common system workloads fall into the small
transaction category, so the stripe size should be chosen large enough
that it is not common for a transaction to be split across multiple
drives.  This maximizes the number of transactions that can be
processed in parallel.  The only problem with very large stripe sizes
is that there may be periods of time when a particular part of a
filesystem gets heavy use, and all of that load would be directed to
one drive while the others sat idle.
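To put a rough number on "not common for a transaction to be split":
if you assume transaction start offsets are uniformly distributed
(a simplification, and again my own sketch rather than anything in
vinum), a transaction of len bytes on stripes of stripesize bytes
crosses a stripe boundary with probability of about (len - 1) /
stripesize.

    /*
     * Rule-of-thumb sketch: with uniformly distributed start offsets, a
     * transaction of len bytes on stripes of stripesize bytes straddles
     * a stripe boundary (and so ties up more than one spindle) with
     * probability about (len - 1) / stripesize, capped at 1.
     */
    #include <stdio.h>

    static double
    split_probability(long len, long stripesize)
    {
            double p = (double)(len - 1) / (double)stripesize;

            return (p > 1.0 ? 1.0 : p);
    }

    int
    main(void)
    {
            long sizes[] = { 8192, 16384, 65536 };  /* example transaction sizes */
            long stripesize = 256 * 1024;           /* 256 kB, picked arbitrarily */
            int i;

            for (i = 0; i < 3; i++)
                    printf("%3ld kB transaction on %ld kB stripes: %.1f%% split\n",
                        sizes[i] / 1024, stripesize / 1024,
                        100.0 * split_probability(sizes[i], stripesize));
            return (0);
    }

With a 256 kB stripe, 8 kB transactions land entirely on one drive
about 97% of the time, while 64 kB transactions split onto two drives
a quarter of the time, which is the sort of tradeoff described above.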