Date: Tue, 11 Dec 2001 03:11:21 +0100
From: Bernd Walter
To: Greg Lehey
Cc: Matthew Dillon, Wilko Bulte, Mike Smith, Terry Lambert, Joerg Wunsch,
    freebsd-current@FreeBSD.ORG
Subject: Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
> On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:
> >
> >>> performance without it - for reading OR writing.  It doesn't matter
> >>> so much for RAID{1,10}, but it matters a whole lot for something like
> >>> RAID-5 where the difference between a spindle-synced read or write
> >>> and a non-spindle-synched read or write can be upwards of 35%.
> >>
> >> If you have RAID5 with I/O sizes that result in full-stripe operations.
> >
> > Well, 'more then one disk' operations anyway, for random-I/O.  Caching
> > takes care of sequential I/O reasonably well but random-I/O goes down
> > the drain for writes if you aren't spindle synced, no matter what
> > the stripe size,
>
> Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
> goes to buffer cache.

After waiting for the drives, not for vinum parity blocks.

> > and will go down the drain for reads if you cross a stripe -
> > something that is quite common I think.
>
> I think this is what Mike was referring to when talking about parity
> calculation.  In any case, going across a stripe boundary is not a
> good idea, though of course it can't be avoided.  That's one of the
> arguments for large stripes.

striped:
If you have 512 byte stripes and 2 disks, a 64k access is split into two
32k transactions, one per disk.  The wait time for the complete request
is the worst of the two, which is more than the average wait of a single
disk.  With spindle synchronisation the access times of both disks are
believed to be identical, so you get the same as with a single disk.
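To put rough numbers on that worst-of-both effect, here is a quick sketch
(the 7200 rpm figure and the uniformly distributed rotational latency are
just assumptions for illustration; seek and transfer time are left out,
and this is of course not vinum code):

/*
 * Quick sketch, not vinum code: average rotational wait of a single
 * disk versus a 2-disk stripe without spindle sync, where the request
 * only completes once the slower of the two disks is done.
 * Assumed values: 7200 rpm, uniformly distributed rotational latency.
 */
#include <stdio.h>
#include <stdlib.h>

#define ROT_MS  8.33            /* one revolution at 7200 rpm */
#define SAMPLES 1000000

int
main(void)
{
        double single = 0.0, worst_of_two = 0.0;
        int i;

        srand(42);
        for (i = 0; i < SAMPLES; i++) {
                double a = ROT_MS * rand() / (double)RAND_MAX;
                double b = ROT_MS * rand() / (double)RAND_MAX;

                single += a;                            /* one disk */
                worst_of_two += (a > b ? a : b);        /* unsynced pair */
        }
        printf("avg wait, single disk:         %.2f ms\n", single / SAMPLES);
        printf("avg wait, slower of two disks: %.2f ms\n",
            worst_of_two / SAMPLES);
        return (0);
}

Without sync the slower of the two disks costs roughly a third more than
the single-disk average; with spindle sync both waits would be identical
and you are back at the single-disk figure.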
Linear speed could be about twice that of a single drive, but today this
is more theoretical than real.  The average transaction size per disk
decreases with a growing number of spindles, so you get more transaction
overhead.  Also, the voice coil technology used in drives for many years
now adds a random amount of time to the access time, which invalidates
some of the spindle sync potential, and it may defeat some of the
benefits of the precaching mechanisms in the drives.  I'm almost sure
there is no real performance gain with modern drives.

raid5:
For a write you have two read transactions and two writes.  The two
reads are at the same position on two different spindles, so the same
access time situation exists as in the case above.  We don't have the
problem of decreased transaction sizes, but we have the same seek time
problem with modern disks as above, plus the drives are not exactly
equally loaded, which randomizes the access times again.  I doubt that
we get a performance gain with modern disks in the general case, but
there might be some special uses.  The last drives I saw that could do
spindle sync were the IBM DCHS series.

There are easier ways to raise performance.  Ever wondered why people
claim vinum's raid5 writes are slow?  The answer is astonishingly
simple: vinum does stripe-based locking, while ufs tries to lay out
data in mostly ascending sectors.  What happens is that the first write
has to wait for two reads and two writes.  An ascending write that
follows it has to wait for the first write to finish, because the
stripe is still locked; the stripe is only unlocked after both physical
writes are on disk.  Only then do we start our two reads, which (thanks
to the drive's precache) are most likely already in the drive's cache,
and then we write.  The problem is that the physical writes get
serialized and the drive has to wait a complete rotation between each
of them.  If we had fine-grained locking that only locks the accessed
sectors of the parity, we could have more than a single ascending write
transaction in flight on a single drive.  Ideally the stripe size is
bigger than the maximum number of parallel ascending writes the OS
issues on the volume (see the rough numbers in the P.S. below).

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de          Usergroup          info@cosmo-project.de
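P.S.: A similar back-of-the-envelope sketch for the locking cost (again
assumed numbers - 7200 rpm and 16 ascending 512-byte writes into one
stripe - not actual vinum logic):

/*
 * Rough sketch with assumed numbers, not actual vinum logic: cost of N
 * small ascending writes into one RAID-5 stripe.  With stripe-granular
 * locking each write waits for the previous read-modify-write to hit
 * the platter, so it costs roughly one full revolution; with
 * (hypothetical) per-sector parity locking the writes could be queued
 * and serviced in about one revolution altogether.
 */
#include <stdio.h>

#define ROT_MS  8.33            /* one revolution at 7200 rpm */

int
main(void)
{
        int nwrites = 16;       /* ascending 512-byte writes, one stripe */

        printf("stripe-granular locking: ~%3.0f ms (%d revolutions)\n",
            nwrites * ROT_MS, nwrites);
        printf("per-sector parity lock:  ~%3.0f ms (about 1 revolution)\n",
            ROT_MS);
        return (0);
}

With per-sector parity locking the drive could service the whole queue
in roughly one revolution instead of paying a revolution per write.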