Date: Tue, 11 Dec 2001 03:11:21 +0100
From: Bernd Walter
To: Greg Lehey
Cc: Matthew Dillon, Wilko Bulte, Mike Smith, Terry Lambert, Joerg Wunsch,
    freebsd-current@FreeBSD.ORG
Subject: Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
> On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:
> >
> >>> performance without it - for reading OR writing.  It doesn't matter
> >>> so much for RAID{1,10}, but it matters a whole lot for something like
> >>> RAID-5 where the difference between a spindle-synced read or write
> >>> and a non-spindle-synched read or write can be upwards of 35%.
> >>
> >> If you have RAID5 with I/O sizes that result in full-stripe operations.
> >
> > Well, 'more then one disk' operations anyway, for random-I/O.  Caching
> > takes care of sequential I/O reasonably well but random-I/O goes down
> > the drain for writes if you aren't spindle synced, no matter what
> > the stripe size,
>
> Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
> goes to buffer cache.

After waiting for the drives, not for vinum parity blocks.

> > and will go down the drain for reads if you cross a stripe -
> > something that is quite common I think.
>
> I think this is what Mike was referring to when talking about parity
> calculation.  In any case, going across a stripe boundary is not a
> good idea, though of course it can't be avoided.  That's one of the
> arguments for large stripes.

striped:
If you have 512 byte stripes and 2 disks, a 64k access is split into two
32k transactions, one per disk.  The wait time for the complete request
is the worst of the two, which is more than the average wait of a single
disk.  With spindle synchronisation the access times of both disks are
believed to be identical, so you get the same as with a single disk.
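To put rough numbers on that worst-of-both effect, here is a quick sketch
(the 7200 rpm figure and the uniformly distributed rotational latency are
just assumptions for illustration; seek and transfer time are left out,
and this is of course not vinum code):

/*
 * Quick sketch, not vinum code: average rotational wait of a single
 * disk versus a 2-disk stripe without spindle sync, where the request
 * only completes once the slower of the two disks is done.
 * Assumed values: 7200 rpm, uniformly distributed rotational latency.
 */
#include <stdio.h>
#include <stdlib.h>

#define ROT_MS  8.33            /* one revolution at 7200 rpm */
#define SAMPLES 1000000

int
main(void)
{
        double single = 0.0, worst_of_two = 0.0;
        int i;

        srand(42);
        for (i = 0; i < SAMPLES; i++) {
                double a = ROT_MS * rand() / (double)RAND_MAX;
                double b = ROT_MS * rand() / (double)RAND_MAX;

                single += a;                            /* one disk */
                worst_of_two += (a > b ? a : b);        /* unsynced pair */
        }
        printf("avg wait, single disk:         %.2f ms\n", single / SAMPLES);
        printf("avg wait, slower of two disks: %.2f ms\n",
            worst_of_two / SAMPLES);
        return (0);
}

Without sync the slower of the two disks costs roughly a third more than
the single-disk average; with spindle sync both waits would be identical
and you are back at the single-disk figure.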
Linear speed could be about twice that of a single drive, but today this
is more theoretical than real.  The average transaction size per disk
decreases with a growing number of spindles, so you get more transaction
overhead.  Also, the voice coil technology used in drives for many years
now adds a random amount of time to the access time, which invalidates
some of the spindle sync potential, and it may defeat some of the
benefits of the precaching mechanisms in the drives.  I'm almost sure
there is no real performance gain with modern drives.

raid5:
For a write you have two read transactions and two writes.  The two
reads are at the same position on two different spindles, so the same
access time situation exists as in the case above.  We don't have the
problem of decreased transaction sizes, but we have the same seek time
problem with modern disks as above, plus the drives are not exactly
equally loaded, which randomizes the access times again.  I doubt that
we get a performance gain with modern disks in the general case, but
there might be some special uses.  The last drives I saw that could do
spindle sync were the IBM DCHS series.

There are easier ways to raise performance.  Ever wondered why people
claim vinum's raid5 writes are slow?  The answer is astonishingly
simple: vinum does stripe-based locking, while ufs tries to lay out
data in mostly ascending sectors.  What happens is that the first write
has to wait for two reads and two writes.  An ascending write that
follows it has to wait for the first write to finish, because the
stripe is still locked; the stripe is only unlocked after both physical
writes are on disk.  Only then do we start our two reads, which (thanks
to the drive's precache) are most likely already in the drive's cache,
and then we write.  The problem is that the physical writes get
serialized and the drive has to wait a complete rotation between each
of them.  If we had fine-grained locking that only locks the accessed
sectors of the parity, we could have more than a single ascending write
transaction in flight on a single drive.  Ideally the stripe size is
bigger than the maximum number of parallel ascending writes the OS
issues on the volume (see the rough numbers in the P.S. below).

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de          Usergroup          info@cosmo-project.de
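P.S.: A similar back-of-the-envelope sketch for the locking cost (again
assumed numbers - 7200 rpm and 16 ascending 512-byte writes into one
stripe - not actual vinum logic):

/*
 * Rough sketch with assumed numbers, not actual vinum logic: cost of N
 * small ascending writes into one RAID-5 stripe.  With stripe-granular
 * locking each write waits for the previous read-modify-write to hit
 * the platter, so it costs roughly one full revolution; with
 * (hypothetical) per-sector parity locking the writes could be queued
 * and serviced in about one revolution altogether.
 */
#include <stdio.h>

#define ROT_MS  8.33            /* one revolution at 7200 rpm */

int
main(void)
{
        int nwrites = 16;       /* ascending 512-byte writes, one stripe */

        printf("stripe-granular locking: ~%3.0f ms (%d revolutions)\n",
            nwrites * ROT_MS, nwrites);
        printf("per-sector parity lock:  ~%3.0f ms (about 1 revolution)\n",
            ROT_MS);
        return (0);
}

With per-sector parity locking the drive could service the whole queue
in roughly one revolution instead of paying a revolution per write.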