Date: Mon, 6 Nov 2006 14:56:12 +0200
From: Oles Hnatkevych <don_oles@able.com.ua>
To: Oliver Fromme
Cc: pjd@FreeBSD.org, freebsd-geom@FreeBSD.ORG
Subject: Re[2]: geom stripe performance question

Hello, Oliver and Pawel,

You wrote on 6 November 2006, 14:04:15:

Oliver, I doubt your words. "dd" does not read from the stripe components itself; it only issues system calls, and it is the task of the underlying GEOM classes and drivers to read the actual data. Why else do you think "dd" has a bs operand?

    bs=n    Set both input and output block size to n bytes, superseding
            the ibs and obs operands.  If no conversion values other than
            noerror, notrunc or sync are specified, then each input block
            is copied to the output as a single block without any
            aggregation of short blocks.

And I set bs=1m. Moreover, striping was designed with increased performance in mind. That is why we have the kern.geom.stripe.fast sysctl variable, which reorganizes the reads precisely to avoid the problem you mention, as I understand it (right, Pawel?). A short sysctl sketch follows my test results below.

Pawel! You were right about running the dd's in parallel:

root# dd if=/dev/ad1 of=/dev/null bs=1m count=1000 & dd if=/dev/ad2 of=/dev/null bs=1m count=1000 &
[1] 77476
[2] 77477
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 27.935007 secs (37536271 bytes/sec)
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 28.383332 secs (36943372 bytes/sec)
[1]-  Done    dd if=/dev/ad1 of=/dev/null bs=1m count=1000
[2]+  Done    dd if=/dev/ad2 of=/dev/null bs=1m count=1000

Each disk alone reads at about 72 MB/s (see the quoted test below), yet in parallel they drop to about 37 MB/s each, so the combined throughput stays around 74 MB/s. Seems like it's an ATA controller bottleneck:

atapci0@pci0:31:1: class=0x010180 card=0x24428086 chip=0x244b8086 rev=0x11 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801BA (ICH2) UltraATA/100 IDE Controller'
    class    = mass storage
    subclass = ATA

I'll try the same test on another, less old box, just to find the truth.
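For anyone who wants to experiment with fast mode, it can be toggled at run time with sysctl(8). A minimal sketch; the values shown are only an illustration, not captured from this box:

root# sysctl kern.geom.stripe.fast
kern.geom.stripe.fast: 0
root# sysctl kern.geom.stripe.fast=1
kern.geom.stripe.fast: 0 -> 1
root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=1000

As I understand it, in fast mode gstripe aggregates the pieces of one large request and dispatches them to both disks at once, instead of issuing one stripe-sized read at a time.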
> Oles Hnatkevych wrote:
>> I wonder why geom stripe works much worse than the separate
>> disks that constitute the stripe.
>
> It depends on your workload (or your benchmark).
>
>> I have a stripe of two disks. The disks are on separate ATA channels.
>> [...]
>> Stripesize: 262144
>> [...]
>> Now let's read one of them and the stripe.
>>
>> root# dd if=/dev/ad1 of=/dev/null bs=1m count=1000
>> 1048576000 bytes transferred in 14.579483 secs (71921343 bytes/sec)
>>
>> root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=1000
>> 1048576000 bytes transferred in 15.882796 secs (66019610 bytes/sec)
>>
>> What I would expect is a doubling of the transfer speed, not a
>> slowdown. Am I wrong? Or is geom_stripe inefficient?
>> I tried the same with gvinum/stripe and the read speed was
>> degraded too. And with gmirror the speed was degraded to varying
>> degrees depending on the slice size.
>
> I wonder why people always try to use dd for benchmarking.
> It's bogus. dd is not for benchmarking. It works in a
> sequential way, i.e. it first reads 256 KB (your stripe
> size) from the first component, then 256 KB from the 2nd,
> and so on. While it reads from one disk, the other one is
> idle. So it is not surprising that you don't see a speed
> increase (in fact, there's a small decrease because of
> the seek time overhead when switching from one disk to
> the other). [*]
>
> The performance of a stripe should be better when you use
> applications that perform parallel I/O access.
>
> Your benchmark should be as close to your real-world app
> as possible. If your real-world app is dd (or another one
> that accesses big files sequentially without parallelism),
> then you shouldn't use striping.
>
> Best regards
>    Oliver
>
> PS: [*] It could be argued that the kernel could prefetch
> the next 256 KB from the other disk, so that both disks are
> kept busy for best throughput. The problem is that the
> kernel doesn't know that the next 256 KB will be needed,
> so it doesn't know whether it makes sense to prefetch them.
> dd has no way to tell the kernel about its usage pattern
> (it would require an API similar to madvise(2)).

-- 
Best wishes,
 Oles
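PPS: One way to hand the stripe the kind of parallel workload Oliver describes is to run two readers against different regions of the stripe device at once. Just a sketch; the offsets are arbitrary illustrations, and the device name is the one from my setup above:

root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=500 &
root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m skip=500 count=500 &

With two streams outstanding, GEOM has a chance to keep both component disks busy even though each stream is sequential on its own. On this box, of course, the controller ceiling shown above would still cap the total.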