Date: Mon, 6 Nov 2006 14:56:12 +0200
From: Oles Hnatkevych <don_oles@able.com.ua>
To: Oliver Fromme
Cc: pjd@FreeBSD.org, freebsd-geom@FreeBSD.ORG
Subject: Re[2]: geom stripe performance question

Hello, Oliver and Pawel,

You wrote on 6 November 2006, 14:04:15:

Oliver, I doubt your words. "dd" does not read from the stripe components itself; it only issues system calls, and it is the task of the underlying GEOM classes and drivers to read the actual data. Why else do you think "dd" has a bs operand?

    bs=n    Set both input and output block size to n bytes, superseding
            the ibs and obs operands.  If no conversion values other than
            noerror, notrunc or sync are specified, then each input block
            is copied to the output as a single block without any
            aggregation of short blocks.

And I set bs=1m. Moreover, striping was designed with increased performance in mind. That is why we have the kern.geom.stripe.fast sysctl variable, which reorganizes the reads precisely to avoid the problem you mention, as I understand it (right, Pawel?). A short sysctl sketch follows my test results below.

Pawel! You were right about running the dd's in parallel:

root# dd if=/dev/ad1 of=/dev/null bs=1m count=1000 & dd if=/dev/ad2 of=/dev/null bs=1m count=1000 &
[1] 77476
[2] 77477
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 27.935007 secs (37536271 bytes/sec)
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 28.383332 secs (36943372 bytes/sec)
[1]-  Done    dd if=/dev/ad1 of=/dev/null bs=1m count=1000
[2]+  Done    dd if=/dev/ad2 of=/dev/null bs=1m count=1000

Each disk alone reads at about 72 MB/s (see the quoted test below), yet in parallel they drop to about 37 MB/s each, so the combined throughput stays around 74 MB/s. Seems like it's an ATA controller bottleneck:

atapci0@pci0:31:1: class=0x010180 card=0x24428086 chip=0x244b8086 rev=0x11 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801BA (ICH2) UltraATA/100 IDE Controller'
    class    = mass storage
    subclass = ATA

I'll try the same test on another, less old box, just to find the truth.
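For anyone who wants to experiment with fast mode, it can be toggled at run time with sysctl(8). A minimal sketch; the values shown are only an illustration, not captured from this box:

root# sysctl kern.geom.stripe.fast
kern.geom.stripe.fast: 0
root# sysctl kern.geom.stripe.fast=1
kern.geom.stripe.fast: 0 -> 1
root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=1000

As I understand it, in fast mode gstripe aggregates the pieces of one large request and dispatches them to both disks at once, instead of issuing one stripe-sized read at a time.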
> Oles Hnatkevych wrote:
>> I wonder why geom stripe works much worse than the separate
>> disks that constitute the stripe.
>
> It depends on your workload (or your benchmark).
>
>> I have a stripe of two disks. The disks are on separate ATA channels.
>> [...]
>> Stripesize: 262144
>> [...]
>> Now let's read one of them and the stripe.
>>
>> root# dd if=/dev/ad1 of=/dev/null bs=1m count=1000
>> 1048576000 bytes transferred in 14.579483 secs (71921343 bytes/sec)
>>
>> root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=1000
>> 1048576000 bytes transferred in 15.882796 secs (66019610 bytes/sec)
>>
>> What I would expect is a doubling of the transfer speed, not a
>> slowdown. Am I wrong? Or is geom_stripe inefficient?
>> I tried the same with gvinum/stripe and the read speed was
>> degraded too. And with gmirror the speed was degraded to varying
>> degrees depending on the slice size.
>
> I wonder why people always try to use dd for benchmarking.
> It's bogus. dd is not for benchmarking. It works in a
> sequential way, i.e. it first reads 256 KB (your stripe
> size) from the first component, then 256 KB from the 2nd,
> and so on. While it reads from one disk, the other one is
> idle. So it is not surprising that you don't see a speed
> increase (in fact, there's a small decrease because of
> the seek time overhead when switching from one disk to
> the other). [*]
>
> The performance of a stripe should be better when you use
> applications that perform parallel I/O access.
>
> Your benchmark should be as close to your real-world app
> as possible. If your real-world app is dd (or another one
> that accesses big files sequentially without parallelism),
> then you shouldn't use striping.
>
> Best regards
>    Oliver
>
> PS: [*] It could be argued that the kernel could prefetch
> the next 256 KB from the other disk, so that both disks are
> kept busy for best throughput. The problem is that the
> kernel doesn't know that the next 256 KB will be needed,
> so it doesn't know whether it makes sense to prefetch them.
> dd has no way to tell the kernel about its usage pattern
> (it would require an API similar to madvise(2)).

-- 
Best wishes,
 Oles
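PPS: One way to hand the stripe the kind of parallel workload Oliver describes is to run two readers against different regions of the stripe device at once. Just a sketch; the offsets are arbitrary illustrations, and the device name is the one from my setup above:

root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m count=500 &
root# dd if=/dev/stripe/bigdata of=/dev/null bs=1m skip=500 count=500 &

With two streams outstanding, GEOM has a chance to keep both component disks busy even though each stream is sequential on its own. On this box, of course, the controller ceiling shown above would still cap the total.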