From: Allen
Date: Mon, 02 May 2005 10:39:18 -0400
To: Arne "Wörner"
Cc: freebsd-performance@freebsd.org
Subject: Re: Very low disk performance on 5.x

At 10:14 5/2/2005, Arne "Wörner" wrote:
>--- Allen wrote:
> > Also you should keep in mind, there could simply be some really goofy
> > controller option enabled, that forces the RAID5 to behave in a
> > "degraded" state for reads -- forcing it to read up all the other
> > disks in the stripe and calculate the XOR again, to make sure the
> > data it read off the disk matches the checksum.
> > It's rare, but I've seen it before, and it will cause exactly this
> > sort of RAID5 performance inversion. Since the XOR is recalculated
> > on every write and requires only reading up one sector on a
> > different disk, options that do the above will result in read
> > scores drastically lower than writes to the same array.
> >
>Isn't that compensated by the cache? I mean:
>We would just
>1. read all the blocks, that correspond to the block, that is
>requested,
>2. put them all into the cache
>3. check the parity bits (XOR should be very fast; especially in
>comparison to the disc read times)
>4. keep them in the cache (some kind of read ahead...)
>5. send the requested block to the driver

Your steps are appropriate but you should note that #3 is not true on
cards that support RAID5 but do not have hardware-XOR. Some of the very
cheap i960 based cards have this failing, so the XOR itself is slow on
top of everything else.

However, the cache doesn't play a part in this at all. It's the
difference between these two read cycles, assume just one block was
written to out of 4, on a 5-disk system.

Scenario A, verified read disabled:

1. RAID card reads up one block from appropriate drive. Done.

Scenario B, verified read enabled:

1. RAID card reads up ALL blocks in the stripe (5 reads).
2. RAID card pretends the block requested is on a "degraded" drive, and
   calculates it from the other 3 + the XOR stripe.
3. RAID card reports the value back, or tosses some kind of error.

You can see, the cache just doesn't play a part in what I was
describing, which is basically the array performing as though it is
degraded when in fact it is not, to catch failures that would otherwise
be missed.
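
To make the difference between the two scenarios concrete, here is a rough Python sketch of one stripe on a 5-disk array (4 data blocks plus 1 XOR block). It is only a toy model under my own assumptions, not how any particular controller firmware works: the names Stripe, plain_read and verified_read are made up for illustration, and a "disk read" is just a counter increment.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Toy model of one RAID5 stripe: 4 data blocks plus 1 parity block."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(self.data)
        self.reads = 0  # number of simulated disk reads so far

    def read_data(self, index):
        self.reads += 1
        return self.data[index]

    def read_parity(self):
        self.reads += 1
        return self.parity


def plain_read(stripe, index):
    """Scenario A: fetch only the requested block (1 read)."""
    return stripe.read_data(index)


def verified_read(stripe, index):
    """Scenario B: read the whole stripe, rebuild the requested block
    as if its drive were degraded, and compare (5 reads)."""
    blocks = [stripe.read_data(i) for i in range(len(stripe.data))]
    parity = stripe.read_parity()
    others = [b for i, b in enumerate(blocks) if i != index]
    rebuilt = xor_blocks(others + [parity])
    if rebuilt != blocks[index]:
        raise IOError("parity mismatch in stripe")
    return rebuilt


if __name__ == "__main__":
    stripe = Stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD"])
    plain_read(stripe, 2)
    print("scenario A reads:", stripe.reads)
    stripe.reads = 0
    verified_read(stripe, 2)
    print("scenario B reads:", stripe.reads)
```

With these assumptions scenario A costs one read per requested block and scenario B costs five, plus the XOR pass, which is the read penalty I was describing. Whether a cache sits in front of this changes nothing about how many blocks have to come off the disks the first time.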