From owner-freebsd-current Wed Dec 12 18:18: 0 2001
Delivered-To: freebsd-current@freebsd.org
Received: from monorchid.lemis.com (monorchid.lemis.com [192.109.197.75])
	by hub.freebsd.org (Postfix) with ESMTP id 945A037B417;
	Wed, 12 Dec 2001 18:17:55 -0800 (PST)
Received: by monorchid.lemis.com (Postfix, from userid 1004)
	id BE352786E3; Thu, 13 Dec 2001 12:47:53 +1030 (CST)
Date: Thu, 13 Dec 2001 12:47:53 +1030
From: Greg Lehey
To: Bernd Walter
Cc: Matthew Dillon, Wilko Bulte, Mike Smith, Terry Lambert,
	Joerg Wunsch, freebsd-current@FreeBSD.org
Subject: Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))
Message-ID: <20011213124753.Q3448@monorchid.lemis.com>
References: <200112101754.fBAHsRV01202@mass.dis.org>
	<200112101813.fBAIDKo47460@apollo.backplane.com>
	<20011210192251.A65380@freebie.xs4all.nl>
	<200112101830.fBAIU4w47648@apollo.backplane.com>
	<20011211110633.M63585@monorchid.lemis.com>
	<20011211031120.G11774@cicely8.cicely.de>
	<20011212162205.I82733@monorchid.lemis.com>
	<20011212125337.D15654@cicely8.cicely.de>
	<20011213105413.G76019@monorchid.lemis.com>
	<20011213030613.A18679@cicely8.cicely.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20011213030613.A18679@cicely8.cicely.de>
User-Agent: Mutt/1.3.23i
Organization: The FreeBSD Project
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.FreeBSD.org/
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: 
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe: 
List-Unsubscribe: 
X-Loop: FreeBSD.ORG

On Thursday, 13 December 2001 at 3:06:14 +0100, Bernd Walter wrote:
> On Thu, Dec 13, 2001 at 10:54:13AM +1030, Greg Lehey wrote:
>> On Wednesday, 12 December 2001 at 12:53:37 +0100, Bernd Walter wrote:
>>> On Wed, Dec 12, 2001 at 04:22:05PM +1030, Greg Lehey wrote:
>>>>
>>>> On Tuesday, 11 December 2001 at 3:11:21 +0100, Bernd Walter wrote:
>>>> 2. Cache the parity blocks.  This is an optimization which I think
>>>>    would be very valuable, but which Vinum doesn't currently perform.
>>>
>>> I thought of connecting the parity to the wait lock.
>>> If there's a waiter for the same parity data, it's not dropped.
>>> This way we don't waste memory but still have an effect.
>>
>> That's a possibility, though it doesn't directly address parity block
>> caching.  The problem is that by the time you find another lock,
>> you've already performed part of the parity calculation, and probably
>> part of the I/O transfer.  But it's an interesting consideration.
>
> I know that it doesn't do the best, but it's easy to implement.
> A more complex handling for better results can still be done.

I don't have the time to work out an example, but I don't think it
would change anything until you had two lock waits.  I could be wrong,
though: you've certainly brought out something here that I hadn't
considered, so if you can write up a detailed example (preferably
after you've looked at the code and decided how to handle it), I'd
certainly be interested.

>>> I would guess it happens when the stripe size is bigger than the
>>> preread cache the drives use.  This would mean we have less chance
>>> of getting parity data out of the drive cache.
>>
>> Yes, this was one of the possibilities we considered.
>
> It should be measured and compared after I've changed the locking.
> It will look different after that and may lead to other reasons,
> because we will have a different load characteristic on the drives.
> Currently, if we have two writes in each of two stripes, all initiated
> before the first finished, the drive has to seek between the two
> stripes, as the second write to the same stripe has to wait.

I'm not sure I understand this.  The stripes are on different drives,
after all.

>>> Whenever a write hits a driver there is a waiter for it.
>>> Either a softdep, a memory freeing, or an application doing a sync
>>> transfer.
>>> I'm almost sure delaying writes will harm performance in upper layers.
>>
>> I'm not so sure.  Full stripe writes, where needed, are *much* faster
>> than partial stripe writes.
>
> Hardware RAID usually comes with NVRAM and can cache write data without
> delaying the acknowledgement to the initiator.
> That option is not available to software RAID.

It could be.  It's probably something worth investigating and
supporting.

Greg

--
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message