From owner-freebsd-current  Thu Mar 23 15:59: 3 2000
Date: Thu, 23 Mar 2000 15:58:13 -0800
From: Greg Lehey <grog@lemis.com>
To: Dan Nelson <dnelson@emsphone.com>
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
	Alfred Perlstein <bright@wintelcom.net>,
	Matthew Dillon <dillon@apollo.backplane.com>, current@FreeBSD.ORG
Subject: Write clustering (was: patches for test / review)
Message-ID: <20000323155812.F9318@mojave.worldwide.lemis.com>
Reply-To: Greg Lehey <grog@lemis.com>
References: <20000320115902.C14789@fw.wintelcom.net> <20211.953581241@critter.freebsd.dk> <20000320152330.A48212@dan.emsphone.com> <20000323152718.C9318@mojave.worldwide.lemis.com> <20000323174438.B59166@dan.emsphone.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <20000323174438.B59166@dan.emsphone.com>; from dnelson@emsphone.com on Thu, Mar 23, 2000 at 05:44:38PM -0600
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF  13 24 52 F8 6D A4 95 EF
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062

On Thursday, 23 March 2000 at 17:44:38 -0600, Dan Nelson wrote:
> In the last episode (Mar 23), Greg Lehey said:
>>
>> Agreed.  This is on the Vinum wishlist, but it comes at the expense of
>> reliability (how long do you wait to cluster?  What happens if the
>> system fails in between?).  In addition, for Vinum it needs to be done
>> before entering the hardware driver.
>
> For the simplest case, you can choose to optimize only when the user
> sends a single huge write().

We discussed that.  The optimum band size is much larger than MAXPHYS,
so on a correctly configured system no single request can ever cover a
whole band.
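
To put numbers on that, here is a rough sketch using the figures given
further down in this message (MAXPHYS of 128 kB, stripes of 256 kB to
512 kB, 9 disks).  The function names are made up for illustration, and
the single parity disk is an assumption that matches the 2 to 4 MB band
sizes:

```python
# Figures from this message; the single-parity layout is an assumption.
MAXPHYS = 128 * 1024          # largest single transfer, in bytes


def band_size(stripe_size, disks, parity_disks=1):
    """Bytes of data in one band: one stripe on every data disk."""
    return stripe_size * (disks - parity_disks)


def requests_per_band(stripe_size, disks):
    """How many MAXPHYS-sized requests it takes to fill one band."""
    return band_size(stripe_size, disks) // MAXPHYS


for stripe_kb in (256, 512):
    stripe = stripe_kb * 1024
    band_mb = band_size(stripe, 9) // (1024 * 1024)
    print(stripe_kb, "kB stripes:", band_mb, "MB band,",
          requests_per_band(stripe, 9), "requests of MAXPHYS")
```

Even at the small end it takes 16 maximum-size requests to fill one
band, so anything that waits for a single band-sized request will never
see one.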

> That way you don't have to worry about caching dirty pages in vinum.
> This is basically what the hardware RAIDs I have do.

Right, but that seriously degrades normal non-band writes.

> They'll only do the write optimization (they call it "pipelining")
> if you actually send a single SCSI write request large enough to
> span all the disks.  I don't know what would be required to get our
> kernel to even be able to write blocks this big (what's the upper
> limit on MAXPHYS)?

MAXPHYS is currently 128 kB.  I recommend stripes of 256 kB to 512 kB,
so with a 9-disk RAID (eight data stripes per band) we're talking about
bands of 2 to 4 MB.  My current idea is to set a flag on each volume
specifying that it's prepared to wait up to n seconds for write
clustering.
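
A minimal sketch of how such a flag might behave, not Vinum code: the
class and method names are hypothetical, offsets are ignored (every
write is assumed to belong to the same band), and the caller passes in
the time so the logic can be followed by hand.

```python
class BandCluster:
    """Collect writes for one band; flush when full or when the wait expires."""

    def __init__(self, band_size, max_wait):
        self.band_size = band_size    # bytes of data in one band
        self.max_wait = max_wait      # the per-volume "n seconds"
        self.pending = 0              # bytes collected so far
        self.first_arrival = None     # when the oldest pending write arrived

    def add(self, nbytes, now):
        """Queue a write; return "full" once a whole band has been collected."""
        if self.first_arrival is None:
            self.first_arrival = now
        self.pending += nbytes
        if self.pending >= self.band_size:
            return self._flush("full")
        return None

    def tick(self, now):
        """Call periodically; return "partial" if the wait has run out."""
        if (self.first_arrival is not None
                and now - self.first_arrival >= self.max_wait):
            return self._flush("partial")
        return None

    def _flush(self, kind):
        # A real implementation would issue the I/O here.
        self.pending = 0
        self.first_arrival = None
        return kind
```

Anything sitting in `pending` is lost if the system fails before the
flush, which is the reliability cost mentioned above; n bounds how long
that window can be.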

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

