From owner-freebsd-current Thu Mar 23 15:59: 3 2000
Delivered-To: freebsd-current@freebsd.org
Received: from yana.lemis.com (yana.lemis.com [192.109.197.140])
	by hub.freebsd.org (Postfix) with ESMTP id 6069F37C54B
	for ; Thu, 23 Mar 2000 15:58:57 -0800 (PST)
	(envelope-from grog@mojave.worldwide.lemis.com)
Received: from mojave.worldwide.lemis.com ([216.88.157.130])
	by yana.lemis.com (8.8.8/8.8.8) with ESMTP id KAA02363;
	Fri, 24 Mar 2000 10:28:36 +1030 (CST)
	(envelope-from grog@mojave.worldwide.lemis.com)
Received: (from grog@localhost)
	by mojave.worldwide.lemis.com (8.9.3/8.9.3) id PAA09672;
	Thu, 23 Mar 2000 15:58:13 -0800 (PST)
	(envelope-from grog)
Date: Thu, 23 Mar 2000 15:58:13 -0800
From: Greg Lehey
To: Dan Nelson
Cc: Poul-Henning Kamp , Alfred Perlstein , Matthew Dillon , current@FreeBSD.ORG
Subject: Write clustering (was: patches for test / review)
Message-ID: <20000323155812.F9318@mojave.worldwide.lemis.com>
Reply-To: Greg Lehey
References: <20000320115902.C14789@fw.wintelcom.net> <20211.953581241@critter.freebsd.dk> <20000320152330.A48212@dan.emsphone.com> <20000323152718.C9318@mojave.worldwide.lemis.com> <20000323174438.B59166@dan.emsphone.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <20000323174438.B59166@dan.emsphone.com>; from dnelson@emsphone.com on Thu, Mar 23, 2000 at 05:44:38PM -0600
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thursday, 23 March 2000 at 17:44:38 -0600, Dan Nelson wrote:
> In the last episode (Mar 23), Greg Lehey said:
>>
>> Agreed.  This is on the Vinum wishlist, but it comes at the expense of
>> reliability (how long do you wait to cluster?  What happens if the
>> system fails in between?).  In addition, for Vinum it needs to be done
>> before entering the hardware driver.
>
> For the simplest case, you can choose to optimize only when the user
> sends a single huge write().

We discussed that.  Since the optimum band size is much larger than
MAXPHYS, this can't happen on a correctly configured system.

> That way you don't have to worry about caching dirty pages in vinum.
> This is basically what the hardware RAIDs I have do.

Right, but that seriously degrades normal non-band writes.

> They'll only do the write optimization (they call it "pipelining")
> if you actually send a single SCSI write request large enough to
> span all the disks.  I don't know what would be required to get our
> kernel to even be able to write blocks this big (what's the upper
> limit on MAXPHYS)?

MAXPHYS is currently 128 kB.  I recommend stripes of 256 kB to 512 kB,
so with a 9 disk RAID we're talking about bands of 2 to 4 MB.

My current idea is to set a flag on each volume specifying that it's
prepared to wait up to n seconds for write clustering.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
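
The band-size figures in the message can be checked with a short calculation. The sketch below is illustrative only and is not Vinum code. It assumes a RAID-5 layout in which one stripe per band holds parity, so a 9 disk plex carries 8 data stripes per band; that assumption is what reproduces the 2 to 4 MB figures. The function names band_size and requests_per_band are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Largest single transfer size quoted in the message (128 kB).
MAXPHYS = 128 * KB


def band_size(disks, stripe_size, parity_stripes=1):
    """Bytes of user data in one band (one stripe on each data disk).

    Assumption: RAID-5 style layout with `parity_stripes` stripes per
    band used for parity rather than data.
    """
    data_disks = disks - parity_stripes
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return data_disks * stripe_size


def requests_per_band(disks, stripe_size, max_request=MAXPHYS):
    """Number of maximum-size requests needed to cover one full band."""
    band = band_size(disks, stripe_size)
    return -(-band // max_request)  # ceiling division


if __name__ == "__main__":
    for stripe_kb in (256, 512):
        band = band_size(9, stripe_kb * KB)
        count = requests_per_band(9, stripe_kb * KB)
        print(f"stripe {stripe_kb} kB: band {band // MB} MB, "
              f"{count} requests of {MAXPHYS // KB} kB")
```

Under these assumptions a full band takes 16 to 32 maximum-size requests, which is why a single write() cannot cover a band and some form of clustering across requests would be needed.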