Date: Fri, 8 Jan 2010 02:15:10 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Motin
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
    src-committers@freebsd.org, Ivan Voras
Subject: Re: svn commit: r201658 - head/sbin/geom/class/stripe
Message-ID: <20100108013737.S56162@delplex.bde.org>
In-Reply-To: <4B450F30.20705@FreeBSD.org>

On Thu, 7 Jan 2010, Alexander Motin wrote:

> Ivan Voras wrote:
>> Yes, my experience which led to the post was mostly on UFS which,
>> while AFAIK it does read-ahead, still does it serially (I think this
>> is implied by your experiments with NCQ and ZFS vs UFS) - so in any
>> case only 2 drives are hit with a 64k stripe size at any moment in
>> time.
>
> I do not think that is true.  On a system with the default MAXPHYS
> I made a gstripe with a 64K block out of 4 equal drives, each with a
> maximal read speed of 108MB/s.  Reads with dd from a large
> pre-written file on UFS showed:
>
> vfs.read_max=8 (default) - 235090074 bytes/sec
> vfs.read_max=16          - 378385148 bytes/sec
> vfs.read_max=32          - 386620109 bytes/sec

Maybe I'm wrong about it being limited by MAXPHYS.  'racluster' is
limited by MAXPHYS, but 'maxra' (vfs.read_max) is not, and the two
interact confusingly.

BTW, vfs.read_max has bogus units -- fs blocks (bsize, not fsize, for
ffs IIRC).  The default of 8 works very badly when the fs block size
is small (say 512).  In my version, the units are DEV_BSIZE blocks and
the default is the default MAXPHYS/DEV_BSIZE (it should be the actual
MAXPHYS/DEV_BSIZE).

> I put some printfs into the clustering read code and found enough
> read-ahead there.  So it works.
>
> One thing that IMHO would be nice to see there is alignment of the
> read-ahead requests to the array stripe size/offset.  A dirty hack I
> tried there reduced the number of requests to the array components by
> 30%.

ffs thinks that bsize alignment is adequate; it doesn't try to align
files any more than that.  Then, for sequential reads from the
beginning of a file, vfs read clustering tries to read MAXPHYS bytes
at a time, so it perfectly preserves any initial misalignment.  I'm
not sure what happens for large random reads.  Does seeking outside of
the read-ahead window reset the alignment to the seek point?  It
shouldn't, if alignment done by the file system is to work right.
However, vfs should re-align if the file system or the user i/o
doesn't, so that all of its reads of mnt_iosize_max bytes start on an
alignment boundary.

Bruce