Date: Fri, 8 Jan 2010 02:15:10 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Motin
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
    src-committers@freebsd.org, Ivan Voras
Subject: Re: svn commit: r201658 - head/sbin/geom/class/stripe
Message-ID: <20100108013737.S56162@delplex.bde.org>
In-Reply-To: <4B450F30.20705@FreeBSD.org>

On Thu, 7 Jan 2010, Alexander Motin wrote:

> Ivan Voras wrote:
>> Yes, my experience which led to the post was mostly on UFS which,
>> while AFAIK it does read-ahead, still does it serially (I think this
>> is implied by your experiments with NCQ and ZFS vs UFS) - so in any
>> case only 2 drives are hit with a 64k stripe size at any moment in
>> time.
>
> I do not think that is true.  On a system with the default MAXPHYS
> I made a gstripe with a 64K block out of 4 equal drives, each with a
> maximal read speed of 108MB/s.  Reads with dd from a large
> pre-written file on UFS showed:
>
> vfs.read_max=8 (default) - 235090074 bytes/sec
> vfs.read_max=16          - 378385148 bytes/sec
> vfs.read_max=32          - 386620109 bytes/sec

Maybe I'm wrong about it being limited by MAXPHYS.  'racluster' is
limited by MAXPHYS, but 'maxra' (vfs.read_max) is not, and the two
interact confusingly.

BTW, vfs.read_max has bogus units -- fs blocks (bsize, not fsize, for
ffs IIRC).  The default of 8 works very badly when the fs block size
is small (say 512).  In my version, the units are DEV_BSIZE blocks and
the default is the default MAXPHYS/DEV_BSIZE (it should be the actual
MAXPHYS/DEV_BSIZE).

> I put some printfs into the clustering read code and found enough
> read-ahead there.  So it works.
>
> One thing that IMHO would be nice to see there is alignment of the
> read-ahead requests to the array stripe size/offset.  A dirty hack I
> tried there reduced the number of requests to the array components by
> 30%.

ffs thinks that bsize alignment is adequate; it doesn't try to align
files any more than that.  Then, for sequential reads from the
beginning of a file, vfs read clustering tries to read MAXPHYS bytes
at a time, so it perfectly preserves any initial misalignment.  I'm
not sure what happens for large random reads.  Does seeking outside of
the read-ahead window reset the alignment to the seek point?  It
shouldn't, if alignment done by the file system is to work right.
However, vfs should re-align if the file system or the user i/o
doesn't, so that all of its reads of mnt_iosize_max bytes start on an
alignment boundary.

Bruce