Date: Sat, 28 May 2005 20:36:55 +1000 (EST)
From: Bruce Evans
To: Dominic Marks
Cc: freebsd-fs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
In-Reply-To: <200505271328.58072.dom@goodforbusiness.co.uk>
References: <200505271328.58072.dom@goodforbusiness.co.uk>
Message-ID: <20050528194126.W3563@epsplex.bde.org>

On Fri, 27 May 2005, Dominic Marks wrote:

> (Posted to freebsd-fs as the PR is assigned to freebsd-usb@, but it seems to
> be more related to the msdos filesystem than the USB system so perhaps it
> should be reassigned?)

It should be.  It is even less i386-specific than usb-specific.

> I've been evaluating the performance of some usb2 hard discs with FreeBSD and
> I found this PR (68719). The submitter is correct that performance with
> msdosfs is severely limited.
>
> I tested a 'LaCie' USB2 disc:
> ...
> In test 1 I could not achieve any better than 5.1MB/s on an msdosfs
> filesystem. Using UFS2 and softupdates a transfer rate of 22~25MB/s was
> possible. Both test data sets were copied from the systems ATA-100 disc. In
> both tests at these peaks gstat reports the device is 100% busy.

I use the following to improve transfer rates for msdosfs.  The patch is
for an old version so it might not apply directly.

%%%
Index: msdosfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
retrieving revision 1.147
diff -u -2 -r1.147 msdosfs_vnops.c
--- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
+++ msdosfs_vnops.c	22 Feb 2004 07:27:15 -0000
@@ -608,4 +622,5 @@
 	int error = 0;
 	u_long count;
+	int seqcount;
 	daddr_t bn, lastcn;
 	struct buf *bp;
@@ -693,4 +714,5 @@
 		lastcn = de_clcount(pmp, osize) - 1;
 
+	seqcount = ioflag >> IO_SEQSHIFT;
 	do {
 		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
@@ -718,5 +740,5 @@
 			 */
 			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
-			clrbuf(bp);
+			vfs_bio_clrbuf(bp);
 			/*
 			 * Do the bmap now, since pcbmap needs buffers
@@ -767,11 +789,19 @@
 		 * without delay. Otherwise do a delayed write because we
 		 * may want to write somemore into the block later.
+		 * XXX comment not updated with code.
 		 */
+		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
+			bp->b_flags |= B_CLUSTEROK;
 		if (ioflag & IO_SYNC)
-			(void) bwrite(bp);
-		else if (n + croffset == pmp->pm_bpcluster)
+			(void)bwrite(bp);
+		else if (vm_page_count_severe() || buf_dirty_count_severe())
 			bawrite(bp);
-		else
-			bdwrite(bp);
+		else if (n + croffset == pmp->pm_bpcluster) {
+			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
+				cluster_write(bp, dep->de_FileSize, seqcount);
+			else
+				bawrite(bp);
+		} else
+			bdwrite(bp);
 		dep->de_flag |= DE_UPDATE;
 	} while (error == 0 && uio->uio_resid > 0);
%%%

Notes:
- The xxx_count_severe() stuff doesn't work quite right and was observed
  to work especially badly for msdosfs in some configurations.  IIRC,
  only configurations with a tiny block size (e.g., 512 bytes) showed the
  problem, and the problem is more likely to be with tiny block sizes
  actually exercising the "severe" case than with msdosfs or with the
  tiny block sizes themselves.  The behaviour was apparently that when a
  severe page or buf shortage develops, the above handling makes the
  problem worse by using bawrite() instead of cluster_write().  Falling
  back to bawrite() may have made the resource shortage non-fatal, but it
  made the resource shortage last much longer since bawrite() was much
  slower, even on the reasonably fast ATA drive that I was testing on.
- Using cluster_write() in the above is not essential.  bdwrite() works
  almost as well as cluster_write(), or perhaps even better, provided
  write clustering is enabled by setting B_CLUSTEROK, since when this
  flag is set the delayed writes are clustered when they are done
  physically.

> I have not made any tests of read performance but from looking at the results
> I do not expect that it will be significantly better than write performance.
> I may do some when I get more time to investigate and follow up if the
> results are unexpected.

Try it.  I would expect read performance to be much better.  If not,
don't bother trying the above patch.
msdosfs uses read-ahead for read(), and this seems to work well so I
haven't even tried changing it to use read clustering (the above only
changes it to use write clustering).  This may depend on the drive doing
read caching and not handling small block sizes too badly.  I mostly use
ATA drives that have these properties.  Writing tinygrams tends to have a
relatively higher cost because write caching is not enabled so clustering
can only be done by the OS.

> Hopefully this will generate some interest in the problem, it is beyond my
> time and expertise but it would be very nice to be able to access MS-DOS
> formatted filesystems at a reasonable speed!

Some other changes are needed for general use at a reasonable speed:
- use VMIO for metadata.
- don't use pessimal block allocation.  The current allocator gives large
  inter-file fragmentation by attempting to minimise intra-file
  fragmentation, and when the file system becomes just 1/N full the
  attempt backfires and gives intra-file fragmentation too (files with
  more than N clusters are very likely to be fragmented).

Bruce
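
For readers who want the write policy from the last hunk of the patch
without the diff noise, here is a rough userland sketch of the same
decision.  It is not kernel code: the enum, the struct and both function
names are hypothetical and exist only for illustration, and the booleans
stand in for the IO_SYNC, MNT_NOCLUSTERW and xxx_count_severe() tests
that the patch makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four buffer-write calls used in the patch. */
enum write_action {
	WRITE_SYNC,	/* bwrite(): write and wait for completion */
	WRITE_ASYNC,	/* bawrite(): start the write, do not wait */
	WRITE_CLUSTER,	/* cluster_write(): hand off to write clustering */
	WRITE_DELAYED	/* bdwrite(): leave the buffer dirty for later */
};

/* Hypothetical snapshot of the state the patched loop looks at. */
struct write_state {
	bool io_sync;		/* ioflag & IO_SYNC */
	bool severe;		/* vm_page_count_severe() ||
				   buf_dirty_count_severe() */
	bool noclusterw;	/* mount has MNT_NOCLUSTERW set */
	size_t croffset;	/* offset of this write within the cluster */
	size_t n;		/* bytes copied into the buffer */
	size_t cluster_size;	/* pmp->pm_bpcluster */
};

/* Mirrors the if/else chain in the patch, in the same order. */
enum write_action
choose_write_action(const struct write_state *ws)
{
	assert(ws->cluster_size > 0);
	assert(ws->croffset + ws->n <= ws->cluster_size);

	if (ws->io_sync)
		return (WRITE_SYNC);
	if (ws->severe)
		return (WRITE_ASYNC);
	if (ws->croffset + ws->n == ws->cluster_size)
		return (ws->noclusterw ? WRITE_ASYNC : WRITE_CLUSTER);
	return (WRITE_DELAYED);
}

/*
 * The patch sets B_CLUSTEROK before choosing the action, so even the
 * delayed writes can be clustered when they are finally done.
 */
bool
buffer_is_cluster_ok(const struct write_state *ws)
{
	return (!ws->noclusterw);
}
```

With a 4096-byte cluster, a write that fills the cluster to its end maps
to WRITE_CLUSTER on a normal mount and to WRITE_ASYNC when write
clustering is disabled, while a partial write maps to WRITE_DELAYED.  The
"severe" test sits ahead of the full-cluster test, which is the ordering
the first note describes as making a shortage last longer.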