From owner-freebsd-fs@FreeBSD.ORG  Sat May 28 10:37:12 2005
Date: Sat, 28 May 2005 20:36:55 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@epsplex.bde.org
To: Dominic Marks <dom@goodforbusiness.co.uk>
In-Reply-To: <200505271328.58072.dom@goodforbusiness.co.uk>
Message-ID: <20050528194126.W3563@epsplex.bde.org>
References: <200505271328.58072.dom@goodforbusiness.co.uk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem

On Fri, 27 May 2005, Dominic Marks wrote:

> (Posted to freebsd-fs as the PR is assigned to freebsd-usb@, but it seems to
> be more related to the msdos filesystem than the USB system so perhaps it
> should be reassigned?)

It should be.  It is even less i386-specific than usb-specific.

> I've been evaluating the performance of some usb2 hard discs with FreeBSD and
> I found this PR (68719). The submitter is correct that performance with
> msdosfs is severely limited.
>
> I tested a 'LaCie' USB2 disc:
> ...
> In test 1 I could not achieve any better than 5.1MB/s on an msdosfs
> filesystem. Using UFS2 and softupdates, a transfer rate of 22~25MB/s was
> possible. Both test data sets were copied from the system's ATA-100 disc. In
> both tests, at these peaks, gstat reports the device is 100% busy.

I use the following to improve transfer rates for msdosfs.  The patch is
for an old version so it might not apply directly.

%%%
Index: msdosfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
retrieving revision 1.147
diff -u -2 -r1.147 msdosfs_vnops.c
--- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
+++ msdosfs_vnops.c	22 Feb 2004 07:27:15 -0000
@@ -608,4 +622,5 @@
  	int error = 0;
  	u_long count;
+	int seqcount;
  	daddr_t bn, lastcn;
  	struct buf *bp;
@@ -693,4 +714,5 @@
  		lastcn = de_clcount(pmp, osize) - 1;

+	seqcount = ioflag >> IO_SEQSHIFT;
  	do {
  		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
@@ -718,5 +740,5 @@
  			 */
  			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
-			clrbuf(bp);
+			vfs_bio_clrbuf(bp);
  			/*
  			 * Do the bmap now, since pcbmap needs buffers
@@ -767,11 +789,19 @@
  		 * without delay.  Otherwise do a delayed write because we
  		 * may want to write somemore into the block later.
+		 * XXX comment not updated with code.
  		 */
+		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
+			bp->b_flags |= B_CLUSTEROK;
  		if (ioflag & IO_SYNC)
-			(void) bwrite(bp);
-		else if (n + croffset == pmp->pm_bpcluster)
+			(void)bwrite(bp);
+		else if (vm_page_count_severe() || buf_dirty_count_severe())
  			bawrite(bp);
-		else
-			bdwrite(bp);
+		else if (n + croffset == pmp->pm_bpcluster) {
+			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
+				cluster_write(bp, dep->de_FileSize, seqcount);
+			else
+				bawrite(bp);
+		} else
+			bdwrite(bp);
  		dep->de_flag |= DE_UPDATE;
  	} while (error == 0 && uio->uio_resid > 0);
%%%

Notes:
- The xxx_count_severe() checks (vm_page_count_severe() and
   buf_dirty_count_severe()) don't work quite right and were observed
   to work especially badly for msdosfs in some configurations.  IIRC,
   only configurations with a tiny block size (e.g., 512 bytes) showed
   the problem, and the cause is more likely that tiny block sizes
   actually exercise the "severe" case than anything in msdosfs or in
   the tiny block sizes themselves.  The behaviour was apparently that
   when a severe page or buf shortage develops, the above handling makes
   the problem worse by using bawrite() instead of cluster_write().
   Falling back to bawrite() may have made the resource shortage
   non-fatal, but it made the shortage last much longer since bawrite()
   was much slower, even on the reasonably fast ATA drive that I was
   testing on.
- Using cluster_write() in the above is not essential.  bdwrite() works
   almost as well as, or perhaps even better than, cluster_write(),
   provided write clustering is enabled by setting B_CLUSTEROK, since
   when this flag is set the delayed writes are clustered when they are
   physically done.
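
For readers who find the last hunk hard to follow, here is a minimal
user-space sketch of the order in which the patched code picks a write
method.  It is a model only, not kernel code: the enum, the struct and
the function name are hypothetical and exist just for this
illustration.  Only the kernel calls named in the comments come from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four kernel calls used by the patch. */
enum write_strategy {
	WS_SYNC,	/* bwrite(): write now and wait for completion */
	WS_ASYNC,	/* bawrite(): start the write, do not wait */
	WS_CLUSTER,	/* cluster_write(): hand the buffer to the cluster code */
	WS_DELAYED	/* bdwrite(): leave the buffer dirty for a later write */
};

/* Hypothetical summary of the conditions the patch tests. */
struct write_state {
	bool io_sync;		/* ioflag & IO_SYNC */
	bool severe_shortage;	/* vm_page_count_severe() ||
				 * buf_dirty_count_severe() */
	bool cluster_full;	/* n + croffset == pmp->pm_bpcluster */
	bool noclusterw;	/* mount has MNT_NOCLUSTERW set */
};

/* Mirror the if/else chain of the patched write loop, in the same order. */
static enum write_strategy
choose_write_strategy(const struct write_state *ws)
{
	assert(ws != NULL);
	if (ws->io_sync)
		return (WS_SYNC);
	if (ws->severe_shortage)
		return (WS_ASYNC);
	if (ws->cluster_full)
		return (ws->noclusterw ? WS_ASYNC : WS_CLUSTER);
	return (WS_DELAYED);
}
```

The first note above corresponds to the second test: under a severe
shortage a full cluster gets WS_ASYNC even when clustering is allowed,
which is the fallback that was observed to prolong the shortage.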

> I have not made any tests of read performance but from looking at the results
> I do not expect that it will be significantly better than write performance.
> I may do some when I get more time to investigate and follow up if the
> results are unexpected.

Try it.  I would expect read performance to be much better.  If not, don't
bother trying the above patch.  msdosfs uses read-ahead for read(), and
this seems to work well so I haven't even tried changing it to use read
clustering (the above only changes it to use write clustering).  This may
depend on the drive doing read caching and not handling small block sizes
too badly.  I mostly use ATA drives that have these properties.  Writing
tinygrams tends to have a relatively higher cost because write caching is
not enabled, so clustering can only be done by the OS.
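
As a rough illustration of why tinygrams are expensive, the sketch
below counts the physical writes needed for a file.  It assumes that
without clustering each file system cluster goes to the drive as its
own write, and that the OS can merge clusters up to some maximum
transfer size.  The sizes used in the usage note (512 bytes and 64K)
are illustrative assumptions, not measurements, and write_ops() is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of physical writes needed for file_size bytes when each
 * write carries at most io_size bytes (rounded up).
 */
static size_t
write_ops(size_t file_size, size_t io_size)
{
	assert(io_size > 0);
	return ((file_size + io_size - 1) / io_size);
}
```

For a 1MB file, write_ops(1048576, 512) gives 2048 writes while
write_ops(1048576, 65536) gives 16, so any fixed per-write overhead is
paid 128 times more often in the unclustered case.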

> Hopefully this will generate some interest in the problem.  It is beyond my
> time and expertise, but it would be very nice to be able to access MS-DOS
> formatted filesystems at a reasonable speed!

Some other changes are needed for general use at a reasonable speed:
- use VMIO for metadata.
- don't use pessimal block allocation.  The current allocator gives
   large inter-file fragmentation by attempting to minimise intra-file
   fragmentation, and when the file system becomes just 1/N full the
   attempt backfires and gives intra-file fragmentation too (files with
   more than N clusters are very likely to be fragmented).
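
The 1/N effect can be seen in a toy model.  The sketch below is not
the msdosfs allocator.  It assumes an idealised layout in which
single-cluster files have been spread perfectly evenly, one every N
clusters, so that the file system is 1/N full.  Both function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark every stride-th cluster as used: an idealised picture of
 * one-cluster files spread evenly over the whole file system.
 */
static void
spread_small_files(bool *used, size_t nclusters, size_t stride)
{
	assert(stride > 0);
	for (size_t i = 0; i < nclusters; i++)
		used[i] = (i % stride == 0);
}

/* Length of the longest run of consecutive free clusters. */
static size_t
longest_free_run(const bool *used, size_t nclusters)
{
	size_t best = 0, run = 0;

	for (size_t i = 0; i < nclusters; i++) {
		if (used[i])
			run = 0;
		else if (++run > best)
			best = run;
	}
	return (best);
}
```

With 1024 clusters and a stride of 8 (1/8 full), the longest free run
is 7 clusters, so in this layout no new file of 8 or more clusters can
be stored contiguously even though 7/8 of the space is free.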

Bruce