From owner-freebsd-usb@FreeBSD.ORG  Mon May 30 07:40:05 2005
Return-Path: <owner-freebsd-usb@FreeBSD.ORG>
X-Original-To: freebsd-usb@hub.freebsd.org
Delivered-To: freebsd-usb@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6A97216A41C
	for <freebsd-usb@hub.freebsd.org>; Mon, 30 May 2005 07:40:05 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 324CB43D1F
	for <freebsd-usb@hub.freebsd.org>; Mon, 30 May 2005 07:40:05 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id j4U7e4D3006815
	for <freebsd-usb@freefall.freebsd.org>; Mon, 30 May 2005 07:40:04 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.3/8.13.1/Submit) id j4U7e4Lx006814;
	Mon, 30 May 2005 07:40:04 GMT (envelope-from gnats)
Date: Mon, 30 May 2005 07:40:04 GMT
Message-Id: <200505300740.j4U7e4Lx006814@freefall.freebsd.org>
To: freebsd-usb@FreeBSD.org
From: "Bruce Evans" <bde@zeta.org.au>
Cc: 
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
X-BeenThere: freebsd-usb@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Bruce Evans <bde@zeta.org.au>
List-Id: FreeBSD support for USB <freebsd-usb.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-usb>,
	<mailto:freebsd-usb-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-usb>
List-Post: <mailto:freebsd-usb@freebsd.org>
List-Help: <mailto:freebsd-usb-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-usb>,
	<mailto:freebsd-usb-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 30 May 2005 07:40:05 -0000

The following reply was made to PR i386/68719; it has been noted by GNATS.

From: "Bruce Evans" <bde@zeta.org.au>
To: <james>
Cc: <freebsd-fs@FreeBSD.org>,
	<freebsd-gnats-submit@FreeBSD.org>,
	<banhalmi@field.hu>
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Mon, 30 May 2005 08:30:53 +0100

 On Sun, 29 May 2005, Dominic Marks wrote:
 
 > I have been experimenting in msdosfs_read and I have managed to come up with
 > something that works, but I'm sure it is flawed. On large file reads it will
 > improve read performance (see below) - but only after a long period of the
 > file copy achieving only 3MB/s (see A1). During this time gstat reports the
 > disc itself is reading at its maximum of around 28MB/s. After a long period
 > of low throughput, the disc drops to 25MB/s but the actual transfer rate
 > increases to 25MB/s (see A2).
 
 A1 is strange.  It might be reading too much ahead, but I wouldn't expect 
 the read-ahead to be discarded soon so this should make little difference
 for reading whole files.
 
 > I've tried to narrow it down to something but I'm mostly in the dark, so I'll
 > just hand over what I found to work to review. I looked at Bruce's changes to
 > msdosfs_write and tried to do the same (implement cluster_read) using the
 > ext2 and ffs _read methods as a how-to. I think I'm reading ahead too far, or
 > too early. I have been unable to interpret the gstat output during the first
 > part of the transfer any further.
 
 The ext2 and ffs methods are a good place to start.  Also look at cd9660 --
 it is a little simpler.
 
 > The patch below combines Bruce's original patch for msdosfs_write, revised
 > for current text positions, with my attempts to do the same for msdosfs_read.
 >
 > %%
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.149.2.1
 > diff -u -r1.149.2.1 msdosfs_vnops.c
 > --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 > +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 > @@ -565,14 +567,21 @@
 > 			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
 > 		} else {
 > 			blsize = pmp->pm_bpcluster;
 > -			rablock = lbn + 1;
 > -			if (seqcount > 1 &&
 > -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > -				rasize = pmp->pm_bpcluster;
 > -				error = breadn(vp, lbn, blsize,
 > -				    &rablock, &rasize, 1, NOCRED, &bp);
 > +			/* XXX what is the best value for crsize? */
 > + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : blsize * nblks;
 > +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 > +				error = cluster_read(vp, dep->de_FileSize, lbn,
 > +					crsize, NOCRED, uio->uio_resid, seqcount, &bp);
 
 crsize should be just the block size (the cluster size in msdosfs, i.e. the
 blsize variable here), going by the corresponding code in all other file
 systems.
 seqcount gives the amount of readahead and there are algorithms elsewhere
 to guess its best value.  I think cluster_read() reads only physically
 contiguous blocks, so the amount of read-ahead for it is not critical
 for the clustered case anyway.  There will either be a large range of
 contiguous blocks, in which case reading ahead a lot isn't bad, or
 read-ahead will be limited by discontiguities.  Giving a too-large
 value for crsize may be harmful by confusing cluster_read() about
 discontiguities, or just by asking it to read the large size when the
 blocks actually in the file aren't contiguous.
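 
 To make the point about discontiguities concrete, here is a small userland
 sketch.  It is not what cluster_read() does internally; it only counts how
 many clusters are physically contiguous from a given starting point, which
 is the limit on read-ahead described above.  The names contig_run and chain
 are made up for illustration, and chain[] stands in for the file's
 logical-to-physical cluster mapping.

```c
#include <stddef.h>

/*
 * Illustration only (hypothetical helper, not kernel code): count how many
 * clusters starting at logical index `start` are physically contiguous,
 * capped at `max_run`.  chain[i] is the assumed physical cluster number of
 * the i-th logical cluster of the file.
 */
size_t
contig_run(const unsigned long *chain, size_t nclusters, size_t start,
    size_t max_run)
{
	size_t run;

	if (start >= nclusters || max_run == 0)
		return (0);
	run = 1;
	/* Extend the run while the next cluster directly follows the last. */
	while (run < max_run && start + run < nclusters &&
	    chain[start + run] == chain[start + run - 1] + 1)
		run++;
	return (run);
}
```

 With a mapping such as {10, 11, 12, 40, 41, 42, 43}, a read starting at
 index 0 can cover at most 3 clusters in one contiguous transfer no matter
 how large a size was requested.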
 
 I think the above handles most cases, so look for problems there first.
 
 > 			} else {
 
 The above seems to be missing a bread() for the EOF case (before the else).
 I don't know what cluster_read() does at EOF.  See cd9660_read() for clear
 code.  (Here there is unfortunately an extra level of indentation from a
 special case for directories.)
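 
 The branch order I have in mind can be modelled in userland as below.  This
 is only a sketch of the decision, assuming a plain bread() is wanted when
 the next block is at or past EOF; it does no I/O, and the enum and function
 names are invented for illustration rather than taken from any file system.

```c
#include <stdbool.h>

/* Which buffer-cache call the read path would pick (model only). */
enum read_path { PATH_BREAD, PATH_BREADN, PATH_CLUSTER_READ };

/*
 * Hypothetical model of the branch order; performs no I/O.
 *   next_off   byte offset of the block after the one being read
 *   file_size  size of the file in bytes
 *   clustering true when MNT_NOCLUSTERR is not set on the mount
 *   seqcount   sequential-access hint passed down by the caller
 */
enum read_path
choose_read_path(long long next_off, long long file_size, bool clustering,
    int seqcount)
{
	bool more = next_off < file_size;	/* data exists past this block */

	if (clustering)
		return (more ? PATH_CLUSTER_READ : PATH_BREAD);
	if (seqcount > 1 && more)
		return (PATH_BREADN);
	return (PATH_BREAD);
}
```

 The point of the model is the first branch: with clustering enabled, the
 last block of the file still goes through plain bread().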
 
 > -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +				rablock = lbn + 1;
 > +				if (seqcount > 1 &&
 > +					de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > +						rasize = pmp->pm_bpcluster;
 > +						error = breadn(vp, lbn, blsize,
 > +						&rablock, &rasize, 1, NOCRED, &bp);
 > +				} else {
 > +					error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +				}
 
 This part seems to be OK.  (It is just the old code indented.)
 
 > 			}
 > 		}
 > 		if (error) {
 > ...
 > %%
 >
 > With this patch I can get the following transfer rates:
 >
 > msdosfs reading
 >
 > # ls -lh /mnt/random2.file
 > -rwxr-xr-x  1 root  wheel   1.0G May 29 11:24 /mnt/random2.file
 >
 > # /usr/bin/time -al cp /mnt/random2.file /vol
 >       59.61 real         0.05 user         6.79 sys
 >       632  maximum resident set size
 >        11  average shared memory size
 >        80  average unshared data size
 >       123  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >     23757  block input operations **
 >      8192  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >     16660  voluntary context switches
 >     10387  involuntary context switches
 >
 > Average Rate: 15.31MB/s. (Would be higher if not for the slow start)
 >
 > ** This figure is 3x that of the UFS2 operations. This must be an indicator
 > of what I'm doing wrong, but I don't know what.
 
 This might also be a sign of fragmentation due to bad allocation policies
 at write time or write() not being able to do good allocation due to
 previous fragmentation.
 
 The average rate isn't too bad, despite the extra blocks.
 
 > msdosfs writing
 >
 > # /usr/bin/time -al cp /vol/random2.file /mnt
 >       47.33 real         0.03 user         7.13 sys
 >       632  maximum resident set size
 >        12  average shared memory size
 >        85  average unshared data size
 >       130  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >      8735  block input operations
 >     16385  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >      8856  voluntary context switches
 >     29631  involuntary context switches
 >
 > Average Rate: 18.79MB/s.
 
 There are 2x as many blocks as for ffs2 for writing instead of 3x for
 reading.  What are the input blocks for here?  Better put the non-msdosfs
 part of the source or target in memory so that it doesn't get counted.
 Or try mount -v (it gives sync and async read/write counts for individual
 file systems).
 
 2x is actually believable while ffs2's counts aren't.  It corresponds to
 a block size of 64K, which is what I would expect for the unfragmented
 case.
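 
 The block sizes here come from dividing the file size by the operation
 count.  A sketch of that arithmetic follows, assuming the file is exactly
 1 GiB (ls -lh only shows the rounded 1.0G) and ignoring that the counts
 from time(1) include i/o on the other file system.  The function name is
 made up.

```c
/*
 * Average bytes per i/o operation, rounded down.  Returns 0 when no
 * operations were counted, to avoid dividing by zero.
 */
unsigned long long
avg_io_size(unsigned long long bytes, unsigned long long ops)
{
	if (ops == 0)
		return (0);
	return (bytes / ops);
}
```

 Under those assumptions, 16385 output operations give 65532 bytes per
 operation (64K to within rounding), and the 23757 input operations of the
 msdosfs read give 45196 bytes, or about 44K.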
 
 > To compare with UFS2 + softupdates on the same system / disc.
 >
 > ufs2 reading
 >
 > # /usr/bin/time -al cp /mnt/random2.file /vol
 >       42.39 real         0.02 user         6.61 sys
 >       632  maximum resident set size
 >        12  average shared memory size
 >        87  average unshared data size
 >       133  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >      8249  block input operations
 >      8193  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >      8246  voluntary context switches
 >     24617  involuntary context switches
 >
 > Average Rate: 20.89MB/s.
 
 Isn't it 24.16MB/s?
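 
 The arithmetic behind that figure, as a sketch: it assumes the file is
 exactly 1 GiB and that 1 MB means 2^20 bytes, neither of which is stated in
 the report.  The function name is hypothetical.

```c
/*
 * Transfer rate in MB/s, with 1 MB taken as 2^20 bytes (an assumption).
 * Returns 0.0 for a non-positive elapsed time.
 */
double
rate_mb_per_s(double bytes, double seconds)
{
	if (seconds <= 0.0)
		return (0.0);
	return (bytes / (1024.0 * 1024.0) / seconds);
}
```

 For 42.39 seconds this gives about 24.16.  On the same assumptions the
 msdosfs runs come out at about 17.2 (59.61 seconds) and 21.6 (47.33
 seconds).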
 
 8192 i/o operations seems to be too small.  It corresponds to a block
 size of 128K.  Most drivers don't actually support doing i/o of that
 size (most have a limit of 64K), so if they get asked to then it is a
 bug.  This bug is common or ubiquitous.  The block size to use for
 clusters is in mnt_iosize_max, and this is set in various wrong ways,
 often or always to MAXPHYS = 128K.  This usually makes little difference
 except to give misleading statistics.  Clustering tends to produce
 blocks of size 128K and the block i/o counts report blocks of that
 size, but smaller blocks are sent to the hardware.  I'm not sure if
 libdevstat() sees the smaller blocks.  I think it doesn't.
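 
 To show how the statistics can mislead, here is a sketch of the splitting
 arithmetic.  It assumes a driver limit of 64K per transfer and that the
 splitting happens below the layer that does the counting; both are
 assumptions for illustration, and split_count is a made-up name.

```c
#include <stddef.h>

/*
 * Number of device transfers needed for one request of req_size bytes when
 * the driver accepts at most max_xfer bytes per transfer (hypothetical
 * helper).  Returns 0 if max_xfer is 0.
 */
size_t
split_count(size_t req_size, size_t max_xfer)
{
	if (max_xfer == 0)
		return (0);
	/* Round up so that a partial final chunk counts as a transfer. */
	return ((req_size + max_xfer - 1) / max_xfer);
}
```

 With these numbers each 128K request becomes 2 transfers, so 8192 counted
 operations would reach the hardware as 16384 transfers of 64K.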
 
 > [... ufs2 writing similar to reading]
 
 Bruce