Date: Mon, 30 May 2005 17:19:22 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Dominic Marks <dom@goodforbusiness.co.uk>
Cc: freebsd-fs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Message-ID: <20050530155609.Q1473@epsplex.bde.org>
In-Reply-To: <200505291612.46941.dom@goodforbusiness.co.uk>
References: <200505271328.58072.dom@goodforbusiness.co.uk>
 <200505281213.42118.dom@goodforbusiness.co.uk>
 <200505281540.35116.dom@goodforbusiness.co.uk>
 <200505291612.46941.dom@goodforbusiness.co.uk>
On Sun, 29 May 2005, Dominic Marks wrote:

> I have been experimenting in msdosfs_read and I have managed to come up
> with something that works, but I'm sure it is flawed. On large file reads
> it will improve read performance (see below) - but only after a long
> period of the file copy achieving only 3MB/s (see A1). During this time
> gstat reports the disc itself is reading at its maximum of around 28MB/s.
> After a long period of low throughput, the disc drops to 25MB/s but the
> actual transfer rate increases to 25MB/s (see A2).

A1 is strange.  It might be reading too much ahead, but I wouldn't expect
the read-ahead to be discarded soon so this should make little difference
for reading whole files.

> I've tried to narrow it down to something but I'm mostly in the dark, so
> I'll just hand over what I found to work to review. I looked at Bruce's
> changes to msdosfs_write and tried to do the same (implement cluster_read)
> using the ext2 and ffs _read methods as a how-to. I think I'm reading
> ahead too far, or too early. I have been unable to interpret the gstat
> output during the first part of the transfer any further.

The ext2 and ffs methods are a good place to start.  Also look at cd9660 --
it is a little simpler.

> The patch which combines Bruce's original patch for msdosfs_write, revised
> for current text positions, and my attempts to do the same for
> msdosfs_read.
>
> %%
> Index: msdosfs_vnops.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
> retrieving revision 1.149.2.1
> diff -u -r1.149.2.1 msdosfs_vnops.c
> --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
> +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
> @@ -565,14 +567,21 @@
>  		error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
>  	} else {
>  		blsize = pmp->pm_bpcluster;
> -		rablock = lbn + 1;
> -		if (seqcount > 1 &&
> -		    de_cn2off(pmp, rablock) < dep->de_FileSize) {
> -			rasize = pmp->pm_bpcluster;
> -			error = breadn(vp, lbn, blsize,
> -			    &rablock, &rasize, 1, NOCRED, &bp);
> +		/* XXX what is the best value for crsize? */
> +		crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : blsize * nblks;
> +		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
> +			error = cluster_read(vp, dep->de_FileSize, lbn,
> +			    crsize, NOCRED, uio->uio_resid, seqcount, &bp);

crsize should be just the block size (the cluster size in msdosfs, the
blsize variable here) according to this code in all other file systems.
seqcount gives the amount of read-ahead, and there are algorithms elsewhere
to guess its best value.

I think cluster_read() reads only physically contiguous blocks, so the
amount of read-ahead for it is not critical for the clustered case anyway.
There will either be a large range of contiguous blocks, in which case
reading ahead a lot isn't bad, or read-ahead will be limited by
discontiguities.  Giving a too-large value for crsize may be harmful by
confusing cluster_read() about discontiguities, or just by asking it to
read the large size when the blocks actually in the file aren't contiguous.
I think the above handles most cases, so look for problems there first.

> 		} else {

The above seems to be missing a bread() for the EOF case (before the else).
I don't know what cluster_read() does at EOF.  See cd9660_read() for clear
code.
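[Editor's note: to make the suggested structure concrete, here is a rough,
untested sketch of how ffs_read() and cd9660_read() arrange the equivalent
code, with msdosfs names substituted.  The de_cn2off(pmp, lbn + 1) EOF test
is an adaptation for illustration, not code from either file system.]

```c
/*
 * Sketch only (not compile-tested; error handling elided).  The two
 * points it illustrates: a plain bread() for the last block of the
 * file, and crsize being just the block size -- seqcount, not crsize,
 * controls how far cluster_read() reads ahead.
 */
if (de_cn2off(pmp, lbn + 1) >= dep->de_FileSize) {
	/* Last cluster of the file: no read-ahead. */
	error = bread(vp, lbn, blsize, NOCRED, &bp);
} else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
	/* Clustered read; pass the plain block size, not a multiple. */
	error = cluster_read(vp, dep->de_FileSize, lbn,
	    blsize, NOCRED, uio->uio_resid, seqcount, &bp);
} else if (seqcount > 1) {
	/* Non-clustered read-ahead, as in the existing breadn() code. */
	rablock = lbn + 1;
	rasize = pmp->pm_bpcluster;
	error = breadn(vp, lbn, blsize,
	    &rablock, &rasize, 1, NOCRED, &bp);
} else {
	error = bread(vp, lbn, blsize, NOCRED, &bp);
}
```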
(Here there is unfortunately an extra level of indentation from a special
case for directories.)

> -			error = bread(vp, lbn, blsize, NOCRED, &bp);
> +			rablock = lbn + 1;
> +			if (seqcount > 1 &&
> +			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
> +				rasize = pmp->pm_bpcluster;
> +				error = breadn(vp, lbn, blsize,
> +				    &rablock, &rasize, 1, NOCRED, &bp);
> +			} else {
> +				error = bread(vp, lbn, blsize, NOCRED, &bp);
> +			}

This part seems to be OK.  (It is just the old code indented.)

> 		}
> 	}
> 	if (error) {
> ...
> %%
>
> With this patch I can get the following transfer rates:
>
> msdosfs reading
>
> # ls -lh /mnt/random2.file
> -rwxr-xr-x  1 root  wheel  1.0G May 29 11:24 /mnt/random2.file
>
> # /usr/bin/time -al cp /mnt/random2.file /vol
>        59.61 real         0.05 user         6.79 sys
>          632  maximum resident set size
>           11  average shared memory size
>           80  average unshared data size
>          123  average unshared stack size
>           88  page reclaims
>            0  page faults
>            0  swaps
>        23757  block input operations **
>         8192  block output operations
>            0  messages sent
>            0  messages received
>            0  signals received
>        16660  voluntary context switches
>        10387  involuntary context switches
>
> Average Rate: 15.31MB/s. (Would be higher if not for the slow start)
>
> ** This figure is 3x that of the UFS2 operations. This must be an
> indicator of what I'm doing wrong, but I don't know what.

This might also be a sign of fragmentation, due to bad allocation policies
at write time or to write() not being able to do good allocation because of
previous fragmentation.  The average rate isn't too bad, despite the extra
blocks.
> msdosfs writing
>
> # /usr/bin/time -al cp /vol/random2.file /mnt
>        47.33 real         0.03 user         7.13 sys
>          632  maximum resident set size
>           12  average shared memory size
>           85  average unshared data size
>          130  average unshared stack size
>           88  page reclaims
>            0  page faults
>            0  swaps
>         8735  block input operations
>        16385  block output operations
>            0  messages sent
>            0  messages received
>            0  signals received
>         8856  voluntary context switches
>        29631  involuntary context switches
>
> Average Rate: 18.79MB/s.

There are 2x as many blocks as for ffs2 for writing, instead of 3x for
reading.  What are the input blocks for here?  Better to put the
non-msdosfs part of the source or target in memory so that it doesn't get
counted.  Or try mount -v (it gives sync and async read/write counts for
individual file systems).  2x is actually believable, while ffs2's counts
aren't.  It corresponds to a block size of 64K, which is what I would
expect for the unfragmented case.

> To compare with UFS2 + softupdates on the same system / disc.
>
> ufs2 reading
>
> # /usr/bin/time -al cp /mnt/random2.file /vol
>        42.39 real         0.02 user         6.61 sys
>          632  maximum resident set size
>           12  average shared memory size
>           87  average unshared data size
>          133  average unshared stack size
>           88  page reclaims
>            0  page faults
>            0  swaps
>         8249  block input operations
>         8193  block output operations
>            0  messages sent
>            0  messages received
>            0  signals received
>         8246  voluntary context switches
>        24617  involuntary context switches
>
> Average Rate: 20.89MB/s.

Isn't it 24.16MB/s?

8192 i/o operations seems to be too small.  It corresponds to a block size
of 128K.  Most drivers don't actually support doing i/o of that size (most
have a limit of 64K), so if they get asked to then it is a bug.  This bug
is common or ubiquitous.  The block size to use for clusters is in
mnt_iosize_max, and this is set in various wrong ways, often or always to
MAXPHYS = 128K.  This usually makes little difference except to give
misleading statistics.
Clustering tends to produce blocks of size 128K, and the block i/o counts
report blocks of that size, but smaller blocks are sent to the hardware.
I'm not sure if libdevstat() sees the smaller blocks.  I think it doesn't.

> [... ufs2 writing similar to reading]

Bruce