Date:      Wed, 29 Sep 2010 09:14:57 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        fs@freebsd.org
Subject:   Re: ext2fs now extremely slow
Message-ID:  <20100929084801.M948@besplex.bde.org>
In-Reply-To: <20100929054826.E797@besplex.bde.org>
References:  <20100929031825.L683@besplex.bde.org> <20100929054826.E797@besplex.bde.org>

On Wed, 29 Sep 2010, Bruce Evans wrote:

> On Wed, 29 Sep 2010, Bruce Evans wrote:
>
>> For benchmarks on ext2fs:
>> 
>> Under FreeBSD-~5.2 rerun today:
>> untar:     59.17 real
>> tar:       19.52 real
>> 
>> Under -current run today:
>> untar:    101.16 real
>> tar:      172.03 real
>> 
>> So, -current is 8.8 times slower for tar, but only 1.7 times slower for
>> untar.
>> ...
>> So it seems that only 1 block in every 8 is used, and there is a seek
>> after every block.  This asks for an 8-fold reduction in throughput,
>> and it seems to have got that and a bit more for reading although not
>> for writing.  Even (or especially) with perfect hardware, it must give
>> an 8-fold reduction.  And it is likely to give more, since it defeats
>> vfs clustering by making all runs of contiguous blocks have length 1.
>> 
>> Simple sequential allocation should be used unless the allocation policy
>> and implementation are very good.
>
> Things work a bit better after zapping the 8-fold way:
> ...
> This gives an improvement of:
>
> untar:    101.16 real -> 63.46
> tar:      172.03 real -> 50.70
>
> Now -current is only 1.1 times slower for untar and 2.6 times slower for
> tar.
>
> There must be a problem with bpref for things to have been so bad.  There
> is some point to leaving a gap of 7 blocks for expansion, but the gap was
> left even between blocks in a single file.
> ...
> I haven't tried the bde_blkpref hack in the above.  It should kill bpref
> completely so that there is no jump between lbn0 and lbn1, and break
> cylinder group based allocation even better.  Setting bde_blkpref to 1
> restores the bug that was present in ext2fs in FreeBSD between 1995 and
> 2010.  This bug gave sequential allocation starting at the beginning of
> the disk in almost all cases, so map searches were slow and early groups
> filled up before later groups were used at all.

Tried this (patch repeated below), and it gave essentially the same
speed as old versions.

The main problem seems to be that the `goal' variables aren't initialized.
After restoring bits verbatim from an old version, things seem to work as
expected:

% Index: ext2_alloc.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v
% retrieving revision 1.2
% diff -u -2 -r1.2 ext2_alloc.c
% --- ext2_alloc.c	1 Sep 2010 05:34:17 -0000	1.2
% +++ ext2_alloc.c	28 Sep 2010 21:08:42 -0000
% @@ -1,2 +1,5 @@
% +int bde_blkpref = 0;
% +int bde_alloc8 = 0;
% +
%  /*-
%   *  modified for Lites 1.1
% @@ -117,4 +120,8 @@
%                                                   ext2_alloccg);
%          if (bno > 0) {
% +		/* set next_alloc fields as done in block_getblk */
% +		ip->i_next_alloc_block = lbn;
% +		ip->i_next_alloc_goal = bno;
% +
%                  ip->i_blocks += btodb(fs->e2fs_bsize);
%                  ip->i_flag |= IN_CHANGE | IN_UPDATE;

The only things that changed recently in this block were the 4 deleted
lines and 4 lines with tabs corrupted to spaces.  Perhaps an editing
error.

% @@ -542,6 +549,12 @@
%  	   then set the goal to what we thought it should be
%  	*/
% +if (bde_blkpref == 0) {
%  	if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0)
%  		return ip->i_next_alloc_goal;
% +} else if (bde_blkpref == 1) {
% +	if(ip->i_next_alloc_block == lbn)
% +		return ip->i_next_alloc_goal;
% +} else
% +	return 0;
% 
%  	/* now check whether we were provided with an array that basically

Not needed now.

% @@ -662,4 +675,5 @@
%  	 * block.
%  	 */
% +if (bde_alloc8 == 0) {
%  	if (bpref)
%  		start = dtogd(fs, bpref) / NBBY;
% @@ -679,4 +693,5 @@
%  		}
%  	}
% +}
% 
%  	bno = ext2_mapsearch(fs, bbp, bpref);

The code to skip to the next 8-block boundary should be removed permanently.
After fixing the initialization, it doesn't generate holes inside files but
it still generates holes between files.  The holes are quite large with
4K-blocks.

Benchmark results with just the initialization of `goal' variables restored:

%%%
ext2fs-1024-1024:
tarcp /f srcs:                 78.79 real         0.31 user         4.94 sys
tar cf /dev/zero srcs:         24.62 real         0.19 user         1.82 sys
ext2fs-1024-1024-as:
tarcp /f srcs:                 52.07 real         0.26 user         4.95 sys
tar cf /dev/zero srcs:         24.80 real         0.10 user         1.93 sys
ext2fs-4096-4096:
tarcp /f srcs:                 74.14 real         0.34 user         3.96 sys
tar cf /dev/zero srcs:         33.82 real         0.10 user         1.19 sys
ext2fs-4096-4096-as:
tarcp /f srcs:                 53.54 real         0.36 user         3.87 sys
tar cf /dev/zero srcs:         33.91 real         0.14 user         1.15 sys
%%%

The much larger holes between the files are apparently responsible for the
decreased speed with 4K-blocks.  1K-blocks are really too small, so 4K-blocks
should be faster.

Benchmark results with the fix and bde_alloc8 = 1:

ext2fs-1024-1024:
tarcp /f srcs:                 71.60 real         0.15 user         2.04 sys
tar cf /dev/zero srcs:         22.34 real         0.05 user         0.79 sys
ext2fs-1024-1024-as:
tarcp /f srcs:                 46.03 real         0.14 user         2.02 sys
tar cf /dev/zero srcs:         21.97 real         0.05 user         0.80 sys
ext2fs-4096-4096:
tarcp /f srcs:                 59.66 real         0.13 user         1.63 sys
tar cf /dev/zero srcs:         19.88 real         0.07 user         0.46 sys
ext2fs-4096-4096-as:
tarcp /f srcs:                 37.30 real         0.12 user         1.60 sys
tar cf /dev/zero srcs:         19.93 real         0.05 user         0.49 sys

Bruce


